ncats / lychi

Layered Chemical Identifier
Apache License 2.0

Isotope Perception: Nonspecific site #9

Open tylerperyea opened 10 years ago

tylerperyea commented 10 years ago

Certain compounds are enriched with a specific isotope, but at nonspecific atoms. It would be helpful if these were understood in the format in which they are most commonly drawn.

Example 1

Iothalamic acid I-131
CasRN: 770645-97-1
ChemID link: http://chem.sis.nlm.nih.gov/chemidplus/rn/770645-97-1

This structure is how it's shown in ChemID, and represents a typical strategy for capturing isotope enrichment:

Enriched:

[image: radio]

HASH: <NONE>

Non-enriched:

[image: nonisotope]

HASH: D1DBNGVNG-G9T7D2UU8L-GLXN2UBNFSF-GLF3JS6KB6T7

The first fails in standardization, as it contains query / pseudo atoms. However, it is common enough that it should probably be handled. Ideally it would produce the same hash as the structure without the isotopic enrichment, except for a distinct 4th level hash meaning "mixed isotopic enrichment".

Proposed HASH:

D1DBNGVNG-G9T7D2UU8L-GLXN2UBNFSF-<DISTINCT>
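
A minimal sketch of the proposed hash, assuming the four levels are simply the hyphen-separated blocks shown above. The function name and the `MIXEDISO0001` marker are hypothetical placeholders for illustration, not part of lychi's API; how the distinct 4th level would really be derived is left open here.

```python
def with_mixed_isotope_level(nonenriched_hash, marker):
    """Keep levels 1-3 of a four-level hash and swap in a distinct 4th level.

    `marker` is a hypothetical stand-in for whatever 4th-level value would
    mean "mixed isotopic enrichment"; it is not something lychi emits today.
    """
    levels = nonenriched_hash.split("-")
    if len(levels) != 4:
        raise ValueError("expected a four-level hash, got %d levels" % len(levels))
    return "-".join(levels[:3] + [marker])


def same_first_three_levels(hash_a, hash_b):
    """True when two four-level hashes agree on everything but the 4th level."""
    return hash_a.split("-")[:3] == hash_b.split("-")[:3]
```

With this shape, the enriched and non-enriched records would still match on the first three levels, and only the last level would tell them apart.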

Enriched Molfile:

4YR0DGU31K
  Symyx   12171317512D 1   1.00000     0.00000     0

 22 21  0     0  0            999 V2000
    9.7243   -3.7903    0.0000 *   0  0  0  0  0  0           0  0  0
   10.1951   -3.7903    0.0000 I   4  0  0  0  0  0           0  0  0
    4.3451   -6.0362    0.0000 C   0  0  0  0  0  0           0  0  0
    5.0076   -5.6653    0.0000 N   0  0  0  0  0  0           0  0  0
    5.0076   -4.9111    0.0000 C   0  0  0  0  0  0           0  0  0
    4.3451   -4.5111    0.0000 O   0  0  0  0  0  0           0  0  0
    5.6826   -4.5195    0.0000 C   0  0  0  0  0  0           0  0  0
    6.3409   -4.9112    0.0000 C   0  0  0  0  0  0           0  0  0
    6.9743   -4.5195    0.0000 C   0  0  0  0  0  0           0  0  0
    7.6493   -4.9112    0.0000 N   0  0  0  0  0  0           0  0  0
    8.3076   -4.5236    0.0000 C   0  0  0  0  0  0           0  0  0
    8.3076   -3.7903    0.0000 C   0  0  0  0  0  0           0  0  0
    8.9743   -4.9112    0.0000 O   0  0  0  0  0  0           0  0  0
    6.9743   -3.7695    0.0000 C   0  0  0  0  0  0           0  0  0
    7.6493   -3.3945    0.0000 I   0  0  0  0  0  0           0  0  0
    6.3409   -3.3861    0.0000 C   0  0  0  0  0  0           0  0  0
    6.3409   -2.6237    0.0000 C   0  0  0  0  0  0           0  0  0
    6.9951   -2.2528    0.0000 O   0  0  0  0  0  0           0  0  0
    5.6826   -2.2403    0.0000 O   0  0  0  0  0  0           0  0  0
    5.6826   -3.7695    0.0000 C   0  0  0  0  0  0           0  0  0
    5.0076   -3.3778    0.0000 I   0  0  0  0  0  0           0  0  0
    6.3409   -5.6153    0.0000 I   0  0  0  0  0  0           0  0  0
  1  2  1  0     0  0
  3  4  1  0     0  0
  4  5  1  0     0  0
  6  5  2  0     0  0
  5  7  1  0     0  0
  8  7  2  0     0  0
  9  8  1  0     0  0
 10  9  1  0     0  0
 11 10  1  0     0  0
 12 11  1  0     0  0
 13 11  2  0     0  0
  9 14  2  0     0  0
 15 14  1  0     0  0
 14 16  1  0     0  0
 17 16  1  0     0  0
 18 17  2  0     0  0
 19 17  1  0     0  0
 20 16  2  0     0  0
 21 20  1  0     0  0
  7 20  1  0     0  0
 22  8  1  0     0  0
M  ISO  1   2 131
M  END
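
One way the fragment convention in the molfile above could be recognized before standardization is sketched below. This is an illustration under stated assumptions, not lychi's implementation: it reads only `M  ISO` lines (not the mass-difference column of the atom block), treats `*`, `A` and `Q` as pseudo atoms, and defines a nonspecific-isotope fragment as a two-atom fragment made of one pseudo atom bonded to one isotope-labelled atom. All function names are hypothetical.

```python
from collections import defaultdict

# Assumption: these symbols are the ones treated as query / pseudo atoms.
PSEUDO_SYMBOLS = {"*", "A", "Q"}


def parse_v2000(molfile):
    """Return (symbols, bonds, isotopes) from a V2000 molfile string.

    symbols  : atom symbols, list index 0 = atom 1
    bonds    : (a, b) pairs of 1-based atom indices
    isotopes : 1-based atom index -> absolute mass, from "M  ISO" lines
    """
    lines = molfile.splitlines()
    counts = lines[3]  # three header lines, then the counts line
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    # Whitespace split is enough while coordinates do not run together.
    symbols = [ln.split()[3] for ln in lines[4:4 + n_atoms]]
    bonds = [(int(ln[0:3]), int(ln[3:6]))
             for ln in lines[4 + n_atoms:4 + n_atoms + n_bonds]]
    isotopes = {}
    for ln in lines[4 + n_atoms + n_bonds:]:
        if ln.startswith("M  ISO"):
            fields = ln.split()[3:]  # skip "M", "ISO" and the entry count
            for atom, mass in zip(fields[0::2], fields[1::2]):
                isotopes[int(atom)] = int(mass)
    return symbols, bonds, isotopes


def nonspecific_isotope_labels(molfile):
    """List (element, mass) for every pseudo-atom/isotope two-atom fragment."""
    symbols, bonds, isotopes = parse_v2000(molfile)
    neighbours = defaultdict(set)
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    labels = []
    for idx, symbol in enumerate(symbols, start=1):
        if symbol not in PSEUDO_SYMBOLS or len(neighbours[idx]) != 1:
            continue
        (other,) = neighbours[idx]
        # The partner must be bonded to nothing but the pseudo atom.
        if neighbours[other] == {idx} and other in isotopes:
            labels.append((symbols[other - 1], isotopes[other]))
    return labels
```

Applied to the enriched molfile above, where atom 1 is `*`, atom 2 is the labelled iodine, and bond `1  2` is the only bond touching either, this should report a single `('I', 131)` label. The remaining 20 atoms could then be standardized as usual.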

caodac commented 10 years ago

So is enrichment here a boolean or a quantitative value?

tylerperyea commented 10 years ago

It's just boolean. The extent of enrichment is well beyond what is usually captured, or capturable, within a single molfile.

Really, the fragment method is just a hack to define a mixture of specific isotopic structures without drawing out each instance. It's somewhat analogous to undefined stereocenters implying a mixture of all wedge / dash combinations at those sites, or to a Markush structure defining variable substitution on a ring.
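
To make the "mixture of specific structures" reading concrete, here is a small sketch that enumerates the specific isotopic structures a nonspecific label could stand for. It assumes the label may sit on any atom of the matching element and that a fixed number of sites (one by default) is labelled per molecule. Both are assumptions for illustration, and the function name is hypothetical.

```python
from itertools import combinations


def enumerate_isotopologues(symbols, element, mass, n_labelled=1):
    """List every way to place `n_labelled` isotope labels on `element` atoms.

    symbols is a list of atom symbols (list index 0 = atom 1). Each result
    maps 1-based atom index -> isotope mass, like the entries of an
    "M  ISO" line.
    """
    sites = [i for i, s in enumerate(symbols, start=1) if s == element]
    return [dict.fromkeys(chosen, mass)
            for chosen in combinations(sites, n_labelled)]


# Toy atom list with iodine at positions 2, 4 and 5.
variants = enumerate_isotopologues(["C", "I", "C", "I", "I"], "I", 131)
# variants == [{2: 131}, {4: 131}, {5: 131}]
```

For the iothalamic acid molfile above, the main structure carries iodine at atoms 15, 21 and 22, so under the one-label assumption the fragment notation would stand for three specific structures.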

I suppose the deeper question here is whether lychi is intended to handle mixtures of defined substances at all and, if so, which mixtures.