kienerj / pycdxml

Tools to automatically convert and process cdx and cdxml files in python
GNU General Public License v3.0

Stereochemistry Label displayed twice after cdxml -> cdx conversion #13

Closed: kienerj closed this issue 1 year ago

kienerj commented 2 years ago

The enhanced stereo label is shown twice after cdxml -> cdx conversion, but it displays correctly again after a subsequent cdx -> cdxml conversion (roundtrip).

Sample file: b-Cypermethrin (stereochemistry).cdxml


Investigation needed.

kienerj commented 1 year ago

Note that, contrary to the previous title, this issue affects not only enhanced stereo labels but also normal stereo labels.

Stereochemistry labels are displayed in cdxml via an objecttag object:

            <n id="50" p="181.74 85.44" Z="331" ShowAtomStereo="yes" Geometry="Tetrahedral" AS="N" BondOrdering="51 53 62 0" EnhancedStereoType="And" EnhancedStereoGroupNum="1">
                <objecttag id="5004" TagType="Unknown" Name="enhancedstereo">
                    <t id="5005" p="182.09 82.16" BoundingBox="182.09 75.8 191.19 82.26" CaptionLineHeight="variable">
                        <s font="3" size="7.5" face="0" color="0">&amp;1</s>
                    </t>
                </objecttag>
            </n>

It has been verified that this object is written correctly to cdx, i.e. exactly the same object is present in the cdx file in which the symbol is shown twice when opened in ChemDraw. When that file is then saved from ChemDraw as cdxml, the objecttag above is duplicated: the added copy has an empty Name, while the original keeps Name="enhancedstereo".

            <n id="44" p="253.65 99.28" Z="322" ShowAtomStereo="yes" Geometry="Tetrahedral" AS="N" BondOrdering="45 47 55 0" EnhancedStereoType="And" EnhancedStereoGroupNum="1">
                <objecttag id="1" TagType="Unknown" Name="">
                    <t p="256.33 105.66" BoundingBox="256.33 99.31 265.50 105.76" CaptionLineHeight="variable">
                        <s font="3" size="7.5" color="0">&amp;1</s>
                    </t>
                </objecttag>
                <objecttag id="2" TagType="Unknown" Name="enhancedstereo">
                    <t p="249.09 94.49" BoundingBox="249.09 88.13 258.26 94.59" CaptionLineHeight="variable">
                        <s font="3" size="7.5" color="0">&amp;1</s>
                    </t>
                </objecttag>
            </n>

This duplication happens when the file is loaded into ChemDraw: the extra objecttag with the empty Name simply does not exist in the cdx file generated by PyCDXML.
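To spot this duplication in a cdxml file saved back from ChemDraw, a check along the following lines could help. It is a minimal sketch using Python's standard-library ElementTree, not part of PyCDXML; the function name `find_duplicated_stereo_tags` is hypothetical, and the heuristic (an atom node with ShowAtomStereo="yes" that has more than one direct objecttag child is a duplicate) is an assumption based only on the two snippets above.

```python
import xml.etree.ElementTree as ET


def find_duplicated_stereo_tags(cdxml_text: str) -> list[str]:
    """Return ids of atom nodes that carry more than one objecttag.

    Hypothetical helper. Heuristic assumption: only nodes with
    ShowAtomStereo="yes" are checked, and every direct <objecttag>
    child of such a node is treated as a stereo label.
    """
    root = ET.fromstring(cdxml_text)
    duplicated = []
    # iter() walks the whole tree, so atoms nested in fragments are found too
    for node in root.iter("n"):
        if node.get("ShowAtomStereo") != "yes":
            continue
        if len(node.findall("objecttag")) > 1:
            duplicated.append(node.get("id"))
    return duplicated


# Trimmed-down versions of the two nodes shown in this issue
sample = """<CDXML><page><fragment>
<n id="44" ShowAtomStereo="yes">
  <objecttag id="1" TagType="Unknown" Name=""><t><s>&amp;1</s></t></objecttag>
  <objecttag id="2" TagType="Unknown" Name="enhancedstereo"><t><s>&amp;1</s></t></objecttag>
</n>
<n id="50" ShowAtomStereo="yes">
  <objecttag id="5004" TagType="Unknown" Name="enhancedstereo"><t><s>&amp;1</s></t></objecttag>
</n>
</fragment></page></CDXML>"""

print(find_duplicated_stereo_tags(sample))  # ['44']
```

Only node 44 (the ChemDraw-saved node) is reported; node 50, as written by PyCDXML, has a single objecttag.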

It is therefore unclear how to fix this. Perhaps the objecttag should not be written to cdx at all?

EDIT:

Further investigation shows that the objecttag seems to be legacy: it can be omitted and the label is still displayed correctly, which is likely driven by the stereochemistry attributes on the node itself. Converting such a file without the objecttag results in a correct display in cdx.

The fix therefore seems to be to remove this tag when it is used for stereochemistry.
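The idea of dropping stereochemistry objecttags before writing cdx can be sketched as follows. This is a hedged illustration with the standard-library ElementTree, not the code from the commit referenced below; the function `strip_stereo_objecttags` and its `stereo_tag_names` parameter are hypothetical. The default name set only contains "enhancedstereo" because that is the only Name visible in this issue; the Name used for normal stereo labels is not shown here and would have to be added as an assumption.

```python
import xml.etree.ElementTree as ET


def strip_stereo_objecttags(root: ET.Element,
                            stereo_tag_names=frozenset({"enhancedstereo"})) -> int:
    """Remove objecttag children of atom nodes whose Name marks a stereo label.

    Hypothetical helper. Assumption: stereo-label objecttags are identified
    solely by their Name attribute; all other objecttags are kept.
    Returns the number of removed elements.
    """
    removed = 0
    # Materialize the node list so the tree is not modified while iter() runs
    for node in list(root.iter("n")):
        stereo_tags = [tag for tag in node.findall("objecttag")
                       if tag.get("Name") in stereo_tag_names]
        for tag in stereo_tags:
            node.remove(tag)
            removed += 1
    return removed


doc = ET.fromstring(
    '<CDXML><page><fragment>'
    '<n id="50" ShowAtomStereo="yes" EnhancedStereoType="And" EnhancedStereoGroupNum="1">'
    '<objecttag id="5004" TagType="Unknown" Name="enhancedstereo">'
    '<t><s>&amp;1</s></t></objecttag></n>'
    '</fragment></page></CDXML>'
)
print(strip_stereo_objecttags(doc))  # 1
print(ET.tostring(doc, encoding="unicode"))
```

The node keeps its ShowAtomStereo and EnhancedStereo* attributes, which, per the investigation above, appear to be sufficient for ChemDraw to display the label.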

kienerj commented 1 year ago

Fixed by 11ad00d631aefea2ff13a92ee175afbee26e96ed