k-ujihara / NCDK

The Chemistry Development Kit ported to .NET
https://kazuyaujihara.github.io/NCDK/
GNU Lesser General Public License v2.1

Cml generated does not match CML Schema #22

Open MikeWilliams-UK opened 4 years ago

MikeWilliams-UK commented 4 years ago

The CML Schema requires that the file has a <cml> root element. See the Minimal Molecule example.

using (var file = new FileStream("./data/output.xml", FileMode.Create, FileAccess.Write))
{
    using (var writer = new CMLWriter(file))
    {
        writer.Write(layedOutMol);
    }
}

The resulting file, ./data/output.xml, is missing the cml root element.
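Until the writer itself changes, one possible workaround is to post-process the generated text. The following is only a minimal sketch: `CmlRootWrapper` and `EnsureCmlRoot` are hypothetical names, not part of NCDK, and the sketch assumes the writer's output is available as a string of well-formed XML.

```csharp
using System.Xml.Linq;

public static class CmlRootWrapper
{
    // Hypothetical helper, not part of NCDK: returns a document whose root
    // element is <cml>, wrapping the original root if necessary.
    public static XDocument EnsureCmlRoot(string cmlText)
    {
        XDocument doc = XDocument.Parse(cmlText);
        XElement root = doc.Root;

        // Already rooted at <cml>: nothing to do.
        if (root.Name.LocalName == "cml")
        {
            return doc;
        }

        // Put the wrapper in the same namespace as the original root
        // (for example http://www.xml-cml.org/schema, or none at all).
        // Passing an element that already has a parent adds a copy of it.
        XElement wrapper = new XElement(root.Name.Namespace + "cml", root);
        return new XDocument(doc.Declaration, wrapper);
    }
}
```

To use this with the snippet above, the CMLWriter could presumably be pointed at a MemoryStream instead of the FileStream, and the returned document saved to ./data/output.xml. That wiring is an assumption and has not been tested against NCDK.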

k-ujihara commented 4 years ago

CML does not require cml to be the root element, and Chemistry Development Kit (and NCDK) does not define the XML namespace in the molecule tag. In fact, PerkinElmer's ChemDraw 19.1 (the newest version) takes the same approach, as shown below.

CML generated by ChemDraw 19.1.

<?xml version="1.0"?>
<molecule xmlns="http://www.xml-cml.org/schema">
<atomArray>
<atom elementType="C" id="a2" x2="5.34607" y2="-4.77841"/>
<atom elementType="C" id="a4" x2="5.34607" y2="-6.30141"/>
<atom elementType="C" id="a6" x2="6.66503" y2="-7.06291"/>
<atom elementType="C" id="a8" x2="7.98398" y2="-6.30141"/>
<atom elementType="C" id="a10" x2="7.98398" y2="-4.77841"/>
<atom elementType="C" id="a12" x2="6.66503" y2="-4.01691"/>
</atomArray>
<bondArray>
<bond atomRefs2="a2 a4" id="b14" order="2"/>
<bond atomRefs2="a4 a6" id="b15" order="1"/>
<bond atomRefs2="a6 a8" id="b16" order="2"/>
<bond atomRefs2="a8 a10" id="b17" order="1"/>
<bond atomRefs2="a10 a12" id="b18" order="2"/>
<bond atomRefs2="a12 a2" id="b19" order="1"/>
</bondArray>
</molecule>

MikeWilliams-UK commented 4 years ago

@kazuyaujihara I have consulted Peter Murray-Rust, the originator of the CML standard, and he has confirmed that if the document is standalone, cml should be the root element, like this:

<?xml version="1.0" encoding="UTF-8"?>
<cml>
    <molecule id="m1">
        <atomArray>
            <atom id="a1" elementType="H" />
        </atomArray>
    </molecule>
</cml>

A cml fragment can also be a child element of another document; therefore, the following is also valid.

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <cml>
    <molecule id="m1">
        <atomArray>
            <atom id="a1" elementType="H" />
        </atomArray>
    </molecule>
  </cml>
</root>

Therefore, neither NCDK nor ChemDraw is compliant with the standard.
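The rule described above (a molecule must sit inside a cml element, whether cml is the document root or is nested in another document) can be checked mechanically. This is a rough sketch only: `CmlContainerCheck` and `AllMoleculesInsideCml` are hypothetical names, elements are matched by local name regardless of namespace, and the check is no substitute for validating against the schema.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class CmlContainerCheck
{
    // Hypothetical helper: true when every <molecule> element has a <cml>
    // ancestor. A document with no <molecule> elements also returns true.
    public static bool AllMoleculesInsideCml(string xmlText)
    {
        XDocument doc = XDocument.Parse(xmlText);
        return doc.Descendants()
                  .Where(e => e.Name.LocalName == "molecule")
                  .All(m => m.Ancestors().Any(a => a.Name.LocalName == "cml"));
    }
}
```

Under this check, both examples above pass, while a file whose root element is molecule fails.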

Please fix NCDK.

k-ujihara commented 4 years ago

I will consider it. That said, http://www.xml-cml.org/examples/schema3/molecular/ contains several CML examples whose root element is not cml, such as http://www.xml-cml.org/examples/schema3/molecular/minimal-molecule-3.html.

<?xml version="1.0" encoding="UTF-8"?>
<molecule xmlns="http://www.xml-cml.org/schema" xmlns:conventions="http://www.xml-cml.org/convention/"
          convention="conventions:molecular" id="m1">
</molecule>

From a quick glance, http://www.xml-cml.org/schema/schema3/ does not seem to prevent a molecule tag from being the root.

k-ujihara commented 4 years ago

OpenBabel also uses a molecule tag as the root.

MikeWilliams-UK commented 4 years ago

I think it's a bit unfair to use those examples as the justification for not including the cml element as the parent of the molecule element, as only 6 of the 38 examples lack it. That may well be a mistake in the data which underpins the web site http://www.xml-cml.org/examples/schema3/molecular/