snurr-group / mofid

A system for rapid identification and analysis of metal-organic frameworks
https://snurr-group.github.io/mofid/
GNU General Public License v2.0

'Unknown' and 'NA' topologies on CoRE MOF dataset #26

Closed ThysBallard closed 4 years ago

ThysBallard commented 4 years ago

When I run the CoRE MOF dataset through mofid, about a third of the structures are labeled with an 'UNKNOWN' or 'NA' topology. Specifically:

I have tried to do some digging to resolve the issue, but I have yet to find a lead. For reference, here is a JSON file of the Python dictionaries generated for each MOF in the dataset. Any ideas what could be causing this?

Andrew-S-Rosen commented 4 years ago

@ThysBallard:

Thank you for reaching out. MOFid attempts to resolve the topology using Systre, which pulls from the list of topologies on the RCSR. A description of the messages you have received is in the Supporting Information of the original MOFid paper.

UNKNOWN indicates that a topology was successfully parsed but that the RCSR has no topology code assigned to it. Without a topology code, we currently have no option but to label it UNKNOWN. MOFid currently uses the RCSR data as of June 1, 2019, which is the most recent public release, as noted here. If a new version of the RCSR database is made available for download, or if you would like to introduce new topology codes yourself, you can update the RCSRnets.arc file and recompile MOFid.

A topology labeled 'NA' means there was a problem parsing the topology, usually because the topology is not well-defined (no MOF net detected). This can happen, for instance, if a material is not actually a MOF and has no underlying topology, but there are other possible causes (e.g. if the sbu.cpp code crashes).
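To see how a dataset splits across the three outcomes described above (an assigned RCSR code, UNKNOWN, or NA), a tally along the following lines may help. This is only a minimal sketch: it assumes each MOF is represented by a Python dictionary with a `topology` key, and the helper names `classify_topology` and `tally_topologies`, the `cifname` key, and the sample records are all hypothetical placeholders rather than part of MOFid.

```python
from collections import Counter


def classify_topology(label):
    """Bucket a topology label into 'unknown', 'na', or 'assigned'."""
    normalized = str(label).strip().upper()
    if normalized == "UNKNOWN":
        return "unknown"  # net was parsed, but no RCSR code matches it
    if normalized == "NA":
        return "na"  # topology could not be parsed (e.g. no MOF net detected)
    return "assigned"  # anything else is treated as an RCSR topology code


def tally_topologies(records, key="topology"):
    """Count records per bucket.

    Assumption: a record with no `key` entry is counted as 'na'.
    """
    return Counter(classify_topology(r.get(key, "NA")) for r in records)


# Hypothetical records for illustration only; not real CoRE MOF results.
records = [
    {"cifname": "example_a", "topology": "pcu"},
    {"cifname": "example_b", "topology": "UNKNOWN"},
    {"cifname": "example_c", "topology": "NA"},
    {"cifname": "example_d", "topology": "dia"},
]

counts = tally_topologies(records)
total = sum(counts.values())
for bucket in ("assigned", "unknown", "na"):
    print(f"{bucket}: {counts[bucket]} ({counts[bucket] / total:.0%})")
```

Keeping UNKNOWN and NA in separate buckets matters because they have different remedies: UNKNOWN entries can only change when the RCSR data is updated, whereas NA entries point to a parsing problem with the structure itself.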

With the original MOFid paper, we have made all the MOFid/MOFkeys available for the CoRE MOF database (for all MOFs that were available at the time of writing). You can find them in the Supporting Information here. You should see that the distributions are similar to what you have independently reported.

Please let me know if you have any other questions.