Closed by jessicalumian 2 years ago
Don't the compound names come from the CV table prep script that we provide?
I see similar values in the actual compound CV term table accumulated in the submission system. E.g. this link will serve you some raw CSV content: https://app.nih-cfde.org/ermrest/catalog/registry/attribute/CFDE:compound/name::regexp::WURCS/id,name?limit=20&accept=csv
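For anyone who wants to inspect those records without a browser, here is a minimal Python sketch that rebuilds the query URL above and parses the CSV it returns. The helper names (`build_url`, `parse_compounds`, `fetch_compounds`) are made up for illustration, and I am assuming the response is plain CSV with a header row of `id,name`, as the `id,name` projection in the URL suggests.

```python
import csv
import io
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the link above; everything else here is illustrative.
BASE = "https://app.nih-cfde.org/ermrest/catalog/registry/attribute/CFDE:compound"


def build_url(pattern="WURCS", limit=20):
    """Rebuild the query URL: regexp filter on name, projecting id and name."""
    return (
        f"{BASE}/name::regexp::{quote(pattern, safe='')}"
        f"/id,name?limit={limit}&accept=csv"
    )


def parse_compounds(csv_text):
    """Parse CSV text (assumed to start with an id,name header) into dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def fetch_compounds(pattern="WURCS", limit=20):
    """Download and parse matching compound terms (needs network access)."""
    with urlopen(build_url(pattern, limit)) as resp:
        return parse_compounds(resp.read().decode("utf-8"))


if __name__ == "__main__":
    for row in fetch_compounds():
        print(row["id"], row["name"], sep="\t")
```

With the default arguments, `build_url()` reproduces the link above exactly, so the output should match what the browser shows.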
The display is data-driven and uses these actual term records in the C2M2 submissions. It's not clear to me that there is any issue here unless a better conversion from the source ontology is desired. And I am afraid this would require revising the script and preparing all affected submissions over again with improved compound.tsv content. The results would likely be unsatisfactory if we continue to get ugly compound names in the submission system/release constituents, as there is no mechanism to indicate which value is "correct"...
This is not broken formatting, it's just how PubChem names their compounds.
The names of many compounds have formatting issues. It seems like all of these compounds were submitted by GlyGen. There are a lot of compound entries, so it's hard to tell if all GlyGen entries are broken or just some.
If possible, we should remove the broken names before we release this round of data.