singa-bio / singa

SiNGA (Simulation of Natural Systems using Graph Automata) is an open-source library containing tools especially for structural bioinformatics and systems biology.
MIT License

RCSB CIF file parsing fails #94

Closed fkaiserbio closed 4 years ago

fkaiserbio commented 4 years ago

It seems that RCSB has updated the definition for CIF files. Multi-line entries can now occur, see e.g.: http://files.rcsb.org/ligands/view/5MU.cif

_chem_comp.id                                    5MU 
_chem_comp.name                                  
;5-METHYLURIDINE 5'-MONOPHOSPHATE
;
_chem_comp.type                                  "RNA LINKING" 

This breaks a lot of the structure parsing functionality. The key is the method CifFileParser#extractValue and, more generally, how the parser extracts values from CIF files.
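To illustrate the kind of handling that seems to be needed, here is a minimal sketch of reading simple "_tag value" items where the value may be a semicolon-delimited text field spanning several lines. The class CifValueSketch and its method parseItems are hypothetical names made up for this example. This is not SiNGA's CifFileParser, and it ignores loop_ blocks, comments, and other parts of the CIF syntax.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of SiNGA.
final class CifValueSketch {

    /** Parses non-loop "_tag value" items, including ';'-delimited multi-line values. */
    static Map<String, String> parseItems(List<String> lines) {
        Map<String, String> items = new LinkedHashMap<>();
        int i = 0;
        while (i < lines.size()) {
            String line = lines.get(i++);
            if (!line.startsWith("_")) {
                continue; // only data item tags are of interest in this sketch
            }
            String trimmed = line.trim();
            int split = firstWhitespace(trimmed);
            String tag = split < 0 ? trimmed : trimmed.substring(0, split);
            String rest = split < 0 ? "" : trimmed.substring(split).trim();
            if (!rest.isEmpty()) {
                items.put(tag, unquote(rest)); // value on the same line as the tag
            } else if (i < lines.size() && lines.get(i).startsWith(";")) {
                // Text field: opens with ';' in the first column and ends at the
                // next line that starts with ';'.
                List<String> parts = new ArrayList<>();
                parts.add(lines.get(i++).substring(1));
                while (i < lines.size() && !lines.get(i).startsWith(";")) {
                    parts.add(lines.get(i++));
                }
                i++; // skip the closing ';' line
                items.put(tag, String.join("\n", parts).trim());
            } else if (i < lines.size() && !lines.get(i).startsWith("_")) {
                items.put(tag, unquote(lines.get(i++).trim())); // value on the next line
            }
        }
        return items;
    }

    /** Returns the index of the first whitespace character, or -1 if there is none. */
    private static int firstWhitespace(String s) {
        for (int k = 0; k < s.length(); k++) {
            if (Character.isWhitespace(s.charAt(k))) {
                return k;
            }
        }
        return -1;
    }

    /** Strips one pair of matching single or double quotes, if present. */
    private static String unquote(String s) {
        if (s.length() >= 2) {
            char first = s.charAt(0);
            char last = s.charAt(s.length() - 1);
            if (first == last && (first == '"' || first == '\'')) {
                return s.substring(1, s.length() - 1);
            }
        }
        return s;
    }
}
```

Fed the five lines of the 5MU excerpt above, parseItems would map _chem_comp.name to the text between the two semicolon lines instead of an empty string, which is the case a purely line-by-line extraction misses.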

cleberecht commented 4 years ago

Thanks for pointing that out. I will have a look at it.

cleberecht commented 4 years ago

The exceptions should be fixed for now, but there might still be problems with multi-line names of ligands. AFAIK the name of a ligand is not required for processing, so this will be fixed later.

JonStargaryen commented 4 years ago

Just use ciftools-java for all your parsing needs :p

cleberecht commented 4 years ago

Hey Sebastian, we will properly adopt ciftools in an upcoming release for mmCIF file parsing and also migrate the current ligand handling. The latest commit to development should hotfix the problem for now. All the best :)