rcsb / symmetry

:ferris_wheel: Detect, analyze, and visualize protein symmetry
GNU Lesser General Public License v2.1

CE-Symm fails with 'does not look like a valid mmCIF file!' errors #91

Closed by sbliven 6 years ago

sbliven commented 7 years ago

Running the latest version produces many error messages like this:

2411 [pool-2-thread-1] ERROR org.biojava.nbio.structure.io.mmcif.SimpleMMcifParser - This does not look like a valid mmCIF file! The first line should start with 'data_', but is: 'null'

This is followed by a NullPointerException:

2497 [pool-2-thread-1] ERROR workers.CeSymmWorker - Could not complete job: 4hhb
java.lang.NullPointerException
    at org.biojava.nbio.structure.io.ChargeAdder.addCharges(ChargeAdder.java:55) ~[cesymm-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
    at org.biojava.nbio.structure.io.mmcif.SimpleMMcifConsumer.addCharges(SimpleMMcifConsumer.java:995) ~[cesymm-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
    at org.biojava.nbio.structure.io.mmcif.SimpleMMcifConsumer.documentEnd(SimpleMMcifConsumer.java:799) ~[cesymm-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
    at org.biojava.nbio.structure.io.mmcif.SimpleMMcifParser.triggerDocumentEnd(SimpleMMcifParser.java:1180) ~[cesymm-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
    at org.biojava.nbio.structure.io.mmcif.SimpleMMcifParser.parse(SimpleMMcifParser.java:395) ~[cesymm-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
    at org.biojava.nbio.structure.io.MMCIFFileReader.getStructure(MMCIFFileReader.java:111) ~[cesymm-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
    at org.biojava.nbio.structure.io.LocalPDBDirectory.getStructureById(LocalPDBDirectory.java:345) ~[cesymm-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
    at org.biojava.nbio.structure.align.util.AtomCache.loadStructureFromCifByPdbId(AtomCache.java:1048) ~[cesymm-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
    at org.biojava.nbio.structure.align.util.AtomCache.getStructureForPdbId(AtomCache.java:1029) ~[cesymm-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
    at org.biojava.nbio.structure.SubstructureIdentifier.loadStructure(SubstructureIdentifier.java:316) ~[cesymm-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
    at org.biojava.nbio.structure.align.client.StructureName.loadStructure(StructureName.java:549) ~[cesymm-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
    at org.biojava.nbio.structure.align.util.AtomCache.getStructure(AtomCache.java:424) ~[cesymm-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
    at workers.CeSymmWorker.run(CeSymmWorker.java:58) [cesymm-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]

This is a known issue that was patched in BioJava (https://github.com/biojava/biojava/pull/464 and https://github.com/biojava/biojava/pull/683) following changes in the RCSB URLs.

josemduarte commented 7 years ago

The solution is to upgrade to BioJava 4.2.8 and purge the chemcomp directory.

sbliven commented 7 years ago

BioJava is updated in ecfea93 (RC3). I'd still like to add a patch to auto-purge bad files.
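
A minimal sketch of what such an auto-purge could look like, using only the JDK. It is not the patch that went into BioJava. The class name `ChemCompCachePurger` is hypothetical, and the definition of a "bad" file is an assumption taken from the error message above: a cached file is treated as bad if it is empty, unreadable, or its first line does not start with `data_`. Files ending in `.gz` are assumed to be gzip-compressed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of BioJava or CE-Symm.
final class ChemCompCachePurger {

    // True if the first line of the (optionally gzipped) file starts with "data_".
    // Empty, truncated, or unreadable files are reported as invalid.
    static boolean looksLikeMmcif(Path file) {
        boolean gzipped = file.getFileName().toString().endsWith(".gz");
        try (InputStream raw = Files.newInputStream(file);
             InputStream in = gzipped ? new GZIPInputStream(raw) : raw;
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String firstLine = reader.readLine();
            return firstLine != null && firstLine.startsWith("data_");
        } catch (IOException e) {
            // Includes the EOFException thrown by GZIPInputStream on an empty file.
            return false;
        }
    }

    // Deletes every regular file directly inside dir that fails the check
    // and returns the list of deleted paths. Subdirectories are not visited.
    static List<Path> purge(Path dir) throws IOException {
        List<Path> candidates = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    candidates.add(p);
                }
            }
        }
        List<Path> removed = new ArrayList<>();
        for (Path p : candidates) {
            if (!looksLikeMmcif(p)) {
                Files.delete(p);
                removed.add(p);
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ChemCompCachePurger <chemcomp-dir>");
            return;
        }
        for (Path p : purge(Paths.get(args[0]))) {
            System.out.println("Removed bad file: " + p);
        }
    }
}
```

Pointing it at the chemcomp directory of the local PDB cache removes the bad files so they can be downloaded again on the next run. The location of that directory depends on the local setup and is not given in this thread.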

sbliven commented 6 years ago

Fixed with biojava/biojava#774