:books: :microscope: PIA - Protein Inference Algorithms
https://github.com/medbioinf/pia

NullPointerException while compiling .mzid file #207

Closed kretep closed 4 months ago

kretep commented 4 months ago

When I try to compile a .mzid file (output.mzid.zip), the NullPointerException below occurs, apparently because the "start" and "end" attributes are missing from the PeptideEvidence elements.

java.lang.NullPointerException: Cannot invoke "java.lang.Integer.intValue()" because "start" is null
        at de.mpc.pia.intermediate.compiler.parser.MzIdentMLFileParser.getPeptideEvidenceSequence(MzIdentMLFileParser.java:741)
        at de.mpc.pia.intermediate.compiler.parser.MzIdentMLFileParser.parseSIIPeptideEvidences(MzIdentMLFileParser.java:696)
        at de.mpc.pia.intermediate.compiler.parser.MzIdentMLFileParser.processSpectrumIdentificationItem(MzIdentMLFileParser.java:575)
        at de.mpc.pia.intermediate.compiler.parser.MzIdentMLFileParser.addSpectrumIdentificationResult(MzIdentMLFileParser.java:449)
        at de.mpc.pia.intermediate.compiler.parser.MzIdentMLFileParser.addSpectrumIdentificationList(MzIdentMLFileParser.java:413)
        at de.mpc.pia.intermediate.compiler.parser.MzIdentMLFileParser.parseFile(MzIdentMLFileParser.java:234)
        at de.mpc.pia.intermediate.compiler.parser.MzIdentMLFileParser.getDataFromMzIdentMLFile(MzIdentMLFileParser.java:94)
        at de.mpc.pia.intermediate.compiler.parser.InputFileParserFactory$InputFileTypes$7.parseFile(InputFileParserFactory.java:248)
        at de.mpc.pia.intermediate.compiler.parser.InputFileParserFactory.getDataFromFile(InputFileParserFactory.java:489)
        at de.mpc.pia.intermediate.compiler.PIACompiler.getDataFromFile(PIACompiler.java:246)
        at de.mpc.pia.PIACli.parseCommandLineInfile(PIACli.java:158)
        at de.mpc.pia.PIACli.parseCommandLineInfiles(PIACli.java:124)
        at de.mpc.pia.PIACli.processCompile(PIACli.java:94)
        at de.mpc.pia.PIACli.run(PIACli.java:66)
        at picocli.CommandLine.executeUserObject(CommandLine.java:2030)
        at picocli.CommandLine.access$1500(CommandLine.java:148)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2457)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2419)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
        at picocli.CommandLine$RunLast.execute(CommandLine.java:2421)
        at picocli.CommandLine.execute(CommandLine.java:2174)
        at de.mpc.pia.PIACli.main(PIACli.java:82)
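
The top frame of the trace shows a null `Integer` being auto-unboxed, which matches a PeptideEvidence element with no "start" attribute. The following is only a minimal sketch of a null-safe guard, not PIA's actual fix: the class `PeptideEvidenceGuard` and the method `sequenceWindow` are hypothetical names, and the sketch assumes positions are 1-based and inclusive.

```java
import java.util.Optional;

/** Hypothetical helper, not part of PIA: guards against missing start/end values. */
final class PeptideEvidenceGuard {

    /**
     * Returns the part of the protein sequence covered by [start, end]
     * (assumed 1-based, inclusive), or Optional.empty() when either
     * position is missing or lies outside the sequence.
     */
    static Optional<String> sequenceWindow(String proteinSequence, Integer start, Integer end) {
        // Unboxing a null Integer is what raises the NullPointerException,
        // so check for null before any comparison or arithmetic.
        if (proteinSequence == null || start == null || end == null) {
            return Optional.empty();
        }
        if (start < 1 || end < start || end > proteinSequence.length()) {
            return Optional.empty();
        }
        return Optional.of(proteinSequence.substring(start - 1, end));
    }

    public static void main(String[] args) {
        // Both positions present: a window can be extracted.
        System.out.println(sequenceWindow("MKTAYIAKQR", 2, 5));
        // "start" missing: the caller gets an empty result instead of an exception.
        System.out.println(sequenceWindow("MKTAYIAKQR", null, 5));
    }
}
```

Returning an empty `Optional` leaves it to the caller to decide whether to skip the flanking-sequence lookup or fall back to another source for the position.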

magnuspalmblad commented 4 months ago

I got a little further once I figured out how to use the command line:

C:\TPP\data>java -jar c:\Users\nmpalmblad\Desktop\pia-1.5.2\pia-1.5.2.jar --compile "140131.LC2.IT2.XX.P01347_2-C,6_01_5970.mzid" -o pia-compilation.xml
2024-07-10 16:43:38 INFO  '140131.LC2.IT2.XX.P01347_2-C,6_01_5970.mzid' seems to be a mzIdentML file (de.mpc.pia.intermediate.compiler.parser.InputFileParserFactory:488)
2024-07-10 16:43:38 WARN  MzIdentML Configuration file: jar:file:/C:/Users/nmpalmblad/Desktop/pia-1.5.2/pia-1.5.2.jar!/MzIdentMLElement.cfg.xml (uk.ac.ebi.jmzidml.MzIdentMLElement:1101)
2024-07-10 16:44:08 INFO  inserted new:
        54605 peptides
        68365 peptide spectrum matches
        30930 accessions (de.mpc.pia.intermediate.compiler.parser.MzIdentMLFileParser:241)
2024-07-10 16:44:08 INFO  have now:
        54605 peptides
        68365 peptide spectrum matches
        30930 accessions (de.mpc.pia.intermediate.compiler.PIACompiler:252)
2024-07-10 16:44:08 INFO  start sorting clusters (de.mpc.pia.intermediate.compiler.PIACompiler:668)
2024-07-10 16:44:08 INFO  clusters sorted: 28201 (de.mpc.pia.intermediate.compiler.PIACompiler:691)
2024-07-10 16:44:08 INFO  Using 8 threads. (de.mpc.pia.intermediate.compiler.PIACompiler:774)
2024-07-10 16:44:08 INFO   <thread 7 has no more work after 4269 clusters>  (de.mpc.pia.intermediate.compiler.CompilerWorkerThread:76)
2024-07-10 16:44:08 INFO   <thread 6 has no more work after 2593 clusters>  (de.mpc.pia.intermediate.compiler.CompilerWorkerThread:76)
2024-07-10 16:44:08 INFO   <thread 2 has no more work after 3428 clusters>  (de.mpc.pia.intermediate.compiler.CompilerWorkerThread:76)
2024-07-10 16:44:08 INFO   <thread 1 has no more work after 4121 clusters>  (de.mpc.pia.intermediate.compiler.CompilerWorkerThread:76)
2024-07-10 16:44:08 INFO   <thread 3 has no more work after 3292 clusters>  (de.mpc.pia.intermediate.compiler.CompilerWorkerThread:76)
2024-07-10 16:44:08 INFO   <thread 5 has no more work after 3210 clusters>  (de.mpc.pia.intermediate.compiler.CompilerWorkerThread:76)
2024-07-10 16:44:08 INFO   <thread 8 has no more work after 4450 clusters>  (de.mpc.pia.intermediate.compiler.CompilerWorkerThread:76)
2024-07-10 16:44:08 INFO   <thread 4 has no more work after 2838 clusters>  (de.mpc.pia.intermediate.compiler.CompilerWorkerThread:76)
2024-07-10 16:44:08 INFO  Writing PIA XML file to C:\TPP\data\pia-compilation.xml (de.mpc.pia.intermediate.compiler.PIACompiler:909)
2024-07-10 16:44:08 INFO  Stream open, writing PIA XML (de.mpc.pia.intermediate.compiler.PIACompiler:933)
2024-07-10 16:44:11 INFO  Writing of PIA XML file finished. (de.mpc.pia.intermediate.compiler.PIACompiler:1007)

C:\TPP\data>sed -e "s/\/tmp\///g" configuration.json > configuration_Windows.json

C:\TPP\data>java -jar c:\Users\nmpalmblad\Desktop\pia-1.5.2\pia-1.5.2.jar configuration_Windows.json pia-compilation.xml
2024-07-10 16:45:03 INFO  start loading file pia-compilation.xml (de.mpc.pia.modeller.PIAModeller:145)
2024-07-10 16:45:03 INFO  loadIntermediate started... (de.mpc.pia.modeller.PIAModeller:258)
2024-07-10 16:45:03 WARN  No progress array given, creating one. But no external supervision will be possible. (de.mpc.pia.modeller.PIAModeller:262)
2024-07-10 16:45:03 INFO  Starting parse... (de.mpc.pia.modeller.PIAModeller:276)
2024-07-10 16:45:03 INFO  filesList (de.mpc.pia.intermediate.xmlhandler.PIAIntermediateJAXBHandler:253)
2024-07-10 16:45:04 INFO  Inputs (de.mpc.pia.intermediate.xmlhandler.PIAIntermediateJAXBHandler:258)
2024-07-10 16:45:04 INFO  AnalysisSoftwareList (de.mpc.pia.intermediate.xmlhandler.PIAIntermediateJAXBHandler:263)
2024-07-10 16:45:04 INFO  spectraList (de.mpc.pia.intermediate.xmlhandler.PIAIntermediateJAXBHandler:268)
2024-07-10 16:45:06 INFO  accessionsList (de.mpc.pia.intermediate.xmlhandler.PIAIntermediateJAXBHandler:272)
2024-07-10 16:45:07 INFO  peptidesList (de.mpc.pia.intermediate.xmlhandler.PIAIntermediateJAXBHandler:276)
2024-07-10 16:45:07 INFO  groupsList (de.mpc.pia.intermediate.xmlhandler.PIAIntermediateJAXBHandler:280)
2024-07-10 16:45:08 INFO  pia-compilation.xml successfully parsed.
         1 files
         30482 groups
         30930 accessions
         54605 peptides
         68365 peptide spectrum matches
         28201 trees (de.mpc.pia.modeller.PIAModeller:281)
2024-07-10 16:45:08 INFO  loadIntermediate done. (de.mpc.pia.modeller.PIAModeller:295)
2024-07-10 16:45:08 INFO  setting spectra uniquenesses (de.mpc.pia.modeller.PIAModeller:325)
2024-07-10 16:45:08 INFO  spectra uniquenesses set. (de.mpc.pia.modeller.PIAModeller:331)
2024-07-10 16:45:08 INFO  createReportPSMsFromGroups started... (de.mpc.pia.modeller.PSMModeller:313)
2024-07-10 16:45:09 INFO  createReportPSMSets done (de.mpc.pia.modeller.PSMModeller:611)
2024-07-10 16:45:09 INFO  createReportPSMsFromGroups done. (de.mpc.pia.modeller.PSMModeller:574)
2024-07-10 16:45:09 ERROR Problem with the parsing of the JSON file (de.mpc.pia.JsonAnalysis:79)
com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 1 path $
See https://github.com/google/gson/blob/main/Troubleshooting.md#unexpected-json-structure
        at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:520) ~[gson-2.11.0.jar:?]
        at com.google.gson.Gson.fromJson(Gson.java:1361) ~[gson-2.11.0.jar:?]
        at com.google.gson.Gson.fromJson(Gson.java:1262) ~[gson-2.11.0.jar:?]
        at com.google.gson.Gson.fromJson(Gson.java:1199) ~[gson-2.11.0.jar:?]
        at de.mpc.pia.JsonAnalysis.readFromFile(JsonAnalysis.java:75) [pia-1.5.2.jar:?]
        at de.mpc.pia.PIACli.processPIAAnalysis(PIACli.java:210) [pia-1.5.2.jar:?]
        at de.mpc.pia.PIACli.processAnalysis(PIACli.java:196) [pia-1.5.2.jar:?]
        at de.mpc.pia.PIACli.run(PIACli.java:70) [pia-1.5.2.jar:?]
        at picocli.CommandLine.executeUserObject(CommandLine.java:2030) [picocli-4.7.6.jar:4.7.6]
        at picocli.CommandLine.access$1500(CommandLine.java:148) [picocli-4.7.6.jar:4.7.6]
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465) [picocli-4.7.6.jar:4.7.6]
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2457) [picocli-4.7.6.jar:4.7.6]
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2419) [picocli-4.7.6.jar:4.7.6]
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277) [picocli-4.7.6.jar:4.7.6]
        at picocli.CommandLine$RunLast.execute(CommandLine.java:2421) [picocli-4.7.6.jar:4.7.6]
        at picocli.CommandLine.execute(CommandLine.java:2174) [picocli-4.7.6.jar:4.7.6]
        at de.mpc.pia.PIACli.main(PIACli.java:82) [pia-1.5.2.jar:?]
Caused by: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 1 path $
See https://github.com/google/gson/blob/main/Troubleshooting.md#unexpected-json-structure
        at com.google.gson.stream.JsonReader.unexpectedTokenError(JsonReader.java:1768) ~[gson-2.11.0.jar:?]
        at com.google.gson.stream.JsonReader.beginObject(JsonReader.java:469) ~[gson-2.11.0.jar:?]
        at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:509) ~[gson-2.11.0.jar:?]
        ... 16 more
com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 1 path $
See https://github.com/google/gson/blob/main/Troubleshooting.md#unexpected-json-structure
        at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:520)
        at com.google.gson.Gson.fromJson(Gson.java:1361)
        at com.google.gson.Gson.fromJson(Gson.java:1262)
        at com.google.gson.Gson.fromJson(Gson.java:1199)
        at de.mpc.pia.JsonAnalysis.readFromFile(JsonAnalysis.java:75)
        at de.mpc.pia.PIACli.processPIAAnalysis(PIACli.java:210)
        at de.mpc.pia.PIACli.processAnalysis(PIACli.java:196)
        at de.mpc.pia.PIACli.run(PIACli.java:70)
        at picocli.CommandLine.executeUserObject(CommandLine.java:2030)
        at picocli.CommandLine.access$1500(CommandLine.java:148)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2457)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2419)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
        at picocli.CommandLine$RunLast.execute(CommandLine.java:2421)
        at picocli.CommandLine.execute(CommandLine.java:2174)
        at de.mpc.pia.PIACli.main(PIACli.java:82)
Caused by: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 1 path $
See https://github.com/google/gson/blob/main/Troubleshooting.md#unexpected-json-structure
        at com.google.gson.stream.JsonReader.unexpectedTokenError(JsonReader.java:1768)
        at com.google.gson.stream.JsonReader.beginObject(JsonReader.java:469)
        at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:509)
        ... 16 more

C:\TPP\data>

I seem to get the java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 1 path $ error no matter how I format the filenames. I have put the files on https://osf.io/j8rnk/.

julianu commented 4 months ago

For the Comet file, could you try calling PIA like this:

java -jar pia-1.5.2.jar --compile -o /path/to/output.pia.xml input.mzid

magnuspalmblad commented 4 months ago

I realized my mistake: I had not removed the "Example for PIA analysis JSON:" line from the example output, so the file did not start with a JSON object. I had assumed it worked like comet -p. Now it works and produces output in my C:\tmp folder (the / vs \ did not matter)!
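
The Gson message "Expected BEGIN_OBJECT but was STRING at line 1 column 1" is consistent with this cause: the parser met plain text where it expected an opening `{`. As a minimal sketch using only the Java standard library, a configuration file could be checked before it is passed to PIA. The class `JsonPreambleCheck`, its methods, and the `placeholder` key are hypothetical and not part of PIA or Gson.

```java
/** Hypothetical pre-check, not part of PIA: detects text in front of the JSON object. */
final class JsonPreambleCheck {

    /** True when the first non-whitespace character is an opening brace. */
    static boolean startsWithJsonObject(String fileContent) {
        return fileContent.trim().startsWith("{");
    }

    /**
     * Drops everything in front of the first '{'. This is a naive approach:
     * it assumes the preamble itself contains no brace.
     */
    static String stripPreamble(String fileContent) {
        int firstBrace = fileContent.indexOf('{');
        return firstBrace <= 0 ? fileContent : fileContent.substring(firstBrace);
    }

    public static void main(String[] args) {
        // Example output as copied, with the descriptive line still in place.
        String raw = "Example for PIA analysis JSON:\n{\"placeholder\": true}";
        System.out.println(startsWithJsonObject(raw));
        System.out.println(startsWithJsonObject(stripPreamble(raw)));
    }
}
```

Deleting the first line by hand achieves the same result; the check only makes the failure easier to spot than the parser's stack trace does.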

julianu commented 4 months ago

I just created a new release, which should also work with the MSAmanda mzid file. Could you please confirm it works (though there are few/no FDR valid IDs in there)?

magnuspalmblad commented 4 months ago

Yes, I can confirm that PIA 1.5.3 works with the mzIdentML output from MS Amanda 3.0.21.117. The decoy prefix in the PIA configuration file has to be changed to REV_ for MS Amanda. I have not analyzed the results in detail, but at least the three expected output files are created, and no errors were reported. I will try the latest version of MS Amanda next (which reports both the first and second pass search FASTAs in the mzIdentML even when only the first is searched).
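
To illustrate why the prefix matters, here is a minimal sketch of prefix-based decoy flagging. It is an assumption about the general technique, not PIA's implementation; the class `DecoyPrefixCheck`, the method `countDecoys`, and the accession strings are hypothetical.

```java
import java.util.List;

/** Hypothetical illustration, not PIA code: flags accessions that start with a decoy prefix. */
final class DecoyPrefixCheck {

    /** Counts the accessions that begin with the given decoy prefix. */
    static long countDecoys(List<String> accessions, String decoyPrefix) {
        return accessions.stream()
                .filter(accession -> accession != null && accession.startsWith(decoyPrefix))
                .count();
    }

    public static void main(String[] args) {
        // Made-up accessions: two targets and one decoy marked with "REV_".
        List<String> accessions = List.of("PROT_A", "REV_PROT_A", "PROT_B");
        // With the matching prefix the decoy is found.
        System.out.println(countDecoys(accessions, "REV_"));
        // With a non-matching prefix no decoys are found, so an FDR estimate has nothing to work with.
        System.out.println(countDecoys(accessions, "DECOY_"));
    }
}
```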

magnuspalmblad commented 4 months ago

I can now also confirm PIA 1.5.3 works with the mzIdentML output from MS Amanda 3.0.21.532. Again, I have not evaluated the results in detail, but the three expected files are created, and are non-empty.

julianu commented 4 months ago

Ok, then I will close this issue. If you encounter problems with the analysis, feel free to open another issue.