statisticalbiotechnology / maracluster

Matthew The's implementation of MaRaCluster
Apache License 2.0

Warning: index 1000021 out of bounds: #5

Closed: BioComSoftware closed this issue 7 years ago

BioComSoftware commented 7 years ago

I'm running maracluster through a Python script (hence the formatted log output). The input files are MGF.
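For readers who want to reproduce this setup, here is a minimal sketch of a wrapper that runs a command and logs each output line in the style of the log in this report. The actual script was not posted, so `run_and_log` and the logger setup are assumptions; only the maracluster arguments in the comment are taken from the log.

```python
import logging
import subprocess

# Hypothetical logger name, chosen to match the "Workflow" label in the log.
log = logging.getLogger("Workflow")


def run_and_log(cmd):
    """Run cmd, log every output line at DEBUG level, return (exit_code, lines)."""
    log.debug("RunSubprocess: %s", cmd)
    lines = []
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so warnings land in the same log
        text=True,
    ) as proc:
        for line in proc.stdout:
            line = line.rstrip("\n")
            lines.append(line)
            log.debug("RunSubprocess:%s", line)
    # Leaving the with-block waits for the process, so returncode is set here.
    return proc.returncode, lines


# Example call, using the batch arguments that appear in the log:
# run_and_log(['maracluster', 'batch', '-b', '/tmp//LIST', '-f', '/tmp//clusters/',
#              '-a', 'sample', '-t', '-10', '-c', '-10'])
```

Merging stderr into stdout keeps warnings in the same stream as the progress messages, so they appear in order in the log.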

The issue is that if I set maracluster to output .ms2 files, everything works fine, but if I set it to output .mgf files, I get the following repeated warnings.

Everything else is the same. No *.mgf output files are written.

LOG OUTPUT ===================================================================
2016-11-05 03:09:27,704 - Workflow - DEBUG - Batch command: ['maracluster', 'batch', '-b', '/tmp//LIST', '-f', '/tmp//clusters/', '-a', 'sample', '-t', '-10', '-c', '-10']
2016-11-05 03:09:27,704 - Workflow - DEBUG - Running maracluster batch command: ['maracluster', 'batch', '-b', '/tmp//LIST', '-f', '/tmp//clusters/', '-a', 'sample', '-t', '-10', '-c', '-10']
2016-11-05 03:09:27,708 - Workflow - DEBUG - RunSubprocess: ['maracluster', 'batch', '-b', '/tmp//LIST', '-f', '/tmp//clusters/', '-a', 'sample', '-t', '-10', '-c', '-10']
2016-11-05 03:09:27,708 - Workflow - DEBUG - RunSubprocess:Issued command:
2016-11-05 03:09:27,708 - Workflow - DEBUG - RunSubprocess:maracluster batch -b /tmp//LIST -f /tmp//clusters/ -a sample -t -10 -c -10
2016-11-05 03:09:27,708 - Workflow - DEBUG - RunSubprocess:Started Sat Nov 5 03:09:27 2016
2016-11-05 03:09:27,708 - Workflow - DEBUG - RunSubprocess:Read dat-files from /tmp//clusters//sample.dat_file_list.txt. Remove this file to generate new dat-files.
2016-11-05 03:09:27,709 - Workflow - DEBUG - RunSubprocess:Read scan numbers from /tmp//clusters//sample.scannrs.dat. Remove this file to generate a new scannr list.
2016-11-05 03:09:27,750 - Workflow - DEBUG - RunSubprocess:Using p-values from /tmp//clusters//799.dat.pvalues.dat. Remove this file to generate new p-values.
2016-11-05 03:09:27,760 - Workflow - DEBUG - RunSubprocess:Previous clustering results are available in /tmp//clusters//sample.pvalue_tree.tsv. Remove this file to redo the clustering.
2016-11-05 03:09:27,760 - Workflow - DEBUG - RunSubprocess:Reading in p-value tree.
2016-11-05 03:09:27,769 - Workflow - DEBUG - RunSubprocess:Reading in scan descriptions.
2016-11-05 03:09:27,769 - Workflow - DEBUG - RunSubprocess:WARNING: Could not find scan desc file.
2016-11-05 03:09:27,769 - Workflow - DEBUG - RunSubprocess:Writing clusterings for 1 thresholds.
2016-11-05 03:09:27,792 - Workflow - DEBUG - RunSubprocess:Writing clustering to /tmp//clusters//sample.clusters_p10.tsv
2016-11-05 03:09:27,831 - Workflow - DEBUG - RunSubprocess:Finished writing clusterings.
2016-11-05 03:09:27,839 - Workflow - DEBUG - RunSubprocess:Running MaRaCluster took: 0.12 cpu seconds or 0 seconds wall time
2016-11-05 03:09:27,839 - Workflow - INFO - Finished maracluster batch run.
2016-11-05 03:09:27,840 - Workflow - DEBUG - Starting maracluster concensus run...
2016-11-05 03:09:27,840 - Workflow - DEBUG - filename = sample.clusters_p20.tsv
2016-11-05 03:09:27,840 - Workflow - DEBUG - Running maracluster concensus command: ['maracluster', 'consensus', '-l', '/tmp//clusters/sample.clusters_p20.tsv', '-f', '/tmp//concensus/', '-o', '/tmp//concensus/sample.clusters_p20.mgf']
2016-11-05 03:09:27,841 - Workflow - DEBUG - RunSubprocess: ['maracluster', 'consensus', '-l', '/tmp//clusters/sample.clusters_p20.tsv', '-f', '/tmp//concensus/', '-o', '/tmp//concensus/sample.clusters_p20.mgf']
2016-11-05 03:09:27,843 - Workflow - DEBUG - RunSubprocess:Issued command:
2016-11-05 03:09:27,843 - Workflow - DEBUG - RunSubprocess:maracluster consensus -l /tmp//clusters/sample.clusters_p20.tsv -f /tmp//concensus/ -o /tmp//concensus/sample.clusters_p20.mgf
2016-11-05 03:09:27,843 - Workflow - DEBUG - RunSubprocess:Started Sat Nov 5 03:09:27 2016
2016-11-05 03:09:27,844 - Workflow - DEBUG - RunSubprocess:Parsing cluster file
2016-11-05 03:09:27,882 - Workflow - DEBUG - RunSubprocess:Finished parsing cluster file
2016-11-05 03:09:27,882 - Workflow - DEBUG - RunSubprocess:Merging clusters
2016-11-05 03:09:27,888 - Workflow - DEBUG - RunSubprocess:Splitting /home/mikes/TESTDATA/TOPPAS-OUTFILES/FileConverter-out/160812_29.mgf (33%)
2016-11-05 03:09:27,888 - Workflow - DEBUG - RunSubprocess:Splitting /home/mikes/TESTDATA/TOPPAS-OUTFILES/FileConverter-out/160812_30.mgf (100%)
2016-11-05 03:09:29,164 - Workflow - DEBUG - RunSubprocess:Writing file /tmp//concensus/sample.clusters_p20.part0_0.mgf
2016-11-05 03:09:31,430 - Workflow - DEBUG - RunSubprocess:Finished writing file /tmp//concensus/sample.clusters_p20.part0_0.mgf
2016-11-05 03:09:31,507 - Workflow - DEBUG - RunSubprocess:Finished splitting ms2 files
2016-11-05 03:09:31,508 - Workflow - DEBUG - RunSubprocess:Merging spectra in bin 1/1
2016-11-05 03:09:31,739 - Workflow - DEBUG - RunSubprocess: Warning: index 1000021 out of bounds: /home/mikes/TESTDATA/TOPPAS-OUTFILES/FileConverter-out/160812_29.mgf: 0 21
2016-11-05 03:09:31,739 - Workflow - DEBUG - RunSubprocess: Warning: index 1000022 out of bounds: /home/mikes/TESTDATA/TOPPAS-OUTFILES/FileConverter-out/160812_29.mgf: 0 22
2016-11-05 03:09:31,739 - Workflow - DEBUG - RunSubprocess: Warning: index 1000045 out of bounds: /home/mikes/TESTDATA/TOPPAS-OUTFILES/FileConverter-out/160812_29.mgf: 0 45
2016-11-05 03:09:31,739 - Workflow - DEBUG - RunSubprocess: Warning: index 1000046 out of bounds: /home/mikes/TESTDATA/TOPPAS-OUTFILES/FileConverter-out/160812_29.mgf: 0 46
2016-11-05 03:09:31,740 - Workflow - DEBUG - RunSubprocess: Warning: index 1000049 out of bounds: /home/mikes/TESTDATA/TOPPAS-OUTFILES/FileConverter-out/160812_29.mgf: 0 49
2016-11-05 03:09:31,740 - Workflow - DEBUG - RunSubprocess: Warning: index 1000050 out of bounds: /home/mikes/TESTDATA/TOPPAS-OUTFILES/FileConverter-out/160812_29.mgf: 0 50
2016-11-05 03:09:31,740 - Workflow - DEBUG - RunSubprocess: Warning: index 1000053 out of bounds: /home/mikes/TESTDATA/TOPPAS-OUTFILES/FileConverter-out/160812_29.mgf: 0 53

(...AND ON THROUGHOUT THE ENTIRE FILE...).
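Since the warnings repeat for the whole file, a short script can condense them. This is a rough sketch that groups the out-of-bounds indices by input file. The regular expression is derived from the warning lines in the log above; `summarize_warnings` is a hypothetical helper, and the meaning of the two trailing numbers is not documented in this thread, so they are kept as unnamed fields.

```python
import re
from collections import defaultdict

# Pattern derived from lines such as:
#   Warning: index 1000021 out of bounds: /path/to/file.mgf: 0 21
# Assumes the file path contains no whitespace.
WARNING_RE = re.compile(r"Warning: index (\d+) out of bounds: (\S+): (\d+) (\d+)")


def summarize_warnings(log_text):
    """Group (index, first, second) tuples by the file named in each warning."""
    by_file = defaultdict(list)
    for match in WARNING_RE.finditer(log_text):
        index, path, first, second = match.groups()
        by_file[path].append((int(index), int(first), int(second)))
    return dict(by_file)
```

Because it uses `finditer` on the whole text, it works whether the log has one entry per line or has been pasted as a single run of text.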

MGF FILES IN ======= Archive 2.zip

MatthewThe commented 7 years ago

I'll take a look at this. Could you also send the tab-separated file with clusters (/tmp//clusters/sample.clusters_p20.tsv)? That would make debugging a bit easier.

EDIT: never mind my request for the tab separated cluster file, I managed to reproduce the error by executing the same commands. I'm not sure what's going on yet, as it seems not even the sample.clusters_p20.part0_0.mgf is created, but I'll keep you posted.

MatthewThe commented 7 years ago

There was a problem with overwriting the TITLE attribute for mgf spectra, which apparently had some redundancies. It should be fixed now in the master branch. Do you need a patched binary, or are you building from source?
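To check whether an input file is affected, something like the following sketch could be used. It assumes that "redundancies" means the same TITLE value appearing on more than one spectrum, which is an interpretation and not something confirmed in this thread. `duplicate_titles` is a hypothetical helper; it relies only on the standard MGF convention of `TITLE=` lines inside `BEGIN IONS` / `END IONS` blocks.

```python
from collections import Counter


def duplicate_titles(lines):
    """Return {title: count} for every TITLE= value seen more than once."""
    counts = Counter()
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("TITLE="):
            counts[stripped[len("TITLE="):]] += 1
    return {title: n for title, n in counts.items() if n > 1}


# Usage with a file on disk (the path is a placeholder):
# with open("input.mgf") as handle:
#     print(duplicate_titles(handle))
```

An empty result means every TITLE value in the file is unique under this interpretation.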

BioComSoftware commented 7 years ago

Hi Matthew,

Thanks for the fix. If possible, I'd take a binary over building...but I can build if needed.

MatthewThe commented 7 years ago

Okay, I just created a new binary release (v0.02.1). Let me know if it works!

BioComSoftware commented 7 years ago

Trying it now. Thx

BioComSoftware commented 7 years ago

It's creating the MGF files without error. I'm examining them to make sure they have the correct values, but I think we're good. Thanks!