statisticalbiotechnology / maracluster

Matthew The's implementation of MaRaCluster
Apache License 2.0

Warning: index out of bounds for MGF Files (again) #19

Closed by Luxxii 4 years ago

Luxxii commented 4 years ago

I have some problems generating consensus files from MGF files. It is the same problem as described in #5.

I am currently working with a toy data set (toy_set) to try out MaRaCluster. I have built MaRaCluster from source (master branch) and have also tried the binaries provided in the releases (version 1.01).

Clustering via maracluster batch works well and exits without an error. However, running maracluster consensus gives the following output:

MaRaCluster version 1.01.0, Build Date Jan 28 2020 09:10:12
Copyright (c) 2015-19 Matthew The. All rights reserved.
Written by Matthew The (matthewt@kth.se) in the
School of Biotechnology at the Royal Institute of Technology in Stockholm.
Issued command:
maracluster consensus --output-folder /DATA_OUTPUT/ --specOut /DATA_OUTPUT/consensus.mgf -l /DATA_OUTPUT/MaRaCluster.clusters_p5.tsv
Started Tue Apr 21 18:49:14 2020
 on f777c9bdbd97
Parsing cluster file
Finished parsing cluster file
Merging clusters
Splitting /DATA_INPUT/toy_set.mgf (50%)
Splitting /DATA_INPUT/sub_folder/toy_set.mgf (100%)
Writing file /DATA_OUTPUT/consensus.part0_0_tmpfile.mgf
Finished writing file /DATA_OUTPUT/consensus.part0_0_tmpfile.mgf
Finished splitting ms2 files
Merging spectra in bin 1/1
[SpectrumList::find]: mismatch between spectrum id format of the file (index=0) and the looked-up id (scan=2000000)
  Warning: index 2000000 out of bounds: /DATA_INPUT/toy_set.mgf: 1 0
  Warning: index 3000000 out of bounds: /DATA_INPUT/sub_folder/toy_set.mgf: 2 0
  Warning: index 2000002 out of bounds: /DATA_INPUT/toy_set.mgf: 1 2
  Warning: index 3000002 out of bounds: /DATA_INPUT/sub_folder/toy_set.mgf: 2 2
  Warning: index 2000003 out of bounds: /DATA_INPUT/toy_set.mgf: 1 3
  Warning: index 3000003 out of bounds: /DATA_INPUT/sub_folder/toy_set.mgf: 2 3
  . . . (continues with more warnings)
  Processing consensus spectra.
  Batch 1/1
  Merging spectra
Finished merging clusters

Afterwards, the file consensus.mgf is NOT present in the folder /DATA_OUTPUT.
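To see how many spectra are affected per input file, the warning lines can be tallied. The following is a minimal sketch that assumes the log has been captured as text; the regular expression is modelled only on the warning lines shown above, and the function name `count_out_of_bounds` is hypothetical. The two trailing integers are treated as opaque fields because their meaning is not stated in this thread.

```python
import re
from collections import Counter

# Pattern modelled on the warning lines in the log; the two trailing integers
# are captured but not interpreted, since their meaning is not documented here.
WARNING_RE = re.compile(
    r"Warning: index (?P<index>\d+) out of bounds: "
    r"(?P<path>.+): (?P<first>\d+) (?P<second>\d+)\s*$"
)


def count_out_of_bounds(log_text):
    """Return a Counter mapping each spectrum file path to its warning count."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WARNING_RE.search(line)
        if match:
            counts[match.group("path")] += 1
    return counts


# Three warning lines copied from the log output.
sample = """\
  Warning: index 2000000 out of bounds: /DATA_INPUT/toy_set.mgf: 1 0
  Warning: index 3000000 out of bounds: /DATA_INPUT/sub_folder/toy_set.mgf: 2 0
  Warning: index 2000002 out of bounds: /DATA_INPUT/toy_set.mgf: 1 2
"""

for path, n in count_out_of_bounds(sample).items():
    print(path, n)
```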


I tried adding a SCANS= attribute to the MGF files and even added scan= and index= to the TITLE= line, as I saw in some previous issues here (#7, #5 (Archive 2.zip) and #2). The consensus file still does not get generated.
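For reference, the SCANS= workaround can be sketched as below. This is only an illustration of the attempt described above, which did not resolve the problem in this case. The helper name `add_scans` and the sequential numbering starting at 1 are assumptions, and the sample spectra are made up for the example.

```python
def add_scans(mgf_lines, first_scan=1):
    """Yield MGF lines, inserting SCANS=<n> into spectra that lack one.

    The sequential numbering is an assumption for illustration only.
    """
    scan = first_scan
    block = []
    in_block = False
    for line in mgf_lines:
        stripped = line.strip()
        if stripped == "BEGIN IONS":
            # Start buffering a new spectrum block.
            in_block = True
            block = [line]
        elif stripped == "END IONS" and in_block:
            has_scans = any(
                entry.strip().upper().startswith("SCANS=") for entry in block
            )
            yield block[0]  # the BEGIN IONS line
            if not has_scans:
                yield f"SCANS={scan}\n"
            yield from block[1:]
            yield line
            scan += 1
            in_block = False
        elif in_block:
            block.append(line)
        else:
            # Lines outside any spectrum block pass through unchanged.
            yield line
    if in_block:
        # Unterminated final block: emit it unchanged rather than drop it.
        yield from block


# Made-up example spectra; the second one already carries a SCANS= line.
sample = [
    "BEGIN IONS\n", "TITLE=spec_a\n", "PEPMASS=500.25\n",
    "100.0 10.0\n", "END IONS\n",
    "BEGIN IONS\n", "TITLE=spec_b\n", "SCANS=7\n", "PEPMASS=612.30\n",
    "150.0 20.0\n", "END IONS\n",
]

print("".join(add_scans(sample)), end="")
```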

I am calling the MaRaCluster binary on the command line indirectly, through Python.
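A minimal sketch of such a call is shown below, using only the subcommand and flags that appear in the "Issued command" line of the log. The function names `build_consensus_cmd` and `run_consensus` are hypothetical, and the sketch assumes a `maracluster` binary on the PATH. Because the run above ends with "Finished merging clusters" yet produces no output file, the wrapper checks for the file explicitly instead of relying on the exit status alone.

```python
import shlex
import subprocess
from pathlib import Path


def build_consensus_cmd(output_folder, cluster_file, spec_out, binary="maracluster"):
    """Assemble the argument list, mirroring the issued command in the log."""
    return [
        binary, "consensus",
        "--output-folder", output_folder,
        "--specOut", spec_out,
        "-l", cluster_file,
    ]


def run_consensus(output_folder, cluster_file, spec_out, binary="maracluster"):
    """Run the consensus step and verify that the output file was written."""
    cmd = build_consensus_cmd(output_folder, cluster_file, spec_out, binary)
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"{shlex.join(cmd)} failed:\n{result.stderr}")
    if not Path(spec_out).is_file():
        # The tool can finish without writing the file, as in this report.
        raise FileNotFoundError(f"{spec_out} was not created:\n{result.stdout}")
    return result


# Only print the command here; run_consensus() needs the binary installed.
print(shlex.join(build_consensus_cmd(
    "/DATA_OUTPUT/",
    "/DATA_OUTPUT/MaRaCluster.clusters_p5.tsv",
    "/DATA_OUTPUT/consensus.mgf",
)))
```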

Here are the files I used (without the added attributes) plus the generated TSV file: files.zip

Greetings!

MatthewThe commented 4 years ago

Thanks for reporting this. I can indeed confirm that the MGF pipeline is broken at the moment. I suspect it's because of the changes I made here: f59f9d005e880f3a93d69fb36e6e3911c837d567. I will look into it.

MatthewThe commented 4 years ago

Reverting the previous change solved the problem on my system. Please let me know if the problem persists on your end.

Luxxii commented 4 years ago

Hi, thank you for such a quick response.

I tried it out and the MGF files are generated. It is fixed on my end at least!