lgatto / MSnbase

Base Classes and Functions for Mass Spectrometry and Proteomics
http://lgatto.github.io/MSnbase/

Scan index in writeMgfData and readMgfData #529

Closed: plbaldoni closed this issue 3 years ago

plbaldoni commented 3 years ago

Hi there,

Is there a reason why writeMgfData writes the scan index of an MSnExp object as acquisitionNum(sp):

https://github.com/lgatto/MSnbase/blob/2ba32a020c3ff3b3df2c85230d49b8af6bc53a13/R/readWriteMgfData.R#L73-L80

while readMgfData reads the scan index properly but leaves the acquisition number as NA_integer_?

https://github.com/lgatto/MSnbase/blob/2ba32a020c3ff3b3df2c85230d49b8af6bc53a13/R/readWriteMgfData.R#L227-L235

My question is coming from here, as I noticed the missing scan values when writing .mgf files.

Thanks for developing and maintaining MSnbase.

Best, Pedro

lgatto commented 3 years ago

Indeed, there is a lack of consistency here.

The typical use case as it was envisioned in MSnbase was to use mzML files (in which case there is an acquisition number) and export to mgf when needed (for example to search against a database).

If there is consensus (ping @sgibb @jorainer), I would be open to accepting a pull request that populates the acquisition number with the SCAN value when reading mgf files.

In your case, in theory, the easiest would be to populate the acquisition number with the SCAN number, but this isn't straightforward for technical reasons. If you really need it, we could try to find a workaround.

plbaldoni commented 3 years ago

Thanks, Laurent.

My goal was to use MSnbase to quickly manipulate (filter and sort) my mgf files, which unfortunately is the only format I have at hand. A tool that I am trying to use complains that the mgf files are not sorted by scan number, and I have not yet found a more efficient way to sort them than using sed, awk, etc.
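For readers with the same sorting problem, here is a minimal base R sketch that reorders the spectra of an mgf file by scan number without going through MSnbase. It is only an illustration under stated assumptions: each spectrum is delimited by BEGIN IONS and END IONS lines and carries a single `SCANS=<integer>` line. The function name `sort_mgf_by_scan` is hypothetical and not part of MSnbase.

```r
# Sketch: reorder the spectra of an mgf file by their SCANS value (base R only).
# Assumptions: every spectrum sits between "BEGIN IONS" and "END IONS" and has
# one "SCANS=<integer>" line. sort_mgf_by_scan is a hypothetical helper name.
sort_mgf_by_scan <- function(lines) {
  begin <- grep("^BEGIN IONS", lines)
  end <- grep("^END IONS", lines)
  stopifnot(length(begin) == length(end), all(begin < end))
  if (length(begin) == 0L) return(lines)  # nothing to sort

  # Collect the lines of each spectrum block.
  blocks <- Map(function(b, e) lines[b:e], begin, end)

  # Extract the scan number of each block (NA if the field is missing).
  scans <- vapply(blocks, function(blk) {
    hit <- grep("^SCANS=", blk, value = TRUE)
    if (length(hit) == 0L) NA_integer_
    else as.integer(sub("^SCANS=", "", hit[1L]))
  }, integer(1))

  # Ties keep their original order; blocks without a scan number go last.
  ord <- order(scans, na.last = TRUE)

  # Lines before the first spectrum (for example comments) stay on top.
  # Lines between spectra, such as blank separators, are dropped.
  header <- lines[seq_len(begin[1L] - 1L)]
  c(header, unlist(blocks[ord], use.names = FALSE))
}

# Small in-memory example with two spectra in reverse scan order.
mgf <- c("BEGIN IONS", "SCANS=2", "100.1 10", "END IONS",
         "BEGIN IONS", "SCANS=1", "200.2 20", "END IONS")
sorted <- sort_mgf_by_scan(mgf)
```

For a file on disk, the same helper could be applied as `writeLines(sort_mgf_by_scan(readLines("file.mgf")), "sorted.mgf")`, where both file names are placeholders. Because the whole file is held in memory, this suits moderately sized mgf files only.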

lgatto commented 3 years ago

Install the issue259 branch with

BiocManager::install("lgatto/MSnbase", ref = "issue259")

and when you read mgf files, use

readMgfData("file.mgf", scanToAcquisitionNum = TRUE)

which should assign the acquisition number and allow you to write it back to mgf.

plbaldoni commented 3 years ago

Thanks Laurent, that did the trick!