wkumler / RaMS

R-based access to Mass-Spectrometry data

Feature request: grab "ion injection time" #2

Closed YonghuiDong closed 2 years ago

YonghuiDong commented 3 years ago

Hi @wkumler,

Thanks for the nice package; it is very user-friendly and fast.

I am wondering if you could include a function to grab the ion injection time?

I understand that such a parameter is not present in all types of MS files, but it could be useful for some, e.g., files generated by an Orbitrap.

Thanks again.

Dong

wkumler commented 3 years ago

Hi Dong,

That's a good idea but I'm not sure how best to implement it while keeping object sizes small. Each scan has its own ion injection time, and with the current format there's no easy way to store individual scan metadata without duplicating it for every single data point (like we do with RT). I do like your idea of exposing a few more functions within the package - the xml2 code is pretty robust and I can imagine providing a general xml2 parser that would allow the extraction of arbitrary metadata like ion injection time. The other option would be to include the extra scan metadata in the BPC or TIC slots, since those only have a single entry for each scan.

I'm headed out into the field next week and will be gone until mid-August, so I probably won't be able to work on this for a while. In the meantime, here's a small bit of code that should do the trick for you.

# Find the file you're interested in
# This handles one file at a time, so loop over files if you want ion injection times (iit) for several
# Include the full path or make sure the file exists in your working directory
filename <- "170706_Blk_Blk0p2_1.mzML"

# Read in the mzML document with xml2
xml_data <- xml2::read_xml(filename)

# Extract the scan nodes that have the ion injection time values in them
iit_nodes <- xml2::xml_find_all(xml_data, '//d1:cvParam[@name="ion injection time"]')

# Extract the actual values from the nodes
iit_vals <- as.numeric(xml2::xml_attr(iit_nodes, "value"))

This snippet worked nicely on a random mzML file I've got around, but won't work for mzXML files or those without the ion injection time cvParam.
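For several files, the snippet above can be wrapped in a function and looped. The following is only a sketch: get_iit is a hypothetical helper (not part of RaMS), the toy mzML document is invented so the example runs without real data, and the XPath uses local-name() instead of the d1: prefix so that it does not depend on how the default namespace is registered. Files without the cvParam simply contribute zero rows.

```r
library(xml2)

# Hypothetical helper (not part of RaMS): one row per ion injection time found
get_iit <- function(path) {
  xml_data <- read_xml(path)
  # Match cvParam elements by local name so no namespace prefix is needed
  iit_nodes <- xml_find_all(
    xml_data, '//*[local-name()="cvParam"][@name="ion injection time"]'
  )
  data.frame(
    filename = rep(basename(path), length(iit_nodes)),
    iit = as.numeric(xml_attr(iit_nodes, "value"))
  )
}

# Invented two-scan document, just enough structure to exercise the helper
toy_mzml <- '<?xml version="1.0" encoding="utf-8"?>
<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run><spectrumList count="2">
    <spectrum index="0"><scanList count="1"><scan>
      <cvParam cvRef="MS" accession="MS:1000927" name="ion injection time" value="12.5"/>
    </scan></scanList></spectrum>
    <spectrum index="1"><scanList count="1"><scan>
      <cvParam cvRef="MS" accession="MS:1000927" name="ion injection time" value="30"/>
    </scan></scanList></spectrum>
  </spectrumList></run>
</mzML>'
toy_path <- file.path(tempdir(), "toy_example.mzML")
writeLines(toy_mzml, toy_path)

# Replace with your own vector of mzML paths
filenames <- c(toy_path)
iit_all <- do.call(rbind, lapply(filenames, get_iit))
print(iit_all)
```

Returning a data frame keyed by filename keeps the per-scan values separate from the per-data-point tables, which sidesteps the duplication concern mentioned above.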

YonghuiDong commented 3 years ago

Hi William,

Thanks a lot for your help and code. It is very helpful.

It is a nice idea to include a general xml2 parser that lets the user extract arbitrary metadata of interest.

Thanks again for your help.

Dong

wkumler commented 2 years ago

Hi @YonghuiDong, I've just released version 1.1.0 to GitHub main which includes a function to extract arbitrary metadata (grabAccessionData) by accession number. Thanks for the idea!

This will probably stay on GitHub for a couple weeks to check stability before I push it to CRAN.
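As a rough sketch of how the new function might be used for the original request: the call below assumes grabAccessionData takes a file path followed by an accession string and returns a data frame with a value column, and that ion injection time is stored under the PSI-MS accession MS:1000927. Both should be checked against the package documentation and your own files. The file name is the one from the earlier snippet, and the guard skips the call when RaMS (1.1.0 or later) or the file is not available.

```r
# File name from the earlier xml2 snippet; swap in your own path
mzml_path <- "170706_Blk_Blk0p2_1.mzML"

# Assumed PSI-MS accession for "ion injection time"
iit_accession <- "MS:1000927"

# Only run when the package and the file are both present
if (requireNamespace("RaMS", quietly = TRUE) && file.exists(mzml_path)) {
  iit_meta <- RaMS::grabAccessionData(mzml_path, iit_accession)
  # Assumes the returned data frame has a `value` column stored as text
  iit_vals <- as.numeric(iit_meta[["value"]])
  print(summary(iit_vals))
}
```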

YonghuiDong commented 2 years ago

@wkumler Hi William, thanks a lot.

I have been following your updates. Version 1.1.0 looks very interesting, and I will be happy to test it. I saw that you have added a minification function to shrink the data size. I am wondering whether it is possible to reduce the overall data size by ignoring noise when reading the files, e.g., by adding a noise-level parameter to the grabMSdata function for MS1: if the intensity of an MS1 data point is below the user-defined noise level, that point is not read from the raw data. This could greatly reduce the data size (and maybe memory usage too?).

Thanks very much again for this excellent package.

Dong

wkumler commented 2 years ago

Great! Glad it looks useful. Version 1.1.0 also introduced a prefilter argument that sounds like what you're interested in: data points with intensity values below the value you provide to prefilter are removed when grabbing the data. It's an interesting idea to do this during the minification step instead, though; I'll have to think more about the relative advantages and disadvantages of that. In the meantime, you can also use ProteoWizard's msconvert to do something similar to the files themselves with a command like:

msconvert [files] --filter "threshold absolute 1000 most-intense"

which should remove the data points with absolute intensities below 1000.
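To make the effect of an intensity threshold concrete, here is a minimal base R sketch that applies the same cutoff to a data frame after reading. The rt, mz, and int column names are assumed to mirror the MS1 table, and the values are invented for illustration. Whether prefilter or msconvert keeps points exactly equal to the threshold is not stated above, so the >= comparison is an assumption.

```r
# Toy MS1-style table; values are invented for illustration
ms1 <- data.frame(
  rt = c(1.0, 1.0, 1.1, 1.1),
  mz = c(118.0865, 119.0899, 118.0866, 301.1410),
  int = c(250000, 800, 310000, 1000)
)

# User-defined noise level, matching the 1000 used in the command above
noise_level <- 1000

# Keep only data points at or above the noise level
ms1_filtered <- ms1[ms1$int >= noise_level, ]
print(ms1_filtered)
```

Filtering at read time or in the file itself has the advantage that the low-intensity points never reach memory, whereas this after-the-fact version only shrinks the object once it is already loaded.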

YonghuiDong commented 2 years ago

@wkumler

Thanks for your prompt reply! Will you consider publishing your package in a scientific journal? I wrote an R shiny app based on your package for raw data quality evaluation. I will be happy to cite your package.

Dong

wkumler commented 2 years ago

I appreciate you citing the package, and I've actually got a manuscript pending with the R Journal right now that discusses the package and other MS data considerations! Until that's accepted and published, however, feel free to use the output from citation("RaMS").