rformassspectrometry / Spectra

Low level infrastructure to handle MS spectra
https://rformassspectrometry.github.io/Spectra/

MS-imaging support / MsBackendImzML #322

Open RogerGinBer opened 6 months ago

RogerGinBer commented 6 months ago

Hi there! In our research group we are actively working with MS-imaging (MSI) data, and I was wondering how to properly name and add the spatial coordinate information to an MsBackend object, and whether we should also define generics for it in Spectra. Basically, each MSI spectrum has a pair of x and y coordinates, so it seems natural to add them to the spectraData as two separate columns.

For context, I recently created a new backend (WIP) for imzML/ibd files (https://github.com/RogerGinBer/MsBackendImzML), which parses imzML files and stores each scan's information (similar to header in mzR) and offsets in a DFrame as spectraData. The actual peaksData is retrieved on demand from the ibd files.

Provisionally, I'm saving the x and y coordinates as xPixel and yPixel (because x and y were too generic IMO), but since this information may also appear in other backends that can/could handle MS-imaging data (thinking of MsBackendTimsTof, but also maybe MsBackendRawFileReader), I think it'd be nice if we could do some brainstorming about this.

jorainer commented 6 months ago

Hey @RogerGinBer ! that sounds like a great backend!

hm, so, the x and y coordinates are per spectrum, right? so, like you said, you extract them and put them into the spectraData (and not the peaksData).

I am a bit hesitant to add additional core spectra variables if not absolutely needed, because all methods we have so far will by default initialize them with NA values (even if they are not present), and thus we are (already) blowing up the data with, essentially, missing values. Just having them as $xPixel and $yPixel custom spectra variables would not work?
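A minimal sketch of that distinction, assuming the Spectra package is installed and using made-up toy values: the coordinates are supplied as ordinary columns of the input DataFrame, so they end up as custom spectra variables, while a core variable that was never provided (precursorMz here) is still returned filled with NA.

```r
library(Spectra)

## Toy data: three MSI spectra; all values are made up for illustration.
spd <- DataFrame(msLevel = c(1L, 1L, 1L))
spd$mz <- list(c(100.1, 200.2), c(100.1, 200.2), c(100.1, 200.2))
spd$intensity <- list(c(10, 20), c(5, 8), c(1, 2))

## Pixel coordinates as plain columns, i.e. custom spectra variables.
spd$xPixel <- c(1L, 2L, 1L)
spd$yPixel <- c(1L, 1L, 2L)

sps <- Spectra(spd)

## Custom variables are available through `$` and spectraData().
sps$xPixel
spectraData(sps, columns = c("xPixel", "yPixel"))

## They are not part of the core spectra variables ...
"xPixel" %in% names(coreSpectraVariables())

## ... whereas a core variable that was never provided is reported as NA.
sps$precursorMz
```

Keeping the coordinates as custom variables means backends that have no spatial information do not carry two additional all-NA columns.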

For the names, I would use what is commonly used in the MS-imaging field, and maybe, to be consistent with the core spectra variables, use camelCase for the name. Alternatives could be positionX and positionY (or coordinateX, coordinateY); changing from xPosition to positionX would IMHO make it easier to find these variables in the (alphabetically ordered) spectra variables. But I'm OK either way.

Another suggestion: from my own experience I would suggest using a data.frame instead of a DFrame if possible... subsetting etc. would be much faster, especially for large data.
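A rough way to check this on your own data is sketched below, assuming the S4Vectors package is installed. The table size, column names and number of repetitions are arbitrary choices for illustration, and the actual timings will depend on the package versions and the data.

```r
library(S4Vectors)

set.seed(123)
n <- 1e5

## Toy per-spectrum metadata; values are made up for illustration.
df <- data.frame(positionX = rep(1:100, length.out = n),
                 positionY = rep(1:1000, each = 100, length.out = n),
                 rtime = seq_len(n) / 10)
DF <- DataFrame(df)   # same content as an S4Vectors DFrame

idx <- sample(n, 1e4)

## Time repeated row subsetting on both containers.
t_df <- system.time(for (i in 1:50) sub_df <- df[idx, , drop = FALSE])
t_DF <- system.time(for (i in 1:50) sub_DF <- DF[idx, , drop = FALSE])

t_df[["elapsed"]]
t_DF[["elapsed"]]
```

Both subsets contain the same rows; only the container and the time needed to subset it differ.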

RogerGinBer commented 6 months ago

> Just having them as $xPixel and $yPixel custom spectra variables would not work?

Most definitely! My issue here was mostly about agreeing about the variable names so we wouldn't have to change them in the future. I agree that they should definitely not be a core spectra variable. Regarding naming, positionX and positionY sound good to me 👍

Am I right to understand that there's no easy way to add new generics, getters and setters for backend-custom variables (i.e. positionX, positionX<-, etc.) to the Spectra object class without adding those variables to the list of core spectra variables? In any case, accessing the custom variables using spectraData isn't difficult at all, so I'll probably just rename the variables and avoid overcomplicating simple stuff.

Also, nice tip about data.frame vs DFrame, didn't know that

jorainer commented 6 months ago

OK, then let's go with positionX, positionY.

and it would of course be possible to add generics without defining core spectra variables to Spectra - and for your backend you can anyway define whatever generic/method or function you think is helpful/useful.
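As a minimal sketch of what that could look like (assuming the Spectra package is installed): the positionX generic, its replacement counterpart and the toy data below are hypothetical and not part of Spectra or of any existing backend. The methods simply wrap the custom spectra variable, so nothing is added to the core spectra variables.

```r
library(Spectra)

## Hypothetical accessor generics; these are NOT defined by Spectra itself.
setGeneric("positionX", function(object, ...)
    standardGeneric("positionX"))
setGeneric("positionX<-", function(object, value)
    standardGeneric("positionX<-"))

## Getter: return the custom spectra variable "positionX".
setMethod("positionX", "Spectra", function(object, ...) {
    object$positionX
})

## Setter: store the values as a custom spectra variable.
setReplaceMethod("positionX", "Spectra", function(object, value) {
    object$positionX <- value
    object
})

## Toy data (made-up values) to try the accessors.
spd <- DataFrame(msLevel = c(1L, 1L, 1L))
spd$mz <- list(c(100.1, 200.2), c(100.1, 200.2), c(100.1, 200.2))
spd$intensity <- list(c(10, 20), c(5, 8), c(1, 2))
sps <- Spectra(spd)

positionX(sps) <- c(1L, 2L, 3L)
positionX(sps)
```

The same pattern would apply to positionY, and a backend package could define equivalent methods for its own MsBackend class instead of (or in addition to) Spectra.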

Do you think it would be important (or better/easier) in your analysis workflow to have a function or method positionX(), positionY() for Spectra? Or for the MsBackend?