sneumann / mzR

This is the git repository matching the Bioconductor package mzR: parser for netCDF, mzXML, mzData and mzML files (mass spectrometry data)

header() for netCDF giving incomplete information for writing mzML #203

Open cbielow opened 4 years ago

cbielow commented 4 years ago

When trying to convert from netCDF to mzML, I would expect the following to work, but it does not:

> d = openMSfile("some.CDF", backend="netCDF")
> p = peaks(d)
> writeMSData(p, "fromCDF.mzML", header(d))
Error in .local(object, file, ...) : 
  Error checking parameter 'header': 'x' is missing one or more required columns: seqNum, acquisitionNum, msLevel, polarity, peaksCount, totIonCurrent, retentionTime, basePeakMZ, basePeakIntensity, collisionEnergy, ionisationEnergy, lowMZ, highMZ, precursorScanNum, precursorMZ, precursorCharge, precursorIntensity, mergedScan, mergedResultScanNum, mergedResultStartScanNum, mergedResultEndScanNum, injectionTime

To work around it, I have to add two columns, whose names I had to figure out manually:

> h = header(d)
> h$lowMZ = 100
> h$polarity = 0
> writeMSData(p, "fromCDF.mzML", h)
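
The two columns can be found without trial and error by comparing the header against the names listed in the error message. A minimal sketch in base R, assuming the list in the error message is the complete set of required columns. `required_cols` and `missing_header_cols` are illustrative names, not part of mzR, and the small data frame only stands in for `header(d)`:

```r
# Column names copied from the writeMSData error message above.
required_cols <- c("seqNum", "acquisitionNum", "msLevel", "polarity",
  "peaksCount", "totIonCurrent", "retentionTime", "basePeakMZ",
  "basePeakIntensity", "collisionEnergy", "ionisationEnergy", "lowMZ",
  "highMZ", "precursorScanNum", "precursorMZ", "precursorCharge",
  "precursorIntensity", "mergedScan", "mergedResultScanNum",
  "mergedResultStartScanNum", "mergedResultEndScanNum", "injectionTime")

# Hypothetical helper: names in `required` that are absent from `hdr`.
missing_header_cols <- function(hdr, required = required_cols) {
  setdiff(required, colnames(hdr))
}

# Stand-in for header(d): every required column except the two from this issue.
present <- setdiff(required_cols, c("lowMZ", "polarity"))
h <- as.data.frame(setNames(rep(list(0), length(present)), present))

missing_cols <- missing_header_cols(h)
print(missing_cols)
```

With a real file, the same helper would be called as `missing_header_cols(header(d))`.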

Not sure if polarity can be inferred from the netCDF format, but even if not, maybe there is an "unknown" representation. In any case, it would be nice if the error named the exact missing columns, in case there is a regression.

sneumann commented 4 years ago

Hi, unfortunately, there is no "unknown" that can be used as polarity: https://www.ebi.ac.uk/ols/ontologies/ms/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FMS_1000465

Some CDF files seem to have polarity stored, e.g. KO16.cdf from faahKO has: :test_ionization_polarity = "Positive Polarity" ; but I am unsure whether that holds for all files.
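
If the attribute is present, its text still has to be turned into the integer that goes into the polarity column. A rough sketch of that mapping, assuming the attribute value has already been read from the file as a string. `polarity_from_cdf_attr` is a hypothetical helper, and the integer codes (1 for positive, 0 for negative, -1 when it cannot be determined) are an assumption that should be checked against the mzR documentation:

```r
# Hypothetical helper: translate the CDF attribute text into an integer code.
# The codes used here (1 = positive, 0 = negative, -1 = not determinable)
# are an assumption, not confirmed by this thread.
polarity_from_cdf_attr <- function(attr_value) {
  # Attribute missing from the file: polarity cannot be determined.
  if (is.null(attr_value) || is.na(attr_value)) return(-1L)
  if (grepl("positive", attr_value, ignore.case = TRUE)) return(1L)
  if (grepl("negative", attr_value, ignore.case = TRUE)) return(0L)
  # Any other text is treated as not determinable.
  -1L
}

pol <- polarity_from_cdf_attr("Positive Polarity")
print(pol)
```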

So the minimum fix is to improve the error message " 'x' is missing one or more required columns: ..." so that it says which ones are actually missing, in https://github.com/sneumann/mzR/blob/753bf97e5eb20854156740ba5a06f3ac00754c96/R/functions-utils.R#L56
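
A sketch of what such a message could look like, in base R. `check_required_cols` is a hypothetical stand-in written for illustration, not the code at the link above, and the short column list is only an example:

```r
# Hypothetical stand-in for the header check; not the code in functions-utils.R.
check_required_cols <- function(x, required) {
  missing_cols <- setdiff(required, colnames(x))
  if (length(missing_cols) > 0) {
    # Name only the columns that are actually absent.
    stop("'x' is missing required column(s): ",
         paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Example header lacking two of the four columns asked for.
hdr <- data.frame(seqNum = 1L, msLevel = 1L)
msg <- tryCatch(
  check_required_cols(hdr, c("seqNum", "msLevel", "polarity", "lowMZ")),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Reporting only the absent names keeps the message short and makes a regression in `header()` visible at a glance.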

Yours, Steffen