lifs-tools / rmzTab-m

The R-language bindings for mzTab-M
https://lifs-tools.github.io/rmzTab-m/
MIT License

mzTab-M without identification data? #32

Open jorainer opened 7 months ago

jorainer commented 7 months ago

Hi @nilshoffmann @sneumann - we're having a difficult time exporting valid mzTab-M files from xcms. Basically, we don't have any identification data, so the small molecule summary (SML) part is empty. We tried to create an SML table with a single entry and null values, but that does not pass the validator. Thus our questions:

1) What should an mzTab-M file with only LC-MS feature abundances look like?
2) (related) What is the best OBO term to use for small_molecule-quantification_unit? In xcms we have arbitrary (semi-quantitative) abundances, and setting this to null does not work.

Any insights here would be great - especially because the validator throws an error, but we don't quite understand how to solve the issue.

Pinging also @philouail

philouail commented 7 months ago

Thanks Johannes, just adding some details to this:

nilshoffmann commented 7 months ago

@jorainer @philouail Which validator are you using? We could think about relaxing the SML section requirement to enable usage of mzTab-M as an intermediate format.

philouail commented 7 months ago

This is the link to the validator: https://apps.lifs-tools.org/mztabvalidator/ Do you know of another one? I'm also worried that something is wrong with our file.

nilshoffmann commented 7 months ago

The URL is the right one. Leaving the summary and evidence tables out altogether will fail the parse. Would it be possible for you to share an example file with me so that I can give you direct feedback on it?

nilshoffmann commented 7 months ago

@jorainer Concerning semi-quantitative abundances: small_molecule-quantification_unit only applies to the values in the SML table, while small_molecule_feature-quantification_unit applies to the values reported in the SMF table. In the default semantic validation mapping file (see https://github.com/HUPO-PSI/mzTab/blob/master/specification_document-releases/2_0-Metabolomics-Release/mzTab_2_0-M_mapping.xml, only applied if you use semantic validation mode), both fields can take CV terms that are children of any of the following root terms:

<CvTerm termAccession="PRIDE:0000392" useTerm="false" termName="Quantification unit" isRepeatable="false" allowChildren="true" cvIdentifierRef="PRIDE"></CvTerm>
<CvTerm termAccession="UO:0000051" useTerm="false" termName="concentration unit" isRepeatable="false" allowChildren="true" cvIdentifierRef="UO"></CvTerm>
<CvTerm termAccession="MS:1000043" useTerm="false" termName="intensity unit" isRepeatable="false" allowChildren="true" cvIdentifierRef="MS"></CvTerm>
<CvTerm termAccession="UO:0000006" useTerm="false" termName="substance unit" isRepeatable="false" allowChildren="true" cvIdentifierRef="UO"></CvTerm>

  • Children of PRIDE:0000392
  • Children of UO:0000051
  • Children of MS:1000043
  • Children of UO:0000006
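As a rough illustration, the four root terms quoted above can be read out of the mapping snippet with Python's standard library. The wrapping `<roots>` element and the function name `allowed_roots` are made up here so that the fragment parses as one document; they are not part of the mapping file.

```python
import xml.etree.ElementTree as ET

# The four CvTerm elements quoted above, wrapped in a hypothetical <roots>
# element so that the fragment is a well-formed XML document.
FRAGMENT = """<roots>
<CvTerm termAccession="PRIDE:0000392" useTerm="false" termName="Quantification unit" isRepeatable="false" allowChildren="true" cvIdentifierRef="PRIDE"></CvTerm>
<CvTerm termAccession="UO:0000051" useTerm="false" termName="concentration unit" isRepeatable="false" allowChildren="true" cvIdentifierRef="UO"></CvTerm>
<CvTerm termAccession="MS:1000043" useTerm="false" termName="intensity unit" isRepeatable="false" allowChildren="true" cvIdentifierRef="MS"></CvTerm>
<CvTerm termAccession="UO:0000006" useTerm="false" termName="substance unit" isRepeatable="false" allowChildren="true" cvIdentifierRef="UO"></CvTerm>
</roots>"""


def allowed_roots(xml_text):
    """Return (accession, name) for every CvTerm whose children are allowed."""
    root = ET.fromstring(xml_text)
    return [
        (term.get("termAccession"), term.get("termName"))
        for term in root.iter("CvTerm")
        if term.get("allowChildren") == "true"
    ]


for accession, name in allowed_roots(FRAGMENT):
    print(f"{accession}\t{name}")
```

Since useTerm is "false" on all four entries, the roots themselves are not meant to be used directly; only their child terms are.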

For your particular use case, following how mzML annotates "intensity" values, you should hopefully be able to use any of the children of MS:1000043. If none of them fit, could you check whether the MS CV contains a more suitable term? We can then update the semantic validation mapping file.
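A minimal sketch of what the two unit declarations could look like, assuming the MS CV term MS:1000131 ("number of detector counts", the unit mzML commonly attaches to intensity arrays) is an acceptable child of MS:1000043 for xcms abundances. Whether that term really fits semi-quantitative abundances is exactly the open question in this thread; the helper names `cv_param` and `mtd_line` are hypothetical.

```python
def cv_param(cv_label, accession, name, value=""):
    """Format an mzTab parameter as [CV label, accession, name, value]."""
    return f"[{cv_label}, {accession}, {name}, {value}]"


def mtd_line(key, value):
    """Build one tab-separated metadata (MTD) line."""
    return "\t".join(["MTD", key, value])


# Assumption: MS:1000131 is used for both the SML and the SMF abundances.
unit = cv_param("MS", "MS:1000131", "number of detector counts")
lines = [
    mtd_line("small_molecule-quantification_unit", unit),
    mtd_line("small_molecule_feature-quantification_unit", unit),
]
print("\n".join(lines))
```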

philouail commented 7 months ago

Here is an example test.mztab.txt

I had to switch it to .txt because GitHub does not support .mztab; if you want the original, I can send it by email.

nilshoffmann commented 7 months ago

Thanks for the test file, I will check it and report back what we may do with minimal changes to the file. We may need to publish an amendment to the standard, if we decide to relax some requirements to strong recommendations. I will need to update the parser / validator implementation jmztab-m soon anyway, since the OLS 3 validation endpoint no longer appears to work for me. A development version of the web-based validator is now deployed at https://apps.lifs-tools.org/mztabvalidator-dev

philouail commented 7 months ago

Amazing thanks for the help !

nilshoffmann commented 7 months ago

Just for reference: the standard specification contains examples for the possible values in the different sections and elements for mzTab-M 2.0: https://hupo-psi.github.io/mzTab/2_0-metabolomics-release/mzTab_format_specification_2_0-M_release.html#format-specification

nilshoffmann commented 7 months ago

I have updated your file, which now passes validation (in principle): xcms-test-export.mztab.txt

Validation results (basic and semantic) with the current default mapping file are available here: https://apps.lifs-tools.org/mztabvalidator/result/a07569b1-3ba5-493b-95d7-bac34ce667b3

The errors shown are all due to required terms in the semantic validation mapping file. We can create a custom xcms semantic validation mapping file to facilitate further adoption.

This is just a first draft, though. For now, I have added one SML entry, without abundances, that does not link to any grouped feature entry. Depending on the workflow, subsequent tools would be able to read the features, run an identification step and record the results in a new mzTab-M file that then contains a proper SML table. Please note that I added a charge of 1 in the SMF table to all features. This is the (net) charge (positive integer) of the ion / m/z. I am not sure whether your workflow allows determination of the charge of features and adducts at this level of analysis. If not, please let me know; we should be able to adapt how this is handled.
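To make the feature-only layout more concrete, here is a minimal sketch of how SMF rows could be assembled from an xcms-style feature table (m/z, retention time, per-assay abundances). The column order follows my reading of the SMF section in the mzTab-M 2.0 specification and should be checked against it; the function name `smf_table` and the input dictionaries are made up for illustration, and `charge` defaults to the placeholder value of 1 described above.

```python
# SMF columns in the order I understand the mzTab-M 2.0 spec to define them
# (assumption: verify against the specification before relying on this).
SMF_COLUMNS = [
    "SMF_ID", "SME_ID_REFS", "SME_ID_REF_ambiguity_code", "adduct_ion",
    "isotopomer", "exp_mass_to_charge", "charge",
    "retention_time_in_seconds", "retention_time_in_seconds_start",
    "retention_time_in_seconds_end",
]


def smf_table(features, n_assays, charge="1"):
    """Return the SFH header line plus one SMF line per feature."""
    header = ["SFH"] + SMF_COLUMNS
    header += [f"abundance_assay[{i}]" for i in range(1, n_assays + 1)]
    rows = ["\t".join(header)]
    for smf_id, feat in enumerate(features, start=1):
        known = {
            "SMF_ID": str(smf_id),
            "exp_mass_to_charge": str(feat["mz"]),
            "charge": charge,
            "retention_time_in_seconds": str(feat["rt"]),
        }
        # Everything that is not known after preprocessing is written as null.
        cells = [known.get(col, "null") for col in SMF_COLUMNS]
        abundances = ["null" if a is None else str(a) for a in feat["abundances"]]
        rows.append("\t".join(["SMF"] + cells + abundances))
    return rows


# Hypothetical feature: one m/z, one retention time, two assays (one missing).
features = [{"mz": 205.097, "rt": 312.4, "abundances": [15234.5, None]}]
rows = smf_table(features, n_assays=2)
print("\n".join(rows))
```

Passing `charge="null"` would produce the file xcms would prefer to write, which the validator discussed here does not accept at this point.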

nilshoffmann commented 7 months ago

Validation result with the following relaxed semantic validation file yields only info level messages: https://apps.lifs-tools.org/mztabvalidator/result/bf180e3b-7e67-4fcb-8dc3-a008027d4d10

An adapted semantic validation mapping file for feature-only files based on XCMS is available here: https://github.com/HUPO-PSI/mzTab/blob/master/examples/2_0-Metabolomics_Release/mzTab_2_0-M_mapping-xcms.xml

jorainer commented 7 months ago

Am a bit late for the party ;)

"Please note that I added a charge of 1 in the SMF table to all features. This is the (net) charge (positive integer) of the ion / m/z."

After xcms preprocessing we actually only have a feature table with (semi-quantitative) abundances, each feature being characterized by an m/z and a retention time value. No additional information (like charge etc.) is available at this stage (we would assume that most features have charge one, but we don't know). That, along with other information like adduct or compound annotation, would need to be added by separate software further downstream in the analysis.

Don't know what's better here - changing the definition of mzTab-M or simply putting (like you did) some best-guess defaults.

For our test file, did I understand correctly that you had to tweak/change the validator to be able to read an mzTab-M file without SML?

nilshoffmann commented 7 months ago

Understood, that would mean that we need to change charge to optional (nullable) in the spec and update the schema and validator implementation. I would not recommend putting in best guesses: that might lead to confusion about what is meant, and without a clearly defined way of encoding this kind of information, people and tools will each pick their own way to interpret it.

For the example I provided two days ago, I did not adapt the validator, just your file and provided a different semantic mapping file. All linked further up in this thread. But to be able to validate your file without charges, we will need to alter the spec + schema + implementation.

philouail commented 7 months ago

Hi Nils,

Thanks for the feedback and the help. I completely agree with your perspective. I believe making certain elements nullable would facilitate a more intermediate-style file format, which aligns better with xcms. I just wanted to summarize the points that need to be addressed to create this intermediate-style format.

Regarding the mzTab recommendation found here: https://hupo-psi.github.io/mzTab/2_0-metabolomics-release/mzTab_format_specification_2_0-M_release.html#metadata-section and the validator. There are a few points to address:

What do you think ? I believe these requirements could probably be relaxed in both the file format definition and validator.

Furthermore:

I will adapt my code for some of the other changes you made that make sense in the context of xcms. But in terms of general structure, was it fine for you? Would you rather we wait until the validator is adapted before we publish this export method for xcms results?

philouail commented 7 months ago

Ideally this would be the intermediate file that xcms would provide: test.mztab.txt

How does this look? (I mainly changed the metadata so that it makes sense for an xcms output.) Do you want us to enforce the input of some other variables? Also note that we allow optional columns to be passed in the SMF, as xcms can provide more information than the file format asks for. These would of course follow the required format of opt_column_name.
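For the optional columns, a small sketch of how extra xcms columns might be renamed. As far as I understand the specification, optional columns take the form opt_{identifier}_{name}, with `global` as the identifier for values not tied to an assay, study variable or MS run, and column names are limited to letters, digits and `_ - [ ] :`. The function name `opt_global_column` and the example input names are only illustrative.

```python
import re

# Characters NOT allowed in column names, according to my reading of the spec
# (allowed: A-Z, a-z, 0-9, '_', '-', '[', ']' and ':').
_DISALLOWED = re.compile(r"[^A-Za-z0-9_\-\[\]:]")


def opt_global_column(name):
    """Turn an arbitrary column name into an opt_global_* column name."""
    cleaned = _DISALLOWED.sub("_", name.strip())
    if not cleaned:
        raise ValueError("column name must not be empty")
    return f"opt_global_{cleaned}"


for xcms_name in ["npeaks", "peak width"]:
    print(opt_global_column(xcms_name))
```

Replacing disallowed characters with an underscore is a design choice of this sketch, not something the format prescribes; rejecting such names outright would be equally defensible.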

sneumann commented 7 months ago

Hi @nilshoffmann, how do other software tools handle this? We have examples from MS-Dial in gcms_tms_height, which also had to shoehorn unidentified features into SML. I haven't found an mzMine3 example yet. Yours, Steffen

philouail commented 2 weeks ago

Hello @nilshoffmann coming back to you on this topic as I have some time for this again :)

I tried to pass the validator with our file here: test.mztab.txt

But it is crashing the validator again (even with the updated semantic validator file you uploaded above). https://apps.lifs-tools.org/mztabvalidator/result/6a171e34-8dc7-4bd8-a8bb-a23102c172fb

It works if I add the molecule_quantification-.. info and a "fake" charge to the SMF: test.mztab.tranformed.txt

I just wanted to reshare the suggestion I made earlier, in order to get a validator and an mzTab-M format suitable for xcms-type results:

I just wanted to summarize the points that need to be addressed to create this intermediate-style format.

Regarding the mzTab recommendation found here: https://hupo-psi.github.io/mzTab/2_0-metabolomics-release/mzTab_format_specification_2_0-M_release.html#metadata-section and the validator. There are a few points to address:

  • small_molecule-quantification_unit: Although it's marked as mandatory, it seems unnecessary for xcms results if SML has no inputs.
  • small_molecule-identification_reliability: This was added to our file but is only mandatory in certain cases and not relevant for xcms results.
  • id_confidence_measure[1-n]: Similarly, this is mandatory but not relevant for xcms outcomes.

What do you think ? I believe these requirements could probably be relaxed in both the file format definition and validator.

Furthermore:

  • reliability in SML: it's nullable in the mzTab file format definition, and it would make sense to change it to optional in the validator for xcms.
  • charge in SMF: Making it optional in the validator, as you suggested, is also necessary.
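The points listed above could be checked locally before uploading a file, roughly as sketched below. This is only a convenience pre-check under the assumptions of this thread, not a substitute for the validator; the function name `precheck` and the abbreviated example file (the SFH header is shortened for readability) are made up for illustration.

```python
# Metadata keys discussed above; id_confidence_measure is checked for index 1 only.
MTD_KEYS = [
    "small_molecule-quantification_unit",
    "small_molecule-identification_reliability",
    "id_confidence_measure[1]",
]


def precheck(mztab_text):
    """Return human-readable findings for the fields discussed in this thread."""
    mtd_keys = set()
    charge_index = None
    null_charges = 0
    for line in mztab_text.splitlines():
        cells = line.split("\t")
        if cells[0] == "MTD" and len(cells) > 1:
            mtd_keys.add(cells[1])
        elif cells[0] == "SFH" and "charge" in cells:
            charge_index = cells.index("charge")
        elif cells[0] == "SMF" and charge_index is not None:
            if len(cells) > charge_index and cells[charge_index] == "null":
                null_charges += 1
    findings = [f"missing MTD {key}" for key in MTD_KEYS if key not in mtd_keys]
    if null_charges:
        findings.append(f"{null_charges} SMF row(s) with null charge")
    return findings


# Abbreviated, hypothetical example: not a complete or valid mzTab-M file.
example = "\n".join([
    "MTD\tmzTab-version\t2.0.0-M",
    "MTD\tsmall_molecule-quantification_unit\t[MS, MS:1000131, number of detector counts, ]",
    "SFH\tSMF_ID\texp_mass_to_charge\tcharge",
    "SMF\t1\t205.097\tnull",
])
for finding in precheck(example):
    print(finding)
```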

Are there any other changes you need us to make to our mzTab-M file output? We'd love to have a common file format with other platforms such as mzMine, and mzTab-M seems to be a good solution.

nilshoffmann commented 2 weeks ago

@philouail Please try the development version of the validator at the moment: https://apps.lifs-tools.org/mztabvalidator-dev/

philouail commented 2 weeks ago

Nice, thanks! It worked if I add manually:

The rest was all good :) I will add this to our output. The reliability in SML will be a random number though. Is it planned to remove the mandatory aspect of this one ?

nilshoffmann commented 2 weeks ago

I agree that these points are up for discussion, with a view to changing them in the spec to semi-mandatory (lacking a better term for it) in the absence of an SML table. Please note that these changes, after checking back with HUPO-PSI, will need some time to be officially released after review through the document processing workflow.

The current idea for the new format version 2.1.0-M would allow reporting of only M+S (Metadata and Summary) or M+F (Metadata and Features) or the current full format M+S+F+E (Metadata, Summary, Features, Evidence) as outlined in the following slide:

[image: slide showing the M+S, M+F and M+S+F+E layouts]
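To illustrate the three layouts, the sketch below classifies a file by the line prefixes it contains (MTD for metadata, SMH/SML for the summary, SFH/SMF for features, SEH/SME for evidence). It says nothing about whether a given combination is valid; in the released 2.0 format only the full layout is, and the 2.1.0-M rules are still a proposal. The function name `section_layout` is hypothetical.

```python
# Line prefixes of mzTab-M mapped to the section letters used above.
PREFIX_TO_SECTION = {
    "MTD": "M",
    "SMH": "S", "SML": "S",
    "SFH": "F", "SMF": "F",
    "SEH": "E", "SME": "E",
}


def section_layout(mztab_text):
    """Return the layout of a file as a string such as 'M+F' or 'M+S+F+E'."""
    present = set()
    for line in mztab_text.splitlines():
        prefix = line.split("\t", 1)[0]
        if prefix in PREFIX_TO_SECTION:
            present.add(PREFIX_TO_SECTION[prefix])
    return "+".join(letter for letter in "MSFE" if letter in present)


# Abbreviated, hypothetical feature-only file as xcms might write it.
feature_only = "MTD\tmzTab-version\t2.0.0-M\nSFH\tSMF_ID\nSMF\t1"
print(section_layout(feature_only))
```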

I am currently at EMBL-EBI with the MetaboLights team and we are also trying to identify and track further issues with the format for the next release version to enable conversion from mzTab-M to the MetaboLights ISA-Tab format.

philouail commented 2 weeks ago

Oh that is super news thanks for the info ! M+F format is perfect for xcms so that would be a perfect update, I will follow the future developments :)

nilshoffmann commented 2 weeks ago

Current discussions & issues on mzTab-M 2.1.0 are being tracked over here: https://github.com/HUPO-PSI/mzTab/issues