nf-core / quantms

Quantitative mass spectrometry workflow. Currently supports proteomics experiments with complex experimental designs for DDA-LFQ, DDA-Isobaric and DIA-LFQ quantification.
https://nf-co.re/quantms
MIT License
31 stars 0 forks source link

[Discussion] modification specification propagation #104

Closed jspaezp closed 7 months ago

jspaezp commented 1 year ago

Hello! I was hoping we could discuss why the lookup of a modification is using the 'NT' name when an unimod accession is available. It sincerely feels like the human readable name should be a fallback or at least checked for consistency when validating the input SDRF.

LMK what you think!

Mod specified as:

NT=Carbamidomethylation; MT=Fixed; TA=C; AC=Unimod:4 I am assuming right now that it should have been specified as its PSI name: NT=Carbamidomethyl; MT=Fixed; TA=C; AC=Unimod:4 BUT I believe it should not be used anyway, since the unimod accession is available.

(Should this value be validated in sdrf_pipelines ?)

LMK if we could improve the documentation on the pipeline to know what fields/sub-fields actually matter for the execution.

Error trace at the NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT stage:

Error executing process > 'NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT (input_files.sdrf)'

Caused by:
  Essential container in task exited

Command executed:

  diann_convert.py convert \
      --folder ./ \
      --exp_design input_files.sdrf_openms_design.tsv \
      --diann_version ./version/versions.yml \
      --dia_params "0.02;Da;10;ppm;Trypsin;Carbamidomethylation (C);Oxidation (M)" \
      --charge 4 \
      --missed_cleavages 2 \
      --qvalue_threshold 0.01 \
      2>&1 | tee convert_report.log

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT":
      pyopenms: $(pip show pyopenms | grep "Version" | awk -F ': ' '{print $2}')
  END_VERSIONS

Command exit status:
  1

line 422, in MTD_mod_info
      mod_obj = mods_db.getModification(mod)
    File "pyopenms/pyopenms_2.pyx", line 9057, in pyopenms.pyopenms_2.ModificationsDB.getModification
    File "pyopenms/pyopenms_2.pyx", line 9016, in pyopenms.pyopenms_2.ModificationsDB._getModification_1
  RuntimeError: the value 'Carbamidomethylation (C)' was used but is not valid; Retrieving the modification failed. It is not available for the residue '' and term specificity 'none'.

const OpenMS::ResidueModification* OpenMS::ModificationsDB::searchModificationsFast(const OpenMS::String&, bool&, const OpenMS::String&, OpenMS::ResidueModification::TermSpecificity) constModification not found: Carbamidomethylation (C)
  Warning: OPENMS_DATA_PATH environment variable already exists. pyOpenMS will use it (/usr/local/share/OpenMS/) to locate data in the OpenMS share folder (e.g., the unimod database), instead of the default (/usr/local/lib/python3.10/site-packages/pyopenms/share/OpenMS).
jpfeuffer commented 1 year ago

Sorry for the late reply! I agree: Whenever we can use the unimod identifier/number, we should use it first. The problem is that is might get a bit tricky because you probably need to support both use cases. Because it can happen that a unimod identifier was not specified or the tool cannot handle it. So, do you want to pass through both unimod identifier and name instead? I think for example the OpenMS tools in the non-DIA branches require the human readable name instead of the unimod identifier because (outside of this pipeline) you could add your own custom modifications.

jspaezp commented 1 year ago

Hello there!

After giving it some more thought I understand the requirement to support those workflows (although I do not understand from the docs how one could specify a mod just by mass/name).

Nonetheless, at the very least I think the error message should be better: since the current error 1. Disregards the mod location that was correctly specified. 2. Does not say where it is attempting to retrieve it from. 3. does not suggest a solution

# input was NT=Carbamidomethylation; MT=Fixed; TA=C; AC=Unimod:4
  RuntimeError: the value 'Carbamidomethylation (C)' was used but is not valid; Retrieving the modification failed. It is not available for the residue '' and term specificity 'none'.

therefore I think the error message should be something like

The name `Carbamidomethylation` does not match with any `PSI-mod name`, please provide a name that does.

# OR my preference, since it would prevent unexpected behaviors like `NT=Phospho; MT=Fixed; TA=C; AC=Unimod:4` (accession and modification site for carbamido, but name set as phospho)

The name `Carbamidomethylation` does not match with the expected name for the provided accession `UNIMOD:4` (Expected: Carbamidomethyl)

Best!

ypriverol commented 7 months ago

This has been done.