metaspace2020 / metaspace

Cloud engine and platform for metabolite annotation for imaging mass spectrometry
https://metaspace2020.eu/
Apache License 2.0

Warn users if they submit a dataset with the wrong polarity #554

Open. LachlanStuart opened this issue 4 years ago.

LachlanStuart commented 4 years ago

Polarity information is usually present in imzML files, either in the <referenceableParamGroupList>, e.g.

<mzML xmlns="http://psi.hupo.org/ms/mzml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://psi.hupo.org/ms/mzml http://psidev.info/files/ms/mzML/xsd/mzML1.1.0_idx.xsd" version="1.1">
  <referenceableParamGroupList count="4">
    <referenceableParamGroup id="spectrum1">
      <cvParam cvRef="MS" accession="MS:1000129" name="negative scan" value=""/>
      <!-- ... -->
    </referenceableParamGroup>
  </referenceableParamGroupList>
  <!-- ... -->
</mzML>

or per-spectrum, e.g.

<?xml version="1.0" encoding="ISO-8859-1"?>
<mzML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://psi.hupo.org/ms/mzml http://psidev.info/files/ms/mzML/xsd/mzML1.1.0.xsd" xmlns="http://psi.hupo.org/ms/mzml" version="1.1.0" id="28052019_MZ_1st_spotting_DHB_mz60-360_pos_pix 220X220">
  <run defaultInstrumentConfigurationRef="IC1" defaultSourceFileRef="RAW1" id="_x0032_8052019_MZ_1st_spotting_DHB_mz60-360_pos_pix_x0020_220X220" startTimeStamp="Tue May 28 11:04:13 CEST 2019">
    <spectrumList count="10956" defaultDataProcessingRef="pwiz_Reader_Thermo_conversion">
      <spectrum defaultArrayLength="3872" id="controllerType=0 controllerNumber=1 scan=1" index="0" dataProcessingRef="pwiz_Reader_Thermo_conversion" spotID=",,x">
        <cvParam cvRef="MS" accession="MS:1000130" name="positive scan" value=""/>
        <!-- ... -->
      </spectrum>
      <!-- ... -->
    </spectrumList>
  </run>
</mzML>

If the polarity is present and conflicts with the user's selected polarity, warn the user.
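One possible rule for "present and conflicts", sketched as a pure function: no warning when the file declares nothing or agrees with the selection, and a warning otherwise. Treating a mixed-polarity file as a conflict is a design choice assumed here, not something the issue specifies; the function name is hypothetical.

```python
from typing import Iterable, Optional


def polarity_warning(file_polarities: Iterable[str], selected: str) -> Optional[str]:
    """Return a warning if the file's declared polarity conflicts with the user's choice.

    file_polarities: polarities declared in the file, e.g. {"positive"}; empty if none.
    selected: polarity chosen at submission, "positive" or "negative".
    """
    found = {p.lower() for p in file_polarities}
    selected = selected.lower()
    if not found or found == {selected}:
        return None  # nothing declared, or everything agrees
    if len(found) > 1:
        return f"File declares mixed polarities {sorted(found)}; {selected} was selected."
    (declared,) = found
    return f"File declares {declared} polarity, but {selected} was selected."
```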

Because we don't already have any real-time imzML validation, the easiest way to warn them is probably to send an email during or after annotation.
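Since email is the suggested channel, here is a sketch that composes (but does not send) the notice with the standard library's `email.message.EmailMessage`. The sender address, function name, and wording are placeholders; delivery through `smtplib` or whatever mailer METASPACE already uses is not described in this issue and is left out.

```python
from email.message import EmailMessage


def build_polarity_email(owner_email: str, dataset_name: str,
                         declared: str, selected: str) -> EmailMessage:
    """Compose a polarity-mismatch notice for a dataset owner (not sent here)."""
    msg = EmailMessage()
    msg["From"] = "noreply@example.org"  # hypothetical sender address
    msg["To"] = owner_email
    msg["Subject"] = f"Possible polarity mismatch in dataset {dataset_name}"
    msg.set_content(
        f"Your dataset '{dataset_name}' was submitted as {selected} polarity, "
        f"but its imzML file declares {declared} polarity.\n"
        "Please check the polarity in the dataset metadata."
    )
    return msg
```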

LachlanStuart commented 3 years ago

@sergii-mamedov To help you get started on this: we don't currently check this metadata anywhere in METASPACE, so you'll need to look at pyimzml's interface. It splits the metadata into two parts: ImzMLParser.metadata for the global metadata and ImzMLParser.spectrum_metadata_fields for per-spectrum metadata. For the per-spectrum metadata, you need to specify which accession numbers to read when constructing the ImzMLParser (example from the tests).
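A sketch that reduces per-spectrum metadata to the set of declared polarities. It takes a plain dict so it runs without pyimzml, and it assumes the shape of `spectrum_metadata_fields` is "accession → list with one value per spectrum", with `None` where a spectrum lacks that accession (a present polarity cvParam may hold `""`). Check the pyimzml documentation for the exact representation; also, the per-spectrum lookup may not follow `referenceableParamGroupRef` links, in which case the global metadata needs checking too.

```python
POLARITY_BY_ACCESSION = {"MS:1000129": "negative", "MS:1000130": "positive"}


def polarities_from_fields(fields: dict) -> set:
    """Collect the polarities declared across spectra.

    fields: accession -> per-spectrum values (assumed None where absent).
    """
    found = set()
    for accession, polarity in POLARITY_BY_ACCESSION.items():
        # Compare against None: a present cvParam's value is often the empty string.
        if any(value is not None for value in fields.get(accession, [])):
            found.add(polarity)
    return found


# Hypothetical usage; the constructor keyword for requesting accessions is an
# assumption, see the pyimzml docs:
#   parser = ImzMLParser(path, include_spectra_metadata=list(POLARITY_BY_ACCESSION))
#   polarities_from_fields(parser.spectrum_metadata_fields)
```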

Lists mapping accession numbers to field names are in these files: ims.py and ms.py.

sergii-mamedov commented 3 years ago

The analysis showed that just over a hundred datasets are relevant to this issue. We decided on the following:

  1. After processing a dataset, check whether the polarity declared in the imzML file matches the polarity in the metadata.
  2. If the polarities differ, send an email to the dataset owner about the mismatch.
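The two agreed steps can be sketched as a post-processing pass. `ProcessedDataset` and the `notify(to, body)` callback are hypothetical stand-ins for METASPACE's dataset records and mail delivery, and skipping files that declare no polarity is an assumption about the desired behaviour.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class ProcessedDataset:
    """Hypothetical record of what is known about a dataset after processing."""
    id: str
    owner_email: str
    metadata_polarity: str      # polarity the user selected
    imzml_polarities: Set[str]  # polarities declared in the imzML file (may be empty)


def check_polarity_after_processing(
    datasets: List[ProcessedDataset], notify: Callable[[str, str], None]
) -> List[str]:
    """Step 1: compare polarities. Step 2: notify the owner on mismatch.

    Returns the ids of datasets whose owners were notified.
    """
    notified = []
    for ds in datasets:
        declared = {p.lower() for p in ds.imzml_polarities}
        if declared and declared != {ds.metadata_polarity.lower()}:
            notify(
                ds.owner_email,
                f"Dataset {ds.id}: metadata says {ds.metadata_polarity}, "
                f"imzML file declares {', '.join(sorted(declared))}.",
            )
            notified.append(ds.id)
    return notified
```

Passing `notify` in keeps the check testable and leaves the choice of mailer to the caller.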