workflow4metabolomics / mtbls-dwnld

Improve conversion to W4M format in case of multiple measures #2

Closed by pkrog 7 years ago

pkrog commented 7 years ago

In case of multiple choices inside the Metabolights study, find a way to let the user choose between them, or output all of them.

sneumann commented 7 years ago

Indeed, running a workflow with multiple assays completely fails:

Hi, for MTBLS338 I get

2017-01-27T12:37:24.129081966Z Error in read.table(argVc["dataMatrix_in"], check.names = FALSE, header = TRUE,  : 
2017-01-27T12:37:24.129135409Z   no lines available in input
2017-01-27T12:37:24.129147410Z Calls: t -> as.matrix -> read.table
2017-01-27T12:37:24.129151256Z Execution halted

The HTML of the downloader does not contain m_*.* files,

 Metabolights study
Investigation file
Study files:
s_MTBLS338.txt
Assay files:
a_MTBLS338_GCMS_Root_met.txt
a_MTBLS338_Root_metabolite_profiling_mass_spectrometry_targeted.txt
a_MTBLS338_negative_Root_met_ms.txt
a_MTBLS338_positive_Root_met_ms.txt

although MTBLS338 has four of them, see ftp://ftp.ebi.ac.uk/pub/databases/metabolights/studies/public/MTBLS338

[FILE] m_MTBLS338_GCMS_Root_met_v2_maf> Oct 04 15:34    305K [VIEW] [DOWNLOAD]
[FILE] m_MTBLS338_Root_metabolite_prof> Oct 04 15:34    823K [VIEW] [DOWNLOAD]
[FILE] m_MTBLS338_negative_Root_met_ms> Oct 04 15:34   5442K [VIEW] [DOWNLOAD]
[FILE] m_MTBLS338_positive_Root_met_ms> Oct 04 15:34   6635K [VIEW] [DOWNLOAD]

pkrog commented 7 years ago

Hi @sneumann, I'm currently looking at MTBLS338 and improving my script isatab2w4m. For two of the assays (a_MTBLS338_GCMS_Root_met.txt and a_MTBLS338_Root_metabolite_profiling_mass_spectrometry_targeted.txt) it works well.

For the other two, however, it fails. The reason is that the names in the MS Assay Name column do not match the column names in the corresponding m_* file. In fact, the names in the MS Assay Name column of a_MTBLS338_negative_Root_met_ms.txt seem to match the column names of the positive file m_MTBLS338_positive_Root_met_ms_v2_maf.tsv, and vice versa.

If I guess correctly, this is a mistake. Do you know how to get it corrected?
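A quick way to check this kind of mismatch is to measure how many MS Assay Name values of an assay file appear as column headers in each candidate m_* file. The following is only a minimal sketch and not the code of isatab2w4m: the function names, the file names, and the miniature tab-separated data are all made up for illustration.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-separated text into (header, list of row dicts)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = list(reader)
    return reader.fieldnames, rows


def assay_name_overlap(assay_text, maf_text, column="MS Assay Name"):
    """Fraction of assay names that appear as column headers in the MAF."""
    _, assay_rows = read_tsv(assay_text)
    maf_header, _ = read_tsv(maf_text)
    names = {row[column] for row in assay_rows if row.get(column)}
    if not names:
        return 0.0
    return len(names & set(maf_header)) / len(names)


def best_matching_maf(assay_text, mafs):
    """Name of the MAF (hypothetical dict: name -> text) with the highest overlap."""
    return max(mafs, key=lambda name: assay_name_overlap(assay_text, mafs[name]))


# Invented miniature data mimicking the swapped situation described above.
ASSAY_NEG = "Sample Name\tMS Assay Name\nS1\tneg_01\nS2\tneg_02\n"
MAF_NEG = "metabolite_identification\tpos_01\tpos_02\nglucose\t1.0\t2.0\n"
MAF_POS = "metabolite_identification\tneg_01\tneg_02\nglucose\t3.0\t4.0\n"

if __name__ == "__main__":
    mafs = {"m_neg.tsv": MAF_NEG, "m_pos.tsv": MAF_POS}
    for name, text in mafs.items():
        print(name, assay_name_overlap(ASSAY_NEG, text))
    print("best match:", best_matching_maf(ASSAY_NEG, mafs))
```

An overlap of 0 with the m_* file that the assay is supposed to point to, combined with a full overlap with another one, is the pattern observed here for the negative and positive files.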

sneumann commented 7 years ago

Thanks for reporting. We had some discussions about MTBLS338 with the mtbls-curators, since intensities were collected into the m_* file from either the positive or the negative mode assay, depending on which polarity works best for each metabolite. So the MAF column names should in fact reflect the s_* sample names rather than the assay names. If they instead match the MS assay names of the opposite polarity, that would be a mistake.

pkrog commented 7 years ago

This feature has been developed. The tool can now output the selected assay, and also all assays, in W4M format. It can also output mzData and mzML files as collections.