Closed anubhav0fnu closed 3 years ago
@anubhav0fnu can this issue be closed or would you like it to be moved to the August sprint?
Moved to August sprint per Slack message from @anubhav0fnu
@scanon, @hubin-keio, @Michal-Babins: Following up on the Aug 23rd meeting.
Question: What are the inputs and outputs of [each command in the shell script](https://github.com/microbiomedata/metaPro/blob/master/run_tasks.sh)?
Answer:
Processing only for stegen/500088 (test dataset).
INPUT:
.
├── data
│   └── set_of_Dataset_IDs
│       └── stegen
│           └── 500088
│               └── Froze_Core_2015_N2_50_60_34_QE_26May16_Pippin_16-03-39.raw
├── fastas
│   └── stegen
│       └── 1781_100336
│           ├── Ga0482236_functional_annotation.gff
│           └── Ga0482236_proteins.faa
├── mappings
│   └── EMSL48473_JGI1781_Stegen_DatasetToMetagenomeMapping_2021-01-25.xlsx
└── parameters
    ├── LTQ-FT_10ppm_2014-08-06.xml
    ├── MSGFPlus_PartTryp_MetOx_20ppmParTol.txt
    ├── MSGFPlus_PartTryp_MetOx_20ppmParTol_ModDefs.txt
    ├── MSGFPlus_Tryp_NoMods_20ppmParTol.txt
    ├── Mass_Correction_Tags.txt
    └── Tryp_Pig_Bov.fasta
docker exec -it analysisJobContainer python3.8 ./metaPro/src/prepare_input/emsl_to_jgi.py
OUTPUT: emsl_to_jgi.json
INPUT: emsl_to_jgi.json
docker exec -it analysisJobContainer python3.8 ./metaPro/src/analysis_jobs/run_analysis_job.py
OUTPUT:
.
└── 1781_100336
    ├── analysis_jobs_logs
    │   ├── 0_masic.commandlog
    │   ├── 0_masic.log
    │   ├── 1_MSconvert.log
    │   ├── 2_MSGFPlus.log
    │   ├── 3_MzidToTsvConverter.log
    │   ├── 4_TsvToSynConverter.commandlog
    │   ├── 4_TsvToSynConverter.log
    │   └── ProteinDigestionSimulator.log
    ├── merged_jobs
    │   └── 500088_1781_100336_MSGFjobs_MASIC_resultant.tsv
    ├── msgfplus_input
    │   └── Froze_Core_2015_N2_50_60_34_QE_26May16_Pippin_16-03-39.mzML
    ├── msgfplus_output
    │   ├── Froze_Core_2015_N2_50_60_34_QE_26May16_Pippin_16-03-39.mzid
    │   ├── Froze_Core_2015_N2_50_60_34_QE_26May16_Pippin_16-03-39.tsv
    │   ├── Froze_Core_2015_N2_50_60_34_QE_26May16_Pippin_16-03-39_PepToProtMapMTS.txt
    │   └── fasta_residuals
    │       ├── Ga0482236_proteins.faa
    │       ├── Ga0482236_proteins.revCat.canno
    │       ├── Ga0482236_proteins.revCat.cnlcp
    │       ├── Ga0482236_proteins.revCat.csarr
    │       ├── Ga0482236_proteins.revCat.cseq
    │       └── Ga0482236_proteins.revCat.fasta
    ├── nmdc_jobs
    │   ├── SIC
    │   │   ├── Froze_Core_2015_N2_50_60_34_QE_26May16_Pippin_16-03-39_DatasetInfo.xml
    │   │   ├── Froze_Core_2015_N2_50_60_34_QE_26May16_Pippin_16-03-39_MSMS_scans.csv
    │   │   ├── Froze_Core_2015_N2_50_60_34_QE_26May16_Pippin_16-03-39_MS_scans.csv
    │   │   ├── Froze_Core_2015_N2_50_60_34_QE_26May16_Pippin_16-03-39_SICs.xml
    │   │   ├── Froze_Core_2015_N2_50_60_34_QE_26May16_Pippin_16-03-39_SICstats.txt
    │   │   ├── Froze_Core_2015_N2_50_60_34_QE_26May16_Pippin_16-03-39_ScanStats.txt
    │   │   ├── Froze_Core_2015_N2_50_60_34_QE_26May16_Pippin_16-03-39_ScanStatsConstant.txt
    │   │   ├── Froze_Core_2015_N2_50_60_34_QE_26May16_Pippin_16-03-39_ScanStatsEx.txt
    │   │   └── index.html
    │   └── SYNOPSIS
    │       ├── Froze_Core_2015_N2_50_60_34_QE_26May16_Pippin_16-03-39_fht.txt
    │       ├── Froze_Core_2015_N2_50_60_34_QE_26May16_Pippin_16-03-39_syn.txt
    │       ├── Froze_Core_2015_N2_50_60_34_QE_26May16_Pippin_16-03-39_syn_ModDetails.txt
    │       ├── Froze_Core_2015_N2_50_60_34_QE_26May16_Pippin_16-03-39_syn_ModSummary.txt
    │       ├── Froze_Core_2015_N2_50_60_34_QE_26May16_Pippin_16-03-39_syn_ProteinMods.txt
    │       ├── Froze_Core_2015_N2_50_60_34_QE_26May16_Pippin_16-03-39_syn_ResultToSeqMap.txt
    │       ├── Froze_Core_2015_N2_50_60_34_QE_26May16_Pippin_16-03-39_syn_SeqInfo.txt
    │       └── Froze_Core_2015_N2_50_60_34_QE_26May16_Pippin_16-03-39_syn_SeqToProteinMap.txt
    ├── protein_digestion
    │   └── Ga0482236_proteins.txt
and
modified emsl_to_jgi.json
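For anyone checking a run against the listing above, here is a minimal sketch that derives a few of the expected result paths from the dataset ID, the genome directory name, and the raw file stem. The naming pattern is inferred only from this one test dataset, so treat it as an assumption rather than a documented metaPro contract; `expected_analysis_outputs` is a hypothetical helper, not part of the codebase.

```python
from pathlib import PurePosixPath
from typing import List


def expected_analysis_outputs(dataset_id: str, genome_dir: str, raw_stem: str) -> List[str]:
    """Return a handful of result paths, relative to the results root.

    The layout mirrors the stegen/500088 listing; whether other datasets
    follow the same pattern is an assumption.
    """
    root = PurePosixPath(genome_dir)
    return [
        # mzML produced from the .raw file, used as MSGF+ input
        str(root / "msgfplus_input" / f"{raw_stem}.mzML"),
        # MSGF+ identifications and their TSV conversion
        str(root / "msgfplus_output" / f"{raw_stem}.mzid"),
        str(root / "msgfplus_output" / f"{raw_stem}.tsv"),
        # MASIC selected-ion-chromatogram statistics
        str(root / "nmdc_jobs" / "SIC" / f"{raw_stem}_SICstats.txt"),
        # synopsis file written by the TSV-to-syn conversion
        str(root / "nmdc_jobs" / "SYNOPSIS" / f"{raw_stem}_syn.txt"),
        # merged MSGF+/MASIC table
        str(root / "merged_jobs" / f"{dataset_id}_{genome_dir}_MSGFjobs_MASIC_resultant.tsv"),
    ]
```

Calling it with `"500088"`, `"1781_100336"`, and the `Froze_Core_..._16-03-39` stem reproduces the corresponding entries in the tree above.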
INPUT: modified emsl_to_jgi.json
docker exec -it postProcessingContainer python ./metaPro/src/post_processing/run_fa.py
OUTPUT:
└── reports
    ├── 500088_1781_100336_Peptide_Report.tsv
    ├── 500088_1781_100336_Protein_Report.tsv
    └── 500088_1781_100336_QC_metrics.tsv
and
modified emsl_to_jgi.json
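The three reports are tab-separated files, so a quick sanity check needs only the standard library. This is a rough sketch: `summarize_tsv` is a hypothetical helper, and it makes no assumption about which columns the reports contain; it just returns whatever header row it finds.

```python
import csv
from typing import List, Tuple


def summarize_tsv(path: str) -> Tuple[List[str], int]:
    """Return (column names, number of data rows) for a tab-separated file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])          # empty list if the file has no rows
        row_count = sum(1 for _ in reader)  # remaining lines are data rows
    return header, row_count
```

For example, `summarize_tsv("reports/500088_1781_100336_Peptide_Report.tsv")` would give the column names and the row count of the peptide report, assuming the file sits at that relative path.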
INPUT: modified emsl_to_jgi.json
docker exec -it postProcessingContainer python ./metaPro/src/metadata_collection/gen_meta_data.py
OUTPUT: modified emsl_to_jgi.json
├── stegen_MetaProteomicAnalysis_activity.json
└── stegen_emsl_analysis_data_objects.json
and
modified emsl_to_jgi.json
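The four commands above can be summarized as an ordered list of steps, each with what it consumes and produces. The sketch below only rebuilds the command lines for a dry run; the `Step` class and the `build_command` and `dry_run` helpers are illustrative names, not part of metaPro, and the container names, interpreters, and script paths are copied from the commands in this comment.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    container: str    # name of the running Docker container
    interpreter: str  # interpreter used inside the container
    script: str       # script path as given in the commands above
    consumes: str     # input, as described in this comment
    produces: str     # output, as described in this comment


STEPS = [
    Step("analysisJobContainer", "python3.8",
         "./metaPro/src/prepare_input/emsl_to_jgi.py",
         "data/, fastas/, mappings/, parameters/",
         "emsl_to_jgi.json"),
    Step("analysisJobContainer", "python3.8",
         "./metaPro/src/analysis_jobs/run_analysis_job.py",
         "emsl_to_jgi.json",
         "1781_100336/ results tree, modified emsl_to_jgi.json"),
    Step("postProcessingContainer", "python",
         "./metaPro/src/post_processing/run_fa.py",
         "modified emsl_to_jgi.json",
         "reports/, modified emsl_to_jgi.json"),
    Step("postProcessingContainer", "python",
         "./metaPro/src/metadata_collection/gen_meta_data.py",
         "modified emsl_to_jgi.json",
         "stegen_MetaProteomicAnalysis_activity.json, "
         "stegen_emsl_analysis_data_objects.json, modified emsl_to_jgi.json"),
]


def build_command(step: Step) -> List[str]:
    """Rebuild the docker exec invocation for one step."""
    return ["docker", "exec", "-it", step.container, step.interpreter, step.script]


def dry_run(steps: List[Step]) -> List[str]:
    """Return the command lines in execution order without running them."""
    return [" ".join(build_command(step)) for step in steps]


if __name__ == "__main__":
    for step, line in zip(STEPS, dry_run(STEPS)):
        print(f"{line}\n    in:  {step.consumes}\n    out: {step.produces}")
```

Each step reads the `emsl_to_jgi.json` written or modified by the previous one, which is why the order matters.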
FYI @scanon and @hubin-keio: @Michal-Babins ran the workflow on July 27th and has both the results and the data for that dataset.
Also FYI, @pdpiehowski, @SamuelPurvine.
Hello, Anubhav,
Thanks for the update. The logic of each script is still hard to follow. Can you start with an expanded legend of the diagram illustrating the metaP workflow? For example, what is the output from MASIC and MSGF+? Among the output files, which one is used for peak area detection, and what is the result file of this step? Thanks.
Regards, Bin
Hello @hubin-keio,
I think your requests are outside the scope of this work. If you need further assistance, please contact the task leads. The codebase is open source, and you're free to spend time with it and learn parts of it yourself. I can't dedicate time to explaining the workflow beyond what's provided in the project directory.
Additionally, I created this issue, assigned it to myself, and am working on it. I'm not aware of any assistance requested from you by our team; if any is needed, we'll connect per the guidelines defined for this collaborative project.
Moving this out of this sprint per Slack message from Anubhav. It's now labeled 'backlog' and will be pulled into an appropriate sprint in the future.
Moving to In Progress per @anubhav0fnu, who said it would be closed soon.
@ssarrafan, @scanon I rolled out the metaPro WDL.
Thanks @anubhav0fnu. I will close this one, but if you need a related new issue for November, let me know.
A metaP workflow written in WDL is needed.