wombat-p / WOMBAT-Pipelines

MIT License

Make tools for statistical tests exchangeable #17

Open veitveit opened 3 months ago

veitveit commented 3 months ago

Description of feature

Instead of attaching a specific statistical test to each workflow, let the tests run on any of the generalized stand_pep/stand_prot files and then provide an updated version of these files containing the p-values and FDRs, as well as the log-ratios.
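To make the idea concrete, here is a minimal sketch of what an exchangeable test could look like: every test is a function with the same signature (two lists of abundances in, one p-value out), and a shared wrapper adds the log-ratio, p-value and FDR to each row. Everything here is an assumption for illustration, not existing WOMBAT-P code: the function names, the output column names "log_ratio", "p_value" and "fdr", the exact permutation test used as the default, log-scale abundances, and the absence of missing values.

```python
from itertools import combinations
from statistics import mean


def permutation_test(a, b):
    """Exact two-sided permutation test on the difference of group means."""
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    extreme = total = 0
    # Enumerate every way of assigning len(a) of the pooled values to group A.
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [v for i, v in enumerate(pooled) if i not in chosen]
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def add_statistics(rows, cols_a, cols_b, test=permutation_test):
    """Return copies of rows with log_ratio, p_value and fdr added.

    rows: list of dicts, one per protein or peptide (hypothetical layout).
    cols_a, cols_b: abundance column names of the two conditions.
    test: any callable (a, b) -> p-value; this is the exchangeable part.
    """
    out, p_values = [], []
    for row in rows:
        a = [row[c] for c in cols_a]
        b = [row[c] for c in cols_b]
        new_row = dict(row)
        # Abundances are assumed to be on log scale, so the ratio is a difference.
        new_row["log_ratio"] = mean(b) - mean(a)
        new_row["p_value"] = test(a, b)
        p_values.append(new_row["p_value"])
        out.append(new_row)
    for new_row, fdr in zip(out, benjamini_hochberg(p_values)):
        new_row["fdr"] = fdr
    return out
```

Swapping in a different test would then only mean passing another function as `test`, while the input and output file layout stays the same.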

Questions:

Current state:
https://github.com/wombat-p/WOMBAT-Pipelines/tree/mutli_stat_tests

Rough working plan

@veitveit @wraff

veitveit commented 3 months ago

To get started with the converters, this is my suggestion for the standardized output format, which is meant as input for the statistical tests.

Experimental design
The experimental design file will already exist and has the format given in the README of https://github.com/wombat-p/WOMBAT-Pipelines
The column "exp_condition" in this file is crucial because it defines the column names in the standardized format.

Sample nomenclature:
For each of the files, fractions will be summarized into "samples". Each sample will then be named by the name in "exp_condition" from the experimental design file plus "_" and the number of the biological/technical replicate: "INFOTYPE_EXPCOND_BIOREP". For example: "number_of_peptides_100.amol_3"
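As a small illustration of this naming scheme, the sketch below builds the column names from (exp_condition, replicate) pairs. The function names and the shape of the `design` argument are hypothetical; in practice the pairs would come from the experimental design file.

```python
def sample_column(info_type, exp_condition, bio_rep):
    """Build a column name of the form INFOTYPE_EXPCOND_BIOREP."""
    return f"{info_type}_{exp_condition}_{bio_rep}"


def columns_for_design(info_type, design):
    """Return one column name per sample, in order of first appearance.

    design: iterable of (exp_condition, bio_rep) pairs, one per raw file
    (hypothetical layout). Fractions of the same sample repeat the same
    pair and therefore collapse into a single column.
    """
    names = []
    for exp_condition, bio_rep in design:
        name = sample_column(info_type, exp_condition, bio_rep)
        if name not in names:
            names.append(name)
    return names


# Two fractions of the same sample plus one further replicate.
design = [("100.amol", 3), ("100.amol", 3), ("100.amol", 4)]
print(columns_for_design("number_of_peptides", design))
```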

Protein level file stand_prot_quant.csv
The file should contain the following columns:

Peptide level file stand_pep_quant.csv
The file should contain the following columns:

Ion level file stand_ion_quant.csv (optional and more for being able to send the output to ProteoBench):
Same as peptide level file, but with charge states separated to represent the peaks in the chromatogram

veitveit commented 3 months ago

examples_PXD011153.zip
And here are the example files for FlashLFQ and the TPP output, generated with my own scripts. The TPP output seems to be mostly complete, although it still lacks a good way to deal with the modifications.

veitveit commented 2 months ago

@wraff Sorry, I think we need a small correction to the column names, e.g. for "abundance_", so that they include the technical replicates: "INFOTYPE_EXPCOND_BIOREP_TECHREP".
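For the converters and the statistical tests, the extended names could be built and split again roughly as follows. This is only a sketch under assumptions: the function names are made up, replicate numbers are taken to be plain integers, and the list of conditions is assumed to be known from the experimental design file, which is needed because both INFOTYPE and EXPCOND may themselves contain underscores.

```python
def build_column(info_type, exp_condition, bio_rep, tech_rep):
    """Build a column name of the form INFOTYPE_EXPCOND_BIOREP_TECHREP."""
    return f"{info_type}_{exp_condition}_{bio_rep}_{tech_rep}"


def parse_column(name, known_conditions):
    """Split a column name back into its four parts.

    known_conditions: the exp_condition values from the design file.
    """
    # The two replicate numbers are always the last two fields.
    prefix, bio_rep, tech_rep = name.rsplit("_", 2)
    # Try longer condition names first so the most specific one wins.
    for cond in sorted(known_conditions, key=len, reverse=True):
        suffix = "_" + cond
        if prefix.endswith(suffix) and len(prefix) > len(suffix):
            return {
                "info_type": prefix[: -len(suffix)],
                "exp_condition": cond,
                "bio_rep": int(bio_rep),
                "tech_rep": int(tech_rep),
            }
    raise ValueError(f"no known exp_condition found in column name: {name}")


conditions = ["100.amol", "50.amol"]
print(parse_column(build_column("abundance", "100.amol", 3, 1), conditions))
```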