tvpham / msproteomics

Apache License 2.0

read_diann not processing #3

Open dh2305 opened 1 week ago

dh2305 commented 1 week ago

Hi, I installed the package in my Miniconda installation, and I am working in the (msproteomics) environment, in the directory that contains my report.tsv from DIA-NN.

However, the following command does not start processing the input table for sitereport:

(msproteomics) (msproteomics) PS D:\Data\Dominik\27102024DIANNH1299secondtry\reseponlyfasta> read_diann -o reportmsproteomics.tsv -E Fragment.Quant.Raw reportmsproteomics.tsv -f human.fasta report.tsv

It only shows this:

Activate siteloc (use -h for help) ...

It shows no progress and does not produce the output table.

Thanks for your input!

tvpham commented 1 week ago

Hi, it seems that your command has the same input and output. Can you change the output filename (for example, -o reportmsproteomics_output.tsv)? I assume that your diann report is reportmsproteomics.tsv.

Thang
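The point about identical input and output names can be checked before running anything. The following is a minimal Python sketch, not part of msproteomics; the helper name `same_file` is hypothetical, and it only compares the two paths as the operating system would see them.

```python
import os

def same_file(input_path: str, output_path: str) -> bool:
    """Return True when both arguments refer to the same file location."""
    # abspath() resolves relative segments such as "./"; normcase() folds
    # case and separators on Windows, where paths are case-insensitive.
    def normalized(path: str) -> str:
        return os.path.normcase(os.path.abspath(path))
    return normalized(input_path) == normalized(output_path)

# File names taken from the commands in this thread.
print(same_file("reportmsproteomics.tsv", "./reportmsproteomics.tsv"))
print(same_file("report.tsv", "reportmsproteomics_output.tsv"))
```

The first call compares a name with itself written two ways, the second compares the DIA-NN report with the suggested output name.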

dh2305 commented 1 week ago

Hi Thang,

I pasted the command wrongly.

(msproteomics) (msproteomics) PS D:\Data\Dominik\27102024DIANNH1299secondtry\reseponlyfasta> read_diann -o reportmsproteomics.tsv -E Fragment.Quant.Raw -f human.fasta report.tsv

report.tsv is my DIA-NN report. However, the command does not do anything.

dh2305 commented 1 week ago

Update

After installing many additional packages and environments on this Windows system, it worked.

1) Is the code and response listed below correct and fine?
2) How do I adjust the site confidence cutoff? For DIA-NN, I want to replicate their 90 and 99 site tables, but with 75, 51, and 1% values instead.
3) Are the maxlfq values in the txt output files for site and phosphopeptide log2 transformed?
4) What do the sites and allsites columns in the txt mean? What is M1/M2 etc.?
5) Why does the phosphopeptide report include peptides that have a STY_count of 0? Why are some labeled 0.0 and some 0.1?
6) How do I continue with this data in Perseus, for example, as I am used to with the MaxQuant site output?

(msproteomics) C:\Users\domin>cd C:\Users\domin\Downloads\msproteomics-1.0.1\msproteomics-1.0.1

(msproteomics) C:\Users\domin\Downloads\msproteomics-1.0.1\msproteomics-1.0.1>read_diann -o reportmsproteomics.tsv -E Fragment.Quant.Raw -f human.fasta report.tsv
Activate siteloc (use -h for help) ...
Loading FASTA files:
1) Processing human.fasta
Index of J not found in char_tree_getindex.
Failed determining child index in char_tree_enter.
Failed processing a peptide in store_fasta_into_peptide_tree.
Inserted 20467 proteins into peptide tree.
Failed storing FASTA file in peptide tree.

Failed loading FASTA files.
Processing input file report.tsv.
Skipped 10 lines with peptide backbone sequences not found in FASTA files.
Wrote output to file reportmsproteomics.tsv.

(msproteomics) C:\Users\domin\Downloads\msproteomics-1.0.1\msproteomics-1.0.1>sitereport reportmsproteomics.tsv

General setting:
input data = reportmsproteomics.tsv
processing tool = generic
sample_id_col = Run
intensity_col = Fragment.Intensity
secondary_id_cols = ['Precursor.Id', 'Fragment.Rel.Id']
annotation_cols = ['Fasta.Files']
normalize = none
quant_method = maxlfq

Phosphosite setting:
protein_id_col = Protein.Ids
site_id_col = Phospho.Site.Specs
site_filter_double_less = [['Global.Q.Value', '0.01']]
site_filter_double_greater = [['PTM.Site.Confidence', '0.01']]
site_filter_string_equal = None
site_filter_string_not_equal = None
output_site = phosphosite-report.txt

Phosphopeptide setting:
modified_sequence_col = Modified.Sequence
regex_str = (UniMod:[0-9]*)
target_modification = (UniMod:21)
peptide_filter_double_less = [['Global.Q.Value', '0.01']]
peptide_filter_string_not_equal = None
output_peptide = phosphopeptide-report.txt

Loading data file: reportmsproteomics.tsv
625219 rows x 10 columns read
587458 rows after filtering out <= 0 intensities.
418042 rows after other filtering(s)

Creating site identifiers
Creating site report
Concatenating secondary ids...

574816 rows after peptide filtering
Creating peptide identifiers
Creating peptide report
Concatenating secondary ids...
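To make the site filter settings in this log concrete, here is a minimal Python sketch of what a pair of filters such as Global.Q.Value below 0.01 and PTM.Site.Confidence above a cutoff would keep. It is not sitereport's code: reading `site_filter_double_less` and `site_filter_double_greater` as strict "less than" and "greater than" comparisons is an assumption based on the setting names, the rows are made up, and the helper names are hypothetical. How sitereport itself accepts a different cutoff is not shown in this thread; its -h output is the place to check.

```python
def passes_site_filters(row, max_q=0.01, min_confidence=0.01):
    """Keep a row when Global.Q.Value < max_q and PTM.Site.Confidence > min_confidence."""
    # Values are parsed from strings, as they would be when read from a TSV file.
    return (float(row["Global.Q.Value"]) < max_q
            and float(row["PTM.Site.Confidence"]) > min_confidence)

# Made-up rows for illustration only; column names are taken from the log above.
rows = [
    {"Precursor.Id": "A", "Global.Q.Value": "0.001", "PTM.Site.Confidence": "0.99"},
    {"Precursor.Id": "B", "Global.Q.Value": "0.001", "PTM.Site.Confidence": "0.60"},
    {"Precursor.Id": "C", "Global.Q.Value": "0.050", "PTM.Site.Confidence": "0.95"},
    {"Precursor.Id": "D", "Global.Q.Value": "0.002", "PTM.Site.Confidence": "0.30"},
]

def kept_ids(rows, min_confidence):
    """Return the Precursor.Id of every row that passes both filters."""
    return [r["Precursor.Id"] for r in rows
            if passes_site_filters(r, min_confidence=min_confidence)]

# The 1%, 51% and 75% cutoffs asked about in this thread.
for cutoff in (0.01, 0.51, 0.75):
    print(cutoff, kept_ids(rows, cutoff))
```

Raising the confidence cutoff only removes rows, so the 75% table is a subset of the 51% table, which in turn is a subset of the 1% table.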

tvpham commented 6 days ago

It looks ok.

The first step, the siteloc program, is quite verbose. I normally do not get so many warnings, such as 'Index of J not found in char_tree_getindex.'.

dh2305 commented 6 days ago

Why is this happening? I am using a regular human Swiss-Prot FASTA with contaminants added.
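One way to investigate the warning is to look for unusual letters in the FASTA file. In the IUPAC amino acid code, J stands for "leucine or isoleucine", so a sequence containing J would be a plausible trigger for 'Index of J not found'; that this is what siteloc is reporting is an assumption, not something confirmed in this thread. The sketch below is a small, hypothetical Python helper (not part of msproteomics) that lists every FASTA entry containing letters outside the 20 standard amino acids.

```python
from collections import Counter

# The 20 standard amino acid one-letter codes.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")

def nonstandard_residues(fasta_lines):
    """Map each FASTA header to a Counter of letters outside the standard 20."""
    found = {}
    header = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            continue
        if header is None:
            continue  # sequence text before any header is skipped
        odd = [c for c in line.upper() if c not in STANDARD_RESIDUES]
        if odd:
            found.setdefault(header, Counter()).update(odd)
    return found

# Made-up entries for illustration only.
example = [
    ">entry1 made-up example",
    "MKTAYIAKQR",
    ">entry2 made-up example",
    "MSJLKXP",
]
print(nonstandard_residues(example))
```

For a real file, the same function can be given an open file handle, for example `with open("human.fasta") as handle: print(nonstandard_residues(handle))`.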

Also:
2) How do I adjust the site confidence cutoff? For DIA-NN, I want to replicate their 90 and 99 site tables, but with 75, 51, and 1% values instead.
3) Are the maxlfq values in the txt output files for site and phosphopeptide log2 transformed? I think so, based on the supplement of your bioRxiv publication, but I need confirmation.
4) What do the sites and allsites columns in the txt mean? What is M1/M2 etc.? -> I have now identified this as multiplicity. Is there any way to collapse this data back to a form without multiplicity, like DIA-NN or PTMProphet does?
5) Why does the phosphopeptide report include peptides that have a STY_count of 0? Why are some labeled 0.0 and some 0.1?