Open momo-0521 opened 1 week ago
Hi,
Please try:
library(arrow)  # provides read_parquet()
df <- read_parquet("report.parquet")
length(unique(df$Protein.Group[df$Lib.Q.Value <= 0.01 & df$Lib.PG.Q.Value <= 0.01 & df$PG.Q.Value <= 0.05]))
Best, Vadim
Thank you for your advice.
I have tried this, but it does not work. It affected the number of precursors but had no effect on the entries in Protein.Group.
df <- read_parquet("report.parquet")
length(unique(df$Protein.Group[df$Lib.Q.Value <= 0.01 & df$Lib.PG.Q.Value <= 0.01 & df$PG.Q.Value <= 0.05]))
[1] 14126
Thank you again! T
Is this MBR output?
Yes, it is MBR output.
Can you please share both the .parquet and pg_matrix? A quick check: do the timestamps (date modified) on those files match?
Best, Vadim
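The timestamp check can be done from R with base functions. Below is a minimal sketch, assuming hypothetical output names (report.parquet and report.pg_matrix.tsv); it creates two empty temporary files so that it runs as-is, and in practice the two paths would point at the real DIA-NN output.

```r
# Hypothetical file names; temporary files are created so the sketch runs as-is.
out_dir <- tempfile("diann_out_")
dir.create(out_dir)
parquet_path <- file.path(out_dir, "report.parquet")
matrix_path  <- file.path(out_dir, "report.pg_matrix.tsv")
file.create(parquet_path, matrix_path)

# Modification times of both files and the gap between them in seconds.
mtimes   <- file.info(c(parquet_path, matrix_path))$mtime
gap_secs <- abs(as.numeric(difftime(mtimes[1], mtimes[2], units = "secs")))

print(mtimes)
print(gap_secs)
```

A large gap would suggest that the two files come from different runs, which is one possible reason for mismatching counts.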
Thank you! Please find the files in Google Cloud.
https://drive.google.com/file/d/1TAU2fQ1pnf4PXOqAlVVFMu4zM3Vg4L-Q/view?usp=sharing
https://drive.google.com/file/d/1jd-vLFXjsfTy4_dgd-ztEwd8RzqsXoD_/view?usp=sharing
length(unique(df$Protein.Group[df$Lib.PG.Q.Value <= 0.01 & df$PG.Q.Value <= 0.05 & df$PG.MaxLFQ > 0]))
[1] 13121
It works if you also filter for non-zero quantities :)
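For reference, here is a minimal sketch of the same count on a toy data frame, using base R only. The values are invented for illustration; only the column names come from the report. Wrapping the condition in which() is an added assumption, meant to guard against NA values in the filter columns, which would otherwise produce an NA entry in the subset.

```r
# Toy stand-in for the report; values are invented for illustration.
df <- data.frame(
  Protein.Group  = c("P1", "P1", "P2", "P3", "P4"),
  Lib.PG.Q.Value = c(0.001, 0.001, 0.002, 0.020, 0.001),
  PG.Q.Value     = c(0.010, 0.010, 0.030, 0.010, 0.010),
  PG.MaxLFQ      = c(1500, 1500, 0, 900, NA),
  stringsAsFactors = FALSE
)

# Keep rows that pass both q-value cut-offs and carry a non-zero quantity.
# which() drops rows where the condition evaluates to NA.
keep <- which(df$Lib.PG.Q.Value <= 0.01 &
              df$PG.Q.Value <= 0.05 &
              df$PG.MaxLFQ > 0)

n_groups <- length(unique(df$Protein.Group[keep]))
print(n_groups)
```

In this toy example P2 is dropped for its zero quantity, P3 for its library q-value, and P4 for its missing quantity.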
Thank you very much for your great help.
Best wishes!
Hi, Vadim
Thanks for your help yesterday. I have a new question. When I used 'diann_maxlfq' to estimate protein group quantities, the results differed significantly from both 'pg_matrix' and the 'PG.MaxLFQ' column. Below is the code I used; it worked correctly with DIA-NN 1.8 but raises some concerns with DIA-NN 1.9. Do you have any suggestions or advice on this issue?
protein.groups <- diann_maxlfq(df[df$Lib.PG.Q.Value <= 0.01 & df$PG.Q.Value <= 0.05 & df$PG.MaxLFQ > 0,], sample.header = "Run", group.header = "Protein.Group", id.header = "Precursor.Id", quantity.header = "Precursor.Normalised")
Thank you in advance!
diann_maxlfq implements a simple MaxLFQ algorithm, different from what DIA-NN uses internally. The results will therefore always differ.
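If the aim is to work with the quantities DIA-NN computed itself rather than re-estimating them, one option is to reshape the PG.MaxLFQ column into a protein-by-run matrix. The sketch below uses base R and invented toy values, and it assumes that PG.MaxLFQ is constant within each (Run, Protein.Group) pair, so the first value per pair is taken.

```r
# Toy long-format report; values are invented for illustration.
df <- data.frame(
  Run           = c("run1", "run1", "run2", "run2", "run1"),
  Protein.Group = c("P1", "P1", "P1", "P2", "P2"),
  Precursor.Id  = c("PEPA2", "PEPB2", "PEPA2", "PEPC3", "PEPC3"),
  PG.MaxLFQ     = c(100, 100, 120, 50, 40),
  stringsAsFactors = FALSE
)

# One row per (Run, Protein.Group) pair, assuming PG.MaxLFQ repeats within a pair.
pg <- unique(df[, c("Run", "Protein.Group", "PG.MaxLFQ")])

# Wide matrix: protein groups in rows, runs in columns.
wide <- tapply(pg$PG.MaxLFQ, list(pg$Protein.Group, pg$Run), function(x) x[1])
print(wide)
```

The resulting matrix can then be compared cell by cell with the output of diann_maxlfq to see where the two estimates diverge.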
Thank you. I understand.
Another question is about species-specific precursors. Our samples contain a mixture of human and mouse proteins. When running DIA-NN 1.9, we used both human and mouse FASTA files and added the options '--species-genes' and '--species-ids'. We would like to exclude precursors that are specific to mouse or shared between both species, and instead use only human-specific precursors to quantify their associated proteins. Under these parameter settings, is the 'PG.MaxLFQ' value calculated from both human-specific and mouse-specific precursors?
Best wishes!
It's calculated using all precursors matched to the protein group (Protein.Group column). So in this case you'd want to just discard all entries in the .parquet report with Protein.Ids column string containing 'MOUSE'.
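The filtering step described above can be sketched in base R as follows. The toy Protein.Ids strings are hypothetical; the real format depends on the FASTA headers and the options used, so the pattern may need adjusting.

```r
# Toy report rows; the ID strings are hypothetical and only illustrate the idea.
df <- data.frame(
  Precursor.Id = c("PEPA2", "PEPB2", "PEPC3", "PEPD2"),
  Protein.Ids  = c("A1_HUMAN", "B2_MOUSE", "C3_HUMAN;C3_MOUSE", "D4_HUMAN"),
  stringsAsFactors = FALSE
)

# Flag every entry whose Protein.Ids string mentions MOUSE anywhere,
# which covers both mouse-specific and shared entries.
is_mouse <- grepl("MOUSE", df$Protein.Ids, fixed = TRUE)

human_only <- df[!is_mouse, ]
print(human_only$Precursor.Id)
```

Note that dropping rows this way does not recompute PG.MaxLFQ; the values left in the report were still calculated from all precursors matched to the protein group.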
Hi Vadim
Thanks for your work on DIA-NN 1.9. When analyzing the results from version 1.9, I observed discrepancies between the number of Protein.Group entries I get after filtering in R and the number reported in report.pg_matrix. Are there additional filtering steps being applied? I suspect that the "Additional 5% run-specific protein-level FDR filter applied to the protein matrices, use --matrix-spec-q to adjust it" might be affecting the results. However, I'm unsure how to address this.
Thank you in advance