Open pisistrato opened 1 year ago
Hi, I think the difference might be due to the modified peptides. For the data in #7, if I remove all the modifications, I get the same result as the previous script.
process_long_format("DIA-Report-long-format.txt",
                    output_filename = "iq-MaxLFQ.tsv",
                    sample_id = "R.FileName",
                    primary_id = "PG.ProteinGroups",
                    secondary_id = c("EG.Library", "FG.Id", "FG.Charge", "F.FrgIon", "F.Charge", "F.FrgLossType"),
                    intensity_col = "F.PeakArea",
                    annotation_col = c("PG.Genes", "PG.ProteinNames", "PG.FastaFiles"),
                    filter_string_equal = c("F.ExcludedFromQuantification" = "False"),
                    filter_double_less = c("PG.Qvalue" = "0.01", "EG.Qvalue" = "0.01"),
                    peptide_extractor = function(x) gsub("\\[Oxidation \\(M\\)\\]", "",
                                                    gsub("\\[Carbamidomethyl \\(C\\)\\]", "",
                                                    gsub("\\[Acetyl \\(Protein N-term\\)\\]", "",
                                                    gsub("_.[0-9].*$", "", x)))),
                    log2_intensity_cutoff = -1000)
The regex in peptide_extractor is a bit messy. Basically, we want entries such as "[Acetyl (Protein N-term)]M[Oxidation (M)]EDMNEYSNIEEFAEGSK_.2" to be reduced to "MEDMNEYSNIEEFAEGSK", so that modified forms and charge states of the same sequence are counted as one unique peptide.
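For illustration, here is a minimal sketch of a more general extractor and of how stripping changes the peptide count. It assumes that every modification is written in square brackets and that the charge suffix has the form "_.<digits>", as in the example above. The names strip_mods, ids and proteins are made up for this sketch and are not part of iq.

```r
# Hypothetical helper (not part of the iq package): reduce a precursor
# identifier to its bare peptide sequence.
strip_mods <- function(x) {
  x <- gsub("_\\.[0-9]+$", "", x)  # drop a trailing charge suffix such as "_.2"
  gsub("\\[[^]]*\\]", "", x)       # drop every "[...]" modification
}

# Toy data: two precursors of the same sequence for P1, one for P2.
ids <- c("[Acetyl (Protein N-term)]M[Oxidation (M)]EDMNEYSNIEEFAEGSK_.2",
         "MEDMNEYSNIEEFAEGSK_.3",
         "AC[Carbamidomethyl (C)]DEFGHIK_.2")
proteins <- c("P1", "P1", "P2")

stripped <- strip_mods(ids)

# Unique peptides per protein, with and without stripping.
n_stripped <- tapply(stripped, proteins, function(p) length(unique(p)))
n_raw      <- tapply(ids, proteins, function(p) length(unique(p)))

print(n_stripped)
print(n_raw)
```

In this toy input, P1 has two distinct raw identifiers but only one stripped sequence, which is the kind of discrepancy that can change how many proteins pass a minimum-peptide filter.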
Hi Thang,
In #7 you proposed a way to calculate the number of peptides used for quantification, in order to filter the final MaxLFQ table.
I was wondering, how is this different from the peptide_extractor option in the process_long_format function? I am getting a slightly different number of proteins using the two approaches (DIA-NN output).