Closed. fcyu closed this issue 1 year ago.
Hi Fengchao,
Thank you for your kind words, and for the complete script & data.
Actually, one would need to remove only the spaces in the column names. It is unfortunate that we use spaces internally to pass the variables. I will change that in the future. But it will take some time because such changes require quite a bit of testing.
I could not finish your script though. Is "EG.ModifiedSequence" meant to be in both primary_id and secondary_id?
Also, 'na_string' accepts a single string only. I will put it on the todo list to accept multiple values.
Best, Thang
Hi Thang,
Thank you very much for the prompt response.
> I could not finish your script though. The "EG.ModifiedSequence" is in both primary_id and secondary_id ?

To be honest, I'm not entirely sure I understand `primary_id` and `secondary_id`. As far as I know, `primary_id` is the "row id" in the final intensity matrix, and `secondary_id` indicates the units used to calculate the intensity for the `primary_id`. Is that correct? So, in the above script, I wanted to calculate the intensity of modified sequences using the precursors (modified sequence + charge).
> Also, 'na_string' accepts a single string only. I will put it on the todo list to accept multiple values.

Thank you for pointing it out. A more detailed description in the documentation would be much appreciated.
Best,
Fengchao
Hi Fengchao,
I've updated the package. The new version, v1.9.9, supports spaces and most other characters in column names, so your statement should work:

```r
library(iq)  # provides fast_read()

# `path` is the location of the exported Spectronaut report.
df <- fast_read(path,
                sample_id = "R.FileName",
                primary_id = "EG.ModifiedSequence",
                secondary_id = c("FG.Charge"),
                intensity_col = "EG.TotalQuantity (Settings)",
                annotation_col = NULL,
                filter_string_equal = NULL,
                filter_string_not_equal = NULL,
                filter_double_less = c("PG.Qvalue" = 0.01, "PG.QValue (Run-Wise)" = 0.01, "EG.Qvalue" = 0.01),
                filter_double_greater = NULL,
                intensity_col_sep = NULL,
                intensity_col_id = NULL,
                na_string = "Filtered")
```
> To be honest, I'm not entirely sure I understand the `primary_id` and `secondary_id`. As far as I know, the `primary_id` is the "row id" in the final intensity matrix, and the `secondary_id` indicates the units to calculate the intensity for the `primary_id`. Is that correct? So, in the above script, I wanted to calculate the intensity of modified sequences using the precursors (modified sequence + charge).
Yes, `primary_id` is the output row_id. The `secondary_id` columns are the entries contributing to each row_id (I think you got it right. It is just a concept that is hard to explain clearly). So if you want to collapse multiple charge states, you can just say `secondary_id = c("FG.Charge")`.
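To make the two roles concrete, here is a minimal base R sketch on a made-up table. It does not call iq and does not compute MaxLFQ values. It only shows which values would become output rows and which entries would feed each row, assuming `primary_id = "EG.ModifiedSequence"` and `secondary_id = c("FG.Charge")`. The sequences and intensities are invented for illustration.

```r
# Toy long-format table; sequences and intensities are invented.
toy <- data.frame(
  R.FileName          = c("run1", "run1", "run1", "run2", "run2"),
  EG.ModifiedSequence = c("_AAAK_", "_AAAK_", "_CCCR_", "_AAAK_", "_CCCR_"),
  FG.Charge           = c(2, 3, 2, 2, 2),
  intensity           = c(100, 40, 75, 120, 80)
)

# primary_id: one output row per distinct value.
row_ids <- unique(toy$EG.ModifiedSequence)

# secondary_id: the entries that contribute to each output row.
contributors <- lapply(
  split(toy$FG.Charge, toy$EG.ModifiedSequence),
  function(z) sort(unique(z))
)

print(row_ids)
print(contributors)
```

Here `_AAAK_` is fed by charge states 2 and 3, while `_CCCR_` has a single contributor.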
> Also, 'na_string' accepts a single string only. I will put it on the todo list to accept multiple values.

You can also use the `filter_string_not_equal` option to filter out entries corresponding to NA values.
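Another possible workaround while `na_string` takes a single value is to pre-clean the report in base R, whose `read.delim` accepts several NA markers through `na.strings`. The sketch below is an assumption about how one might do this outside iq, not a feature of the package. The inline data and the second marker `"NaN"` are invented for illustration.

```r
# Inline stand-in for an exported report; values are invented.
raw <- paste0(
  "R.FileName\tEG.TotalQuantity (Settings)\n",
  "run1\t1234.5\n",
  "run2\tFiltered\n",
  "run3\tNaN\n"
)

# na.strings takes a character vector, so several markers map to NA at once.
# check.names = FALSE keeps the original column names, spaces included.
df <- read.delim(text = raw, check.names = FALSE,
                 na.strings = c("Filtered", "NaN"))

n_missing <- sum(is.na(df[["EG.TotalQuantity (Settings)"]]))
print(n_missing)
```

The cleaned table could then be written back to a tab-separated file with a single NA marker before handing it to iq.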
Cheers, Thang
Hi Thang,
Thanks for your explanation.
> Yes, primary_id is the output row_id. The secondary_id are entries contributing to the row_id (I think you got it correct also. It is just a concept that hard to explain very clearly). So if you want to collapse multiple charge states, you can just say secondary_id = c("FG.Charge").

I am a little confused. Should I use `secondary_id = c("EG.ModifiedSequence", "FG.Charge")` rather than `secondary_id = c("FG.Charge")`, because I want to collapse all precursors with the same `EG.ModifiedSequence+FG.Charge`?
Best
Fengchao
> I am a little confused. Should I use `secondary_id = c("EG.ModifiedSequence", "FG.Charge")` rather than `secondary_id = c("FG.Charge")` because I want to collapse all precursors with the same `EG.ModifiedSequence+FG.Charge`?
As it is now, you have `EG.ModifiedSequence` as row IDs. If you want row IDs as `EG.ModifiedSequence+FG.Charge`, then you need to concatenate the two columns into one (using R or awk) and use the concatenated column as `primary_id`. But then you will still need to specify the `secondary_id`.

Think of `secondary_id` as the columns that make the `primary_id` NOT unique. If there are no duplicates after concatenating `EG.ModifiedSequence` and `FG.Charge`, then you do not need to run iq.
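The concatenation and the duplicate check can be sketched in base R as follows. The table is made up, the `precursor_id` column name and the `.` separator are arbitrary choices, and `F.FrgIon` stands in for whatever lower-level column might still distinguish rows (an assumption for illustration, not taken from the report in this thread).

```r
# Toy table; F.FrgIon is a hypothetical lower-level column.
toy <- data.frame(
  R.FileName          = c("run1", "run1", "run1", "run2"),
  EG.ModifiedSequence = c("_AAAK_", "_AAAK_", "_AAAK_", "_AAAK_"),
  FG.Charge           = c(2, 2, 3, 2),
  F.FrgIon            = c("y3", "y4", "y3", "y3")
)

# Concatenate the two columns into one candidate primary_id.
toy$precursor_id <- paste(toy$EG.ModifiedSequence, toy$FG.Charge, sep = ".")

# Is the new ID still repeated within a sample?
has_duplicates <- anyDuplicated(toy[, c("R.FileName", "precursor_id")]) > 0
print(has_duplicates)
```

If `has_duplicates` is `TRUE`, some other column (here the hypothetical `F.FrgIon`) still makes the ID non-unique and would be the `secondary_id`. If it is `FALSE`, there is nothing left to collapse.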
Thang
Thank you very much for your prompt response.
Best,
Fengchao
First of all, thank you for developing this wonderful package. It makes calculating MaxLFQ intensities and generating reports much easier and faster.
However, it seems to have issues when the column names contain spaces or parentheses. Following is the script I was using
I used the `EG.TotalQuantity (Settings)` column as intensity and `PG.QValue (Run-Wise)` as one of the FDR filters. iq threw an error saying "Do not know what to do with (Settings)". Removing the space and parentheses solved the issue.

I know that `read.delim` converts disallowed characters to `.`, and `read_tsv` surrounds the column name with backticks when there are disallowed characters. I am not sure whether iq has a similar approach, in which case I would need to pass something else to the parameter.

Here is the exported Spectronaut report from a public dataset: test.zip
Thanks,
Fengchao