mathiaskalxdorf / IceR

Quantitative proteomics workflow
https://mathiaskalxdorf.github.io/IceR/

Question about load requantitation data #10

Closed ShuoQian1993 closed 3 years ago

ShuoQian1993 commented 3 years ago

Hi,

I recently read your paper "IceR improves proteome coverage and data completeness in global and single-cell proteomics", and I really appreciate the large improvement over MaxQuant, so I decided to try your approach on my data. I used your R Shiny app and it worked very well. However, when I tried to load my IceR requantification file with the function "load_Requant_data()", it threw this error: " stringr::str_split(data_protein$Gene_Name, ";", simplify = T)[, : subscript out of bounds ". I looked through the code of this function and found that "data_protein" is a dataset containing only protein abundances. It has no column called "Gene_Name" or "UniProt_Identifier". Does this error occur because I did something wrong, or is there a problem in the code? Any help would be appreciated. Thank you very much.

Best wishes, Shuo

mathiaskalxdorf commented 3 years ago

Hi Shuo,

Many thanks for your interest in our work. Regarding your issue: if data_protein contains only quantification columns and not the additional columns "Gene_Name" and "UniProt_Identifier", it means that IceR could not finish the analysis completely, most likely because the fasta file you used during MaxQuant processing was not parsed correctly. Please check whether the column "Gene names" is present in the proteinGroups.txt from MaxQuant that you used as input for IceR. If this column is missing, then either your fasta file did not contain the gene names or it was not parsed in MaxQuant with the correct parsing rule. IceR can finish the quantification run without gene names (it only requires the identifiers, e.g. from UniProt). However, load_Requant_data() assumes that the "Gene_Name" column is available in the final results from IceR, so at the moment it crashes when the column is missing. I will update this function so that it still works in that case, but I would always recommend parsing the fasta file correctly during MaxQuant processing so that gene names are available. Otherwise you would have to add them manually afterwards.
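The check described above can be sketched in base R. This is only an illustration: `has_columns()` is a hypothetical helper and not part of IceR, and the small demo file stands in for a real proteinGroups.txt. The only facts taken from the thread are the file name and the "Gene names" column.

```r
# Hypothetical helper (not part of IceR): report which of the required
# column names appear in the header line of a tab-separated file.
has_columns <- function(path, required = c("Gene names")) {
  header <- strsplit(readLines(path, n = 1), "\t", fixed = TRUE)[[1]]
  stats::setNames(required %in% header, required)
}

# Self-contained demo: a tiny stand-in file that lacks "Gene names".
# For real data, point `demo_path` at your own proteinGroups.txt.
demo_path <- tempfile(fileext = ".txt")
writeLines(
  c("Protein IDs\tMajority protein IDs\tIntensity A",
    "P12345\tP12345\t1000"),
  demo_path
)

present <- has_columns(demo_path)
if (!all(present)) {
  message("Missing column(s): ", paste(names(present)[!present], collapse = ", "))
}
```

If the column is reported as missing, the fasta parsing in MaxQuant is the first thing to look at, as explained above.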

ShuoQian1993 commented 3 years ago

It's quite odd: my fasta file contains the gene name (GN) and identifier (OX), but they are not shown in proteinGroups.txt. I will look into it and let you know if I still have problems. Thank you for your help.
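One way to confirm what the fasta headers actually contain is a small base R sketch like the one below. It assumes UniProt-style headers, where "GN=" carries the gene name and "OX=" the organism taxonomy identifier. The two headers are made-up illustrations, and `extract_gene_name()` is a hypothetical helper. Note that this only inspects the fasta file; whether MaxQuant fills "Gene names" still depends on the parsing rule chosen there.

```r
# Illustrative UniProt-style headers (not real entries).
headers <- c(
  ">sp|P00001|EXMP_HUMAN Example protein OS=Homo sapiens OX=9606 GN=EXMP1 PE=1 SV=1",
  ">sp|P00002|NOGN_HUMAN Protein without gene OS=Homo sapiens OX=9606 PE=1 SV=1"
)

# Hypothetical helper: return the token after "GN=", or NA if absent.
extract_gene_name <- function(header) {
  matches <- regmatches(header, regexec("GN=([^ ]+)", header))
  vapply(
    matches,
    function(m) if (length(m) >= 2) m[2] else NA_character_,
    character(1)
  )
}

gene_names <- extract_gene_name(headers)
# Count how many headers lack a gene name.
n_missing <- sum(is.na(gene_names))
```

For a real file, the header lines could be collected with `grep("^>", readLines(path_to_fasta), value = TRUE)`, where `path_to_fasta` is your own file path.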