mgerault / DIAgui

An interactive Shiny app for processing DIA-NN output (filtering, MaxLFQ, Top3, iBAQ, etc.)

Error when calculating protein groups #1

Closed silasmellor closed 1 year ago

silasmellor commented 1 year ago

Hey there, firstly let me just say it's super nice to have this available to process DIA-NN outputs further. I am getting an error when running the protein group calculations.

The error says "error in [.data.frame: undefined columns selected"

I am processing data for a non-UniProt proteome, so I don't know if that may cause any issues?

Calculation... Concatenating secondary ids...

Removing low intensities...

Concatenating secondary ids...

Removing low intensities...

nrow = 925646, # proteins = 10377, # samples = 15
11 thread(s) available...
9% 14% 20% 25% 30% 35% 40% 67% 72% 77% 82% Completed.
Warning: Error in [.data.frame: udefinerede kolonner valgt [Danish locale message: "undefined columns selected"]
  2: shiny::runApp
  1: runDIAgui

mgerault commented 1 year ago

Hello, and first of all, thank you!

Were you trying to get the iBAQ quantification without a FASTA file? If you're using a non-UniProt proteome, that could be the cause of the error. Or maybe you didn't select the right species? In any case, if you were not trying to get iBAQ, it shouldn't cause any problem. In that case, could you send me a small sample of your data?

silasmellor commented 1 year ago

I was trying to do iBAQ, yes, but I supplied a FASTA file. I suspect that file may be off somehow though (I also had a bit of trouble getting it to work when running DIA-NN, but managed in the end, or so I thought).

My fasta headers look like this:

Peinf101Ctg12185235g00001.1 ubiquitin-conjugating enzyme 20

Data is DIA from a DDA library generated with FragPipe using the FASTA format above.

I also tried doing in silico DIA in DIA-NN, but could only get it to work if I reformatted the headers to:

sp|Peinf101Ctg12185235g00001|Peinf101Ctg12185235g00001.1 ubiquitin-conjugating enzyme 20 OS=Petunia inflata OX=212142 PE=4 SV=1

I realise OS should be Petunia integrifolia subsp. inflata, but I don't know if that is enough to throw it off?

mgerault commented 1 year ago

Well, this function looks up the peptide sequence for the given identifier, i.e. the protein ID, which comes right after 'sp' in a classic FASTA header such as: 'sp|O15263|DFB4A_HUMAN Defensin beta 4A OS=Homo sapiens OX=9606 GN=DEFB4B PE=1 SV=1'.

So for this to work, the IDs in your report file must be 'Peinf101Ctg12185235g00001', and each one needs to have a peptide sequence attached in the FASTA file. If that's not the case, the lookup will return a NULL result, and this could be why you got the error "error in [.data.frame: undefined columns selected". Maybe you can try running the function 'getallseq' like this:

getallseq(pr_id = "Peinf101Ctg12185235g00001", bank_name = "path/to_your/FASTA_file.fasta", fasta_file = TRUE)

If it returns an empty list, it means that your FASTA file doesn't contain 'Peinf101Ctg12185235g00001' or any peptide sequence.
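To see which identifiers a FASTA file actually exposes, independently of the app, here is a minimal Python sketch. It is not DIAgui code and does not reproduce what 'getallseq' does internally; the function name `fasta_accessions` is hypothetical, and the sequence in the toy record is made up. It assumes UniProt-style headers, where the ID sits between the first two '|' characters, and falls back to the first whitespace-delimited token otherwise.

```python
import io


def fasta_accessions(handle):
    """Map each header's ID to its sequence.

    Assumption: UniProt-style headers ('>sp|ID|NAME ...') carry the ID
    between the first two '|'; any other header uses its first token.
    """
    seqs = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            parts = header.split("|")
            if len(parts) >= 3:
                current = parts[1]
            else:
                tokens = header.split()
                current = tokens[0] if tokens else ""
            seqs[current] = ""
        elif current is not None:
            # Sequence lines may be wrapped, so concatenate them.
            seqs[current] += line
    return seqs


# Toy record: the header is the one quoted in this thread, the sequence is invented.
example = io.StringIO(
    ">sp|Peinf101Ctg12185235g00001|Peinf101Ctg12185235g00001.1 "
    "ubiquitin-conjugating enzyme 20 OS=Petunia inflata OX=212142 PE=4 SV=1\n"
    "MAAQK\n"
    "LLSEL\n"
)
seqs = fasta_accessions(example)
print("Peinf101Ctg12185235g00001" in seqs)    # True: ID without the .1 suffix
print("Peinf101Ctg12185235g00001.1" in seqs)  # False: the .1 form is not the ID field
```

Comparing the keys of `seqs` with the protein IDs in the report file shows quickly whether a suffix such as '.1' is what prevents a match.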

silasmellor commented 1 year ago

I think the problem may be that my report contains the reference with .1 (mRNA). When I search for that in the FASTA file I get an empty list, but without the .1 it returns the sequence correctly. I will reformat my FASTA headers and try again. Will let you know how that works.

silasmellor commented 1 year ago

OK, flipping the IDs around made no difference. I also checked the report.tsv for the in silico run, and it already appears to follow the nomenclature I used for the pseudo-UniProt headers, so it should be able to find the sequence.

Here are a few lines from my report.tsv.

Edit: sorry, that was a mess; attached instead: example.txt

mgerault commented 1 year ago

So, I checked, and in fact it is simpler than expected. Your file doesn't contain the column 'First.Protein.Description', which is explicitly used in the app (because I assumed it would always be in the report file). So the easiest solution would be to add a column named 'First.Protein.Description' to your report file, filled with NA or any other value.
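If editing the file by hand is inconvenient, the column can also be added with a short script. The following is a minimal Python sketch under stated assumptions: the report is a plain tab-separated file with a header row, the helper name `add_missing_column` and the file names are hypothetical, and the toy rows are invented for the demonstration.

```python
import os
import tempfile


def add_missing_column(src_path, dst_path,
                       column="First.Protein.Description", fill="NA"):
    """Copy a tab-separated report, appending `column` only if it is absent.

    Returns True when the column had to be added.
    """
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        header = src.readline().rstrip("\r\n")
        missing = column not in header.split("\t")
        dst.write(header + ("\t" + column if missing else "") + "\n")
        for line in src:
            row = line.rstrip("\r\n")
            if not row:
                continue  # skip blank lines instead of padding them
            dst.write(row + ("\t" + fill if missing else "") + "\n")
    return missing


# Demonstration on a toy two-column report written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "report.tsv")
    dst = os.path.join(tmp, "report_fixed.tsv")
    with open(src, "w", encoding="utf-8") as fh:
        fh.write("Protein.Group\tPrecursor.Id\n"
                 "Peinf101Ctg12185235g00001\tAAK2\n")
    added = add_missing_column(src, dst)
    with open(dst, encoding="utf-8") as fh:
        fixed = fh.read()

print(added)
print(fixed)
```

The script writes to a new file rather than overwriting the original report, so the DIA-NN output stays untouched if something goes wrong.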

silasmellor commented 1 year ago

Yep that did it! I added the column but left it empty and that seems to work fine. Thanks!

mgerault commented 1 year ago

OK, perfect! I'm closing this issue then.