Open TANIAKMONS opened 1 month ago
Hi TK,
Protein sequence IDs should be read correctly from any FASTA. All other information you can always pull out of the FASTA using some FASTA-reading R package, to annotate DIA-NN's output report.
We have tried a first time without Uniprot annotation and it did not work.
How did it manifest?
Best, Vadim
Hi,
I'm having the same issue in the library free search. The FASTA header for example looks like this:
>P62874,Q3TQ70|TX=10090 OS=Mouse GN=ENSMUSG00000029064.16,Gnb1 TA=NM_001160016.1,ENSMUST00000105616.10,XM_017319977.2,NM_001160017.1,ENSMUST00000030940.14,ENSMUST00000176637.2,ENSMUST00000165335.8,NM_008142.4 PA=ENSMUSP00000030940.8,NP_032168.1,ENSMUSP00000135091.2,XP_017175466.1,ENSMUSP00000101241.4,NP_001153488.1,ENSMUSP00000130123.2,NP001153489.1,P62874,Q3TQ70
(FASTA file from OpenProt (microprotein identification) with > 500000 entries) and the output in the log is the following:
[0:48] Processing FASTA
[1:35] Assembling elution groups
[2:47] 23495123 precursors generated
[2:47] Gene names missing for some isoforms
[2:47] Library contains 1 proteins, and 1 genes
[2:51] Encoding peptides for spectra and RTs prediction
Any idea how to fix this issue?
Thanks! Best, Sara
Hi Sara,
DIA-NN will not correctly extract protein names from this. It should get the IDs OK though, i.e. you can annotate DIA-NN output using some FASTA-reading R package.
Best, Vadim
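As an illustration of the annotation route suggested above, here is a minimal Python sketch (Python rather than R, purely as a choice for this example) that reads the key=value fields from headers shaped like the one Sara posted and builds a lookup keyed by accession. The function names `parse_header`, `read_fasta_annotations` and `gene_symbol` are hypothetical, the header in the example is shortened from the one quoted in the thread, and the sequence line is a placeholder. Which ID string DIA-NN actually reports for such a header is not stated in the thread, so the lookup simply indexes every comma-separated accession before the `|`.

```python
import re

# Fields in the posted header look like KEY=value with a two-letter key
# (TX, OS, GN, TA, PA). This pattern is an assumption based on that one header.
FIELD_RE = re.compile(r"\b([A-Z]{2})=(.*?)(?=\s+[A-Z]{2}=|$)")

def parse_header(header):
    """Split a header into (list of accessions, dict of KEY -> value)."""
    id_part, _, rest = header.lstrip(">").strip().partition("|")
    ids = [x for x in id_part.split(",") if x]
    fields = {key: value.strip() for key, value in FIELD_RE.findall(rest)}
    return ids, fields

def read_fasta_annotations(lines):
    """Map every accession found in a header line to that header's fields."""
    table = {}
    for line in lines:
        if line.startswith(">"):
            ids, fields = parse_header(line)
            for acc in ids:
                table[acc] = fields
    return table

def gene_symbol(fields):
    """Pick a gene symbol from GN, skipping Ensembl gene IDs (an assumption)."""
    names = [g for g in fields.get("GN", "").split(",") if g]
    symbols = [g for g in names if not g.startswith("ENS")]
    return symbols[0] if symbols else (names[0] if names else "")

# Header shortened from the one quoted in the thread; sequence is a placeholder.
example_fasta = [
    ">P62874,Q3TQ70|TX=10090 OS=Mouse GN=ENSMUSG00000029064.16,Gnb1 "
    "TA=NM_001160016.1 PA=NP_032168.1,P62874,Q3TQ70",
    "ACDEFGHIK",
]
annotations = read_fasta_annotations(example_fasta)
print(gene_symbol(annotations["P62874"]))
```

For a real file, `read_fasta_annotations(open(path))` would work the same way, and the resulting dictionary can then be joined to the protein ID column of the DIA-NN report in whatever table library you prefer.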
Hi Vadim,
I had the same result as Sara (Library contains 1 proteins, and 1 genes). We wrote a script to incorporate UniProt annotations into the FASTA, and we now use DIA-NN 1.9. This is the result we get:
10 files will be processed
[0:00] Loading FASTA C:\Tania\output_proteinpilot2.fasta
[2:07] Processing FASTA
[4:11] Assembling elution groups
[6:57] 59894740 precursors generated
[6:58] Gene names missing for some isoforms
[6:58] Library contains 717220 proteins, and 1 genes
[7:09] Encoding peptides for spectra and RTs prediction
[9:53] Predicting spectra and IMs
[370:52] Predicting RTs
[409:47] Decoding predicted spectra and IMs
[411:19] Decoding RTs
[412:01] Saving the library to C:\Tania\DIA-NN\1.9\report.predicted.speclib
[415:57] Initialising library
First pass: generating a spectral library from DIA data
[418:51] File #1/10
[418:51] Loading run C:\Tania\PSF21h.wiff
[421:59] 59872940 library precursors are potentially detectable
[423:20] Processing.
Since processing takes a very long time, we will run it on a more powerful server, which runs Linux. Is the command line the same as with DIA-NN 1.8?
Thanks, Kind Regards,
TK
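TK's script is not shown in the thread, so the following is only a hypothetical sketch of what rewriting a header into the UniProt-style layout (`>db|Accession|EntryName Description OS=... OX=... GN=...`) could look like, keeping a single gene name per entry. The function name `to_uniprot_style`, the `tr` database tag, the `ACCESSION_TAXID` entry name and the rule of skipping Ensembl IDs in `GN` are all assumptions made for illustration. Whether DIA-NN then reads gene names from headers produced this way has not been checked here.

```python
import re

# Two-letter KEY=value fields, as seen in the OpenProt-style header in this thread.
FIELD_RE = re.compile(r"\b([A-Z]{2})=(.*?)(?=\s+[A-Z]{2}=|$)")

def to_uniprot_style(header, db="tr"):
    """Rewrite '>ID1,ID2|TX=... OS=... GN=...' as a UniProt-style header.

    Only the first accession is kept; alternate accessions and the TA/PA
    lists are dropped (a simplification for this sketch).
    """
    id_part, _, rest = header.lstrip(">").strip().partition("|")
    accession = id_part.split(",")[0]
    fields = {key: value.strip() for key, value in FIELD_RE.findall(rest)}

    # Keep one gene name: the first GN item that is not an Ensembl ID.
    genes = [g for g in fields.get("GN", "").split(",")
             if g and not g.startswith("ENS")]
    gene = genes[0] if genes else ""
    tax = fields.get("TX", "")

    # Entry name and description are placeholders built from the accession.
    parts = [f">{db}|{accession}|{accession}_{tax or 'UNKNOWN'} {accession}"]
    if fields.get("OS"):
        parts.append(f"OS={fields['OS']}")
    if tax:
        parts.append(f"OX={tax}")
    if gene:
        parts.append(f"GN={gene}")
    return " ".join(parts)

# Header shortened from the one quoted earlier in the thread.
print(to_uniprot_style(
    ">P62874,Q3TQ70|TX=10090 OS=Mouse GN=ENSMUSG00000029064.16,Gnb1 TA=NM_008142.4"
))
```

Applying this to every line that starts with `>` and writing sequence lines through unchanged would give a converted FASTA; checking the "Library contains ... proteins, and ... genes" line in the log afterwards shows whether the gene names were picked up.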
Hi TK,
I would suggest trying the recommended settings first, which should result in a much smaller predicted library and search space.
No, I don't recommend using 1.8.1. If you do, please make sure to use the predicted library generated by 1.9.
Best, Vadim
Hello,
I have an issue with the FASTA format. The FASTA was generated from Illumina sequencing and annotated with KREGG. We have tried a first time without Uniprot annotation and it did not work. Will it work if the FASTA contains several kinds of annotation, including the Uniprot one? It seems that we can't just have the Uniprot FASTA format.
Thanks in advance, TK