wrong number of proteins in fasta file

tasos2310 commented 2 years ago

Hi all,

I have an issue with a FASTA file from which I want to generate a library. As you can see in the image below, the program identifies 1977 proteins, i.e. 1977 sequences in the file, but the file contains more than 14000 sequences. I would like to know how I can solve this, because I am concerned that my library was generated from only these 1977 sequences.

Thank you in advance for your time.

Best, Tasos

C.fulvum proteome.fasta.txt
DIA-NN

vdemichev commented 2 years ago

Hi Tasos,

In the log, DIA-NN reports the number of protein names in UniProt format, not the number of sequence IDs. In the DIA-NN report, sequence IDs will be annotated properly.

Best, Vadim

tasos2310 commented 1 year ago

Hi Vadim,

Thanks for your reply. To make sure I understand correctly: do you mean that proteins that share the same name in UniProt (e.g. unknown, zinc proteins, etc.) are counted as one ID by DIA-NN?

Best, Tasos

vdemichev commented 1 year ago

Hi Tasos,

No, in this case DIA-NN simply reports the number of distinct protein names in the library.

Best, Vadim
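
For anyone who wants to check this on their own file, the gap between the number of sequences and the number of distinct protein names can be estimated from the FASTA headers alone. The following is a minimal sketch, not DIA-NN's actual parsing code: it assumes UniProt-style headers of the form `>db|ACCESSION|ENTRY_NAME description`, treats the second field as the sequence ID and the third as the protein name, and falls back to the first header token as the ID when the header is not in that format. The function name `summarize_fasta_headers` and the example headers are hypothetical.

```python
from collections import Counter


def summarize_fasta_headers(lines):
    """Count sequences, distinct sequence IDs, and distinct protein names.

    Assumes UniProt-style headers: >db|ACCESSION|ENTRY_NAME description.
    Headers in any other format contribute an ID but no protein name.
    """
    n_sequences = 0
    ids = set()
    names = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        n_sequences += 1
        fields = line[1:].split()
        token = fields[0] if fields else ""
        parts = token.split("|")
        if len(parts) >= 3:
            ids.add(parts[1])          # accession, used here as the sequence ID
            if parts[2]:
                names[parts[2]] += 1   # entry name, used here as the protein name
        else:
            ids.add(token)             # non-UniProt header: whole token is the ID
    return {
        "sequences": n_sequences,
        "distinct_ids": len(ids),
        "distinct_names": len(names),
    }


# Hypothetical headers, for illustration only.
example = [
    ">tr|A0001|UNKNOWN_CLAFU Uncharacterized protein",
    "MKTAYIAKQR",
    ">tr|A0002|UNKNOWN_CLAFU Uncharacterized protein",
    "MSTNPKPQRK",
    ">tr|A0003|ZNF1_CLAFU Zinc finger protein",
    "MALWMRLLPL",
    ">seq4 hypothetical protein",
    "MGSSHHHHHH",
]
print(summarize_fasta_headers(example))
# {'sequences': 4, 'distinct_ids': 4, 'distinct_names': 2}
```

To run it on a real file, pass an open file handle instead of the list, e.g. `with open("C.fulvum proteome.fasta.txt") as fh: print(summarize_fasta_headers(fh))`. If `sequences` is far larger than `distinct_names`, that would be consistent with the log showing a much smaller number than the sequence count, as described above.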

vdemichev / DiaNN

wrong number of proteins in fasta file #490