vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

Question regarding FASTA file gene name retrieval #772

Open juliesoelberg opened 1 year ago

juliesoelberg commented 1 year ago

Hi,

Thank you for the really nice software! I am having trouble getting proper gene names from a FASTA file, and I have not been able to find any information on how DIA-NN retrieves the gene name from the FASTA file.

My FASTA file was generated from our own WGS data.

FASTA example:

>Trichophyton|32|FUN_000001-T1|NA|hypothetical protein|COG:E|||EC:4.3.1.18|PF14031,PF01168|IPR026956,IPR001608,IPR042208,IPR029066
MTSNIAYQPTAEVADLKALYVGRMLQDVDGPKPVIDREVARRNCQVMLDAALALNVEFRAHVKTHKTTELTRFQVGERSDTVRLVASTLVEAEQLVPFMKECQTKGRKVDLIYGLPVQPSCFPRLAELGKALGHGAVTCLVDSVDIVPFLSRYHALCGKRLGVFIKLDTGYGRAGVTYSSAQFNAIVSELYALEGKEPHLFTLRGFYSHMGHSYGSNNPSEAMDYLRTEIEGCKLAADRASAIPPPTPFDGETNYSQRRFVLSVGATPSTTAAQNLTGHETLSLPGADKAKDLIDQTKQKYDIELHAGAYVTLDMQQLAARARPNTSHLSFDDLALTVLAEVGSLYMHREHPEALVACGSLAIGREPCRSYKGWGVVTPWREQQQANDAAPAADERVGFYNPDGDKTGWILDRVSQEHGILRWHGSRQNMRPLRIGEKLRIWPNHCCICLAGFTYVLVVDSTAQGSEKDRIVDVWQSWRGW
>Trichophyton|32|FUN_000002-T1|NA|hypothetical protein|COG:J|||||IPR035959,IPR006175
MSAKRAVFTDKAPAPLPVFSQAIVHNGIVYCSGQVGTDPATRELVEGTVKDRTRWRDWDDGGLDRHRMVIDDVLTYREQAQIFRNITAVLEAAGSSLEKLLKVNIFLTNMDDFAAVNDVYAQVLNFEPKPVRTCVAVKTLPRNTDVEIECSAYI
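DIA-NN is typically used with UniProt FASTA files, whose headers carry the gene name in a GN= tag. One way to experiment is to rewrite the pipe-delimited headers above into a UniProt-like layout. The Python sketch below is hypothetical: the field names are guesses inferred from the two example headers (the fourth field, NA here, is assumed to be the gene name), falling back to the protein ID when no gene name is given is an arbitrary choice, and whether DIA-NN parses the rewritten headers as intended is an assumption, not something confirmed in this thread.

```python
# Hypothetical field layout, inferred from the example headers only;
# not an official schema of the tool that produced the FASTA.
FIELDS = ["organism", "index", "protein_id", "gene", "product",
          "cog", "unknown", "go", "ec", "pfam", "interpro"]


def parse_header(header: str) -> dict:
    """Split a pipe-delimited header (with or without '>') into named fields."""
    parts = header.lstrip(">").split("|")
    # Pad short headers so every field name maps to a string.
    parts += [""] * (len(FIELDS) - len(parts))
    return dict(zip(FIELDS, parts))


def to_uniprot_style(header: str) -> str:
    """Rebuild a header as '>tr|ID|ID product OS=organism GN=gene' (assumed layout)."""
    f = parse_header(header)
    # Assumption: fall back to the protein ID when the gene field is empty or 'NA'.
    gene = f["gene"] if f["gene"] not in ("", "NA") else f["protein_id"]
    pid = f["protein_id"]
    return f">tr|{pid}|{pid} {f['product']} OS={f['organism']} GN={gene}"


def convert_fasta(text: str) -> str:
    """Rewrite every header line; sequence lines pass through unchanged."""
    out = [to_uniprot_style(line) if line.startswith(">") else line
           for line in text.splitlines()]
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    h = ">Trichophyton|32|FUN_000001-T1|NA|hypothetical protein|COG:E|||EC:4.3.1.18"
    print(to_uniprot_style(h))
    # >tr|FUN_000001-T1|FUN_000001-T1 hypothetical protein OS=Trichophyton GN=FUN_000001-T1
```

Using the protein ID as the fallback keeps every entry unique, at the cost of the Genes column then containing IDs rather than real gene symbols for unannotated proteins.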

Output example (entries in the Genes column):
(L-kynurenine
(Si)-synthase|COG:H||GO:0004108,GO:0046912,GO:0006101|EC:2.3.3.1|PF00285|IPR010109,IPR036969,IPR016143,IPR016142,IPR002020,IPR019810
(complex
(high
(translocase

vdemichev commented 1 year ago

Options: