qm-intel closed this issue 1 year ago.
Thanks for looking into GRAMPA. @jswitten answered this question in this thread: https://github.com/zswitten/Antimicrobial-Peptides/issues/8#issuecomment-1459280654.
(Also, completely unimportant, but incidentally *Arabidopsis* is not a bacterium; it's a plant often used as a model organism in plant biology.)
@jswitten
I am trying to use your GRAMPA dataset for a binary classification problem to train my model. If I understood correctly, you use two files for the classification problem:
For example, this specific strain has 14 samples, and the term `OS=Arabidopsis thaliana` in the header below is the bacterial species:

`sp|F4HPR5|DRP5A_ARATH Dynamin-related protein 5A OS=Arabidopsis thaliana OX=3702 GN=DRP5A PE=2 SV=1`

MANSNTYLTTPTKTPSSRRNQQSQSKMQSHSKDPINAESRSRFEAYNRLQAAAVAFGEKL
PIPEIVAIGGQSDGKSSLLEALLGFRFNVREVEMGTRRPLILQMVHDLSALEPRCRFQDE
DSEEYGSPIVSATAVADVIRSRTEALLKKTKTAVSPKPIVMRAEYAHCPNLTIIDTPGFV
LKAKKGEPETTPDEILSMVKSLASPPHRILLFLQQSSVEWCSSLWLDAVREIDSSFRRTI
VVVSKFDNRLKEFSDRGEVDRYLSASGYLGENTRPYFVALPKDRSTISNDEFRRQISQVD
TEVIRHLREGVKGGFDEEKFRSCIGFGSLRDFLESELQKRYKEAAPATLALLEERCSEVT
DDMLRMDMKIQATSDVAHLRKAAMLYTASISNHVGALIDGAANPAPEQWGKTTEEERGES
GIGSWPGVSVDIKPPNAVLKLYGGAAFERVIHEFRCAAYSIECPPVSREKVANILLAHAG
RGGGRGVTEASAEIARTAARSWLAPLLDTACDRLAFVLGSLFEIALERNLNQNSEYEKKT
ENMDGYVGFHAAVRNCYSRFVKNLAKQCKQLVRHHLDSVTSPYSMACYENNYHQGGAFGA
YNKFNQASPNSFCFELSDTSRDEPMKDQENIPPEKNNGQETTPGKGGESHITVPETPSPD
QPCEIVYGLVKKEIGNGPDGVGARKRMARMVGNRNIEPFRVQNGGLMFANADNGMKSSSA
YSEICSSAAQHFARIREVLVERSVTSTLNSGFLTPCRDRLVVALGLDLFAVNDDKFMDMF
VAPGAIVVLQNERQQLQKRQKILQSCLTEFKTVARSL

Is my understanding correct?
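For anyone reading the header above: in the standard UniProtKB FASTA header layout (`>db|Accession|EntryName ProteinName OS=... OX=... [GN=...] PE=... SV=...`), `OS=` is the organism the protein comes from and `OX=` is its NCBI taxonomy ID, so the header describes the source organism of the protein, not a sample count or strain. Below is a minimal Python sketch for pulling those fields out, assuming that header layout; `parse_uniprot_header` is a hypothetical helper, not something provided by GRAMPA or the repository.

```python
import re

# Matches KEY=value pairs in a UniProtKB FASTA header. Values are matched lazily
# up to the next " KEY=" token or the end of the string, because organism names
# such as "Arabidopsis thaliana" contain spaces.
FIELD_RE = re.compile(r"\b(OS|OX|GN|PE|SV)=(.*?)(?=\s+(?:OS|OX|GN|PE|SV)=|$)")


def parse_uniprot_header(header: str) -> dict:
    """Split a UniProtKB FASTA header into database, accession, entry name,
    protein name, and the KEY=value fields (OS, OX, GN, PE, SV)."""
    header = header.lstrip(">").strip()  # works with or without the leading ">"
    ids, _, rest = header.partition(" ")
    db, accession, entry_name = ids.split("|")
    # The protein name is everything before the organism field.
    protein_name, _, _ = rest.partition(" OS=")
    fields = dict(FIELD_RE.findall(rest))
    return {
        "db": db,
        "accession": accession,
        "entry_name": entry_name,
        "protein_name": protein_name,
        **fields,
    }


if __name__ == "__main__":
    h = ("sp|F4HPR5|DRP5A_ARATH Dynamin-related protein 5A "
         "OS=Arabidopsis thaliana OX=3702 GN=DRP5A PE=2 SV=1")
    info = parse_uniprot_header(h)
    print(info["OS"], info["OX"])
```

Run on the header from this issue, it reports `OS` as `Arabidopsis thaliana` and `OX` as `3702`.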
If so, could you please explain which UniProt file in your [Fasta Files](https://github.com/zswitten/Antimicrobial-Peptides/tree/master/data/Fasta_files) folder can be used as the negative class?

Thanks
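Whichever UniProt file turns out to be the intended negative set, the labeling step itself can be sketched roughly as follows. This is a minimal Python sketch, not the GRAMPA authors' method: it assumes you already have the positive (antimicrobial) sequences as a list of strings and the negative file's contents as FASTA text, and the helper names `read_fasta` and `build_binary_dataset` are hypothetical. The optional `max_len` filter is also my assumption, on the grounds that antimicrobial peptides are typically much shorter than full-length proteins such as DRP5A.

```python
from typing import Iterator, List, Optional, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text,
    joining multi-line sequences into one string."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def build_binary_dataset(
    positive_seqs: List[str],
    negative_fasta_text: str,
    max_len: Optional[int] = None,
) -> List[Tuple[str, int]]:
    """Label positives 1 and UniProt negatives 0.

    Negatives that also appear among the positives are dropped, duplicate
    negatives are kept once, and (optionally) negatives longer than max_len
    are skipped.
    """
    positives = set(positive_seqs)
    data = [(seq, 1) for seq in sorted(positives)]
    seen = set()
    for _, seq in read_fasta(negative_fasta_text):
        if seq in positives or seq in seen:
            continue
        if max_len is not None and len(seq) > max_len:
            continue
        seen.add(seq)
        data.append((seq, 0))
    return data
```

Dropping negatives that also occur as positives avoids giving the model the same sequence under both labels; how strictly to filter by length is a judgment call that depends on how the positive set was built.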