zswitten / Antimicrobial-Peptides

Collecting AMP MIC data from different sources, then running a GAN to output promising sequences
65 stars 14 forks

Converting the regression MIC value problem to a classification problem AMP vs. non-AMP. How? #14

Closed: qm-intel closed this issue 1 year ago

qm-intel commented 1 year ago

@jswitten

I am trying to use your GRAMPA dataset as a binary classification problem to train my model. As far as I understand, you use two files for the classification problem:

For example, I understand that this specific strain has 14 samples and that the term OS=Arabidopsis thaliana is the bacterial species:

sp|F4HPR5|DRP5A_ARATH Dynamin-related protein 5A OS=Arabidopsis thaliana OX=3702 GN=DRP5A PE=2 SV=1
MANSNTYLTTPTKTPSSRRNQQSQSKMQSHSKDPINAESRSRFEAYNRLQAAAVAFGEKL
PIPEIVAIGGQSDGKSSLLEALLGFRFNVREVEMGTRRPLILQMVHDLSALEPRCRFQDE
DSEEYGSPIVSATAVADVIRSRTEALLKKTKTAVSPKPIVMRAEYAHCPNLTIIDTPGFV
LKAKKGEPETTPDEILSMVKSLASPPHRILLFLQQSSVEWCSSLWLDAVREIDSSFRRTI
VVVSKFDNRLKEFSDRGEVDRYLSASGYLGENTRPYFVALPKDRSTISNDEFRRQISQVD
TEVIRHLREGVKGGFDEEKFRSCIGFGSLRDFLESELQKRYKEAAPATLALLEERCSEVT
DDMLRMDMKIQATSDVAHLRKAAMLYTASISNHVGALIDGAANPAPEQWGKTTEEERGES
GIGSWPGVSVDIKPPNAVLKLYGGAAFERVIHEFRCAAYSIECPPVSREKVANILLAHAG
RGGGRGVTEASAEIARTAARSWLAPLLDTACDRLAFVLGSLFEIALERNLNQNSEYEKKT
ENMDGYVGFHAAVRNCYSRFVKNLAKQCKQLVRHHLDSVTSPYSMACYENNYHQGGAFGA
YNKFNQASPNSFCFELSDTSRDEPMKDQENIPPEKNNGQETTPGKGGESHITVPETPSPD
QPCEIVYGLVKKEIGNGPDGVGARKRMARMVGNRNIEPFRVQNGGLMFANADNGMKSSSA
YSEICSSAAQHFARIREVLVERSVTSTLNSGFLTPCRDRLVVALGLDLFAVNDDKFMDMF
VAPGAIVVLQNERQQLQKRQKILQSCLTEFKTVARSL

Is my understanding correct?

If yes, can you please explain which UniProt file in your FASTA files folder [Fasta Files](https://github.com/zswitten/Antimicrobial-Peptides/tree/master/data/Fasta_files) can be used as the negative class?

Thanks

zswitten commented 1 year ago

Thanks for looking into GRAMPA. @jswitten answered this question in this thread: https://github.com/zswitten/Antimicrobial-Peptides/issues/8#issuecomment-1459280654.
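
The linked comment has the actual answer. Independently of it, here is a minimal sketch of how a labeled AMP vs. non-AMP set could be assembled once a negative FASTA file has been chosen. It is not the GRAMPA authors' method. The function names, the toy sequences, and the `max_len=50` cutoff are all hypothetical, and treating unannotated UniProt sequences as non-AMP is itself an assumption, since a missing annotation does not prove a lack of activity.

```python
def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def build_binary_dataset(amp_sequences, negative_fasta_text, max_len=50):
    """Label known AMPs as 1 and FASTA sequences as 0.

    Negatives are skipped when they are longer than max_len (hypothetical
    cutoff), duplicate an earlier negative, or also appear among the positives.
    """
    positives = set(amp_sequences)
    dataset = [(seq, 1) for seq in sorted(positives)]
    seen = set(positives)
    for _header, seq in read_fasta(negative_fasta_text):
        if len(seq) > max_len or seq in seen:
            continue
        seen.add(seq)
        dataset.append((seq, 0))
    return dataset


# Toy inputs, invented purely for illustration.
toy_amps = ["KKLLKK", "RRWWRR"]
toy_negatives = (
    ">toy|N1|NEG1 wrapped record\nACDEFGHIK\nLMNPQ\n"
    ">toy|N2|NEG2 also a positive\nKKLLKK\n"
    ">toy|N3|NEG3 too long\n" + "A" * 60 + "\n"
)
dataset = build_binary_dataset(toy_amps, toy_negatives)
print(dataset)
```

On the toy input, only the first negative record survives: the second collides with a positive and the third exceeds the length cutoff.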

jswitten commented 1 year ago

(Also, completely unimportant, but incidentally Arabidopsis is not a bacterium; it's a plant often used as a model organism for plant biology.)
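
For reference, the `OS=` field in a UniProt FASTA header names the organism the protein itself comes from, and `OX=` is that organism's taxonomy ID. A minimal sketch for pulling these fields out of a header like the one quoted above follows. The regular expression and the function name are illustrative assumptions, not code from this repository.

```python
import re

# UniProt FASTA header layout:
# db|accession|entry_name protein_name OS=organism OX=taxon_id [GN=gene] PE=n SV=n
HEADER_RE = re.compile(
    r"^>?(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+)\s+"
    r"(?P<protein_name>.*?)\s+OS=(?P<organism>.*?)\s+OX=(?P<taxon_id>\d+)"
    r"(?:\s+GN=(?P<gene>\S+))?"
    r"(?:\s+PE=(?P<pe>\d+))?(?:\s+SV=(?P<sv>\d+))?\s*$"
)


def parse_uniprot_header(header: str) -> dict:
    """Split a UniProt-style FASTA header into named fields (hypothetical helper)."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"not a UniProt-style header: {header!r}")
    return match.groupdict()


header = (
    "sp|F4HPR5|DRP5A_ARATH Dynamin-related protein 5A "
    "OS=Arabidopsis thaliana OX=3702 GN=DRP5A PE=2 SV=1"
)
fields = parse_uniprot_header(header)
print(fields["organism"])  # Arabidopsis thaliana
print(fields["taxon_id"])  # 3702
```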