pagnani / ArDCA.jl

Autoregressive networks for protein
MIT License
33 stars, 8 forks

How to download the aligned fasta files from Pfam database? #34

Closed psp3dcg closed 7 months ago

psp3dcg commented 8 months ago

Thanks for the really nice package! However, when I try to test it on more Pfam families, the FASTA file I download from InterPro (https://www.ebi.ac.uk/interpro/entry/pfam/#table) is not aligned. For example, for PF00020:

>A0A060W133|unreviewed|Tumor necrosis factor receptor superfamily member 6|taxID:8022
MNKYTFLYILCILCTVRLTTPFNAERSSQDILITSKLRTKRQSCQDGTYQHEGMACCLCAAGQHLESHCSVSPEDGTCVY
CEENRTYNSDPNSLDSCEPCTSCDSKANLEVEDRCTIFKDSVCRCQQGHYCNKGKEHCRACYPCTICSEEGIKVACSATN
NTICHAFKEQGRNLAVVFVLTTVLLVLLVIIYLWRSNKYCFGPNGGLTELPNRSSEEMQPLRGVNLWPHLPDIAKTLGWR
DMKQVAECSGMTHTAIESHQLNFPNDSQEQCSSLLRAWVEKEGMTTASVTLVQTLLRMKKKVKAEDIMAIISNKEDGVTG
QNSGSGQV
>A0A060W225|unreviewed|TNFR-Cys domain-containing protein|taxID:8022
MFDKSMSNIGLHYMVVLLIWALNPMVAAQSGLKLTRTGGSVRNLTQRDISCQENLEYPHDNICCLNCLAGTYVKEYCTRA
LERGTCEACEFDTYTEHGNGLRQCLKCTTCHSDQVTTKACTITQDRECRCKPGSFCAPDQACEVCKKCLRCEENEVRLKN
CTATSNTVCKTRLPAPSTIPGTRPGTADIPLLHALLTPVYYYGLGFYCVLTQ

So how can I get an aligned FASTA file like the example in your package? Thank you~
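A quick way to see the difference: in a multiple sequence alignment every row is padded with gap characters ('-') to the same length, while the InterPro download above contains raw sequences of different lengths. The sketch below (Python; `read_fasta` and `is_aligned` are hypothetical helper names, not part of ArDCA.jl) parses FASTA text and applies that equal-length check. Equal length is only a necessary condition for an alignment, not proof that the columns are homologous.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs.

    Sequence lines that follow a '>' header are concatenated, so wrapped
    sequences (e.g. 80 residues per line) are handled.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def is_aligned(records):
    """Heuristic check: every sequence in an MSA has the same length."""
    lengths = {len(seq) for _, seq in records}
    return len(lengths) == 1


# Gapped rows of equal length look like an alignment; raw sequences do not.
print(is_aligned(read_fasta(">a\nMNKY-TF\n>b\nMFDKSMS\n")))     # True
print(is_aligned(read_fasta(">a\nMNKYTF\n>b\nMFDKSMSNIG\n")))  # False
```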

pagnani commented 7 months ago

Hi @psp3dcg. There are many options for aligning sequences. This is beyond the scope of the package, which assumes aligned input.

If you have a set of homologous sequences, you could feed them to HHblits.
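HHblits can write its result alignment in A3M format (its `-oa3m` option). In A3M, uppercase letters and '-' are match and deletion states relative to the query, while lowercase letters (and '.') are insertions. Dropping the insertions leaves every row with the query's match-column length, which gives a plain aligned FASTA. The sketch below assumes one sequence per line, as HHblits normally writes; `a3m_to_aligned_fasta` is a hypothetical helper, not an ArDCA.jl or HH-suite function.

```python
def a3m_to_aligned_fasta(a3m_text):
    """Convert A3M text into aligned FASTA by removing insertion states.

    Uppercase letters and '-' are kept (match/deletion columns); lowercase
    letters and '.' (insertions relative to the query) are dropped, so all
    rows end up with the same length as the query's match columns.
    """
    out = []
    for line in a3m_text.splitlines():
        line = line.strip()
        # Skip blank lines and any '#' comment/metadata lines.
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            out.append(line)
        else:
            out.append("".join(c for c in line if not (c.islower() or c == ".")))
    return "\n".join(out) + "\n"


a3m = ">query\nMKVLA\n>hit1\nMKaaVLA\n>hit2\nM-VqL-\n"
print(a3m_to_aligned_fasta(a3m), end="")
# >query
# MKVLA
# >hit1
# MKVLA
# >hit2
# M-VL-
```

Keeping only match columns is one common choice; it discards residues that insert relative to the query, which is usually acceptable for coupling models that need a fixed number of columns.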

pagnani commented 7 months ago

Closing as not planned

psp3dcg commented 7 months ago

OK, thank you~