Open LTEnjoy opened 8 months ago
You can use createsubdb with a list of accessions and then call convert2fasta to make a FASTA file:
foldseek createsubdb accession_list alphafold_swissprot afsp_subset --id-mode 1
foldseek convert2fasta afsp_subset afsp_subset.fasta
Please check that the accessions you pass are in the same format as those stored in the second column of the alphafold_swissprot.lookup file.
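The format check described above can be sketched with a small helper. It assumes the .lookup file is tab-separated with the accession in the second column, as stated above; the function name missing_accessions is hypothetical and not part of Foldseek.

```shell
# Print every entry of the accession list that has no exact match in the
# second column of the lookup file. Any line printed here is an accession
# that an exact lookup by name would not find.
missing_accessions() {
    lookup="$1"   # e.g. alphafold_swissprot.lookup
    list="$2"     # one accession per line
    awk -F '\t' '
        FILENAME == ARGV[1] { known[$2] = 1; next }  # collect lookup column 2
        !($1 in known)                               # print unmatched entries
    ' "$lookup" "$list"
}

# Usage: missing_accessions alphafold_swissprot.lookup accession_list
```

If the helper prints nothing, every accession in the list matches a lookup entry exactly. Output usually points to a formatting difference such as a missing prefix or suffix, or stray whitespace.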
Thanks for your quick reply! I tried the commands above and they did generate a FASTA file!
It's just slightly different from what I expected, as I want the sequences encoded by Foldseek, not the residue sequences. Could you tell me how to generate that kind of FASTA file?
Thank you again!
You mean the 3Di sequences?
foldseek createsubdb accession_list alphafold_swissprot_ss afsp_subset_ss --id-mode 1
foldseek lndb alphafold_swissprot_h afsp_subset_ss_h
foldseek convert2fasta afsp_subset_ss afsp_subset_ss.fasta
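If only UniProt accessions are wanted in the FASTA headers, the file written by convert2fasta can be post-processed. The sketch below assumes headers of the form ">AF-<accession>-F<n>-model_v<n> ...", which is an assumption about the AlphaFold database naming and should be checked against the actual file. The function name uniprot_headers is hypothetical.

```shell
# Rewrite headers such as ">AF-P12345-F1-model_v4 description" to ">P12345".
# Sequence lines and headers that do not match the assumed pattern are
# passed through unchanged.
uniprot_headers() {
    sed -E 's/^>AF-([A-Za-z0-9]+)-F[0-9]+-model_v[0-9]+.*$/>\1/' "$1"
}

# Usage: uniprot_headers afsp_subset_ss.fasta > afsp_subset_ss.uniprot.fasta
```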
That's exactly what I want!
Thank you very much! Have a nice day!
Hello,
When I tried the command foldseek createsubdb accession_list alphafold_swissprot_ss afsp_subset_ss --id-mode 1 on afdb50, I got these errors:
Could you tell me how I can fix this problem? I want to generate all UniProt 3Di sequences from this database.
I think you first have to run:
ln -s alphafold_swissprot.lookup alphafold_swissprot_ss.lookup
I just tried this command, but the errors persist.
Also, here are some of the contents of my accession_list.txt:
Could you please post all commands you executed (preferably as text and not as screenshots)? I am not sure what's going on currently.
Hi,
I think I found the problem. The afdb50 database contains only 50M sequences after clustering, but I need to generate sequences for the whole UniProt database (~200M sequences). So I downloaded the afdb database, which I think should solve the problem.
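For exporting every 3Di sequence of a database rather than a subset, a minimal sketch is to skip createsubdb and apply the lndb and convert2fasta steps from earlier in the thread to the full database. Dropping the subset step is an assumption here, and the function name export_3di_fasta and the prefix afdb are placeholders.

```shell
# Export all 3Di sequences of a Foldseek database to a FASTA file.
# $1: database prefix (the name the database was downloaded or created under)
# $2: output FASTA path
export_3di_fasta() {
    db="$1"
    out="$2"
    # Make the headers of the amino-acid database available to the 3Di database.
    foldseek lndb "${db}_h" "${db}_ss_h" || return 1
    # Write the 3Di sequences with those headers to FASTA.
    foldseek convert2fasta "${db}_ss" "$out"
}

# Usage: export_3di_fasta afdb afdb_3di.fasta
```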
Hi!
Thank you for your great work! I would like to know whether I can download a pre-generated database and manually generate a FASTA file containing all protein names and their corresponding Foldseek sequences.
For example, for the alphafold_swissprot database, I want to extract all UniProt IDs and Foldseek sequences and write them to a FASTA file like:
Thank you in advance. I'm looking forward to your reply!