jzhanghzau opened 1 month ago
Ah, it seems I can download the file below, calculate the size of each cluster, filter the clusters by size, and then retrieve the FASTA sequences through the UniProt API. Is that correct? By the way, are both the entryID and the repID UniProt IDs? Thanks!
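The counting and filtering steps described above can be sketched roughly as follows. This is a minimal sketch, not an official tool: it assumes the downloaded file is tab-separated with the entryID in the first column and the repID in the second (a hypothetical column order; check the header of the file you actually download). The functions `iter_tsv`, `cluster_sizes` and `filter_by_size` are illustrative names.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def iter_tsv(path: str, skip_header: bool = False) -> Iterator[List[str]]:
    """Stream rows from a tab-separated file without loading it all into memory."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        if skip_header:
            next(reader, None)  # drop the header line if the file has one
        yield from reader


def cluster_sizes(rows: Iterable[List[str]]) -> Dict[str, int]:
    """Count members per representative.

    Assumes each row looks like [entryID, repID, ...] (hypothetical layout).
    """
    counts: Counter = Counter()
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        rep_id = row[1]
        counts[rep_id] += 1
    return dict(counts)


def filter_by_size(sizes: Dict[str, int], min_size: int) -> List[str]:
    """Return the representative IDs whose cluster has at least min_size members."""
    return sorted(rep for rep, n in sizes.items() if n >= min_size)
```

For example, `filter_by_size(cluster_sizes(iter_tsv("clusters.tsv")), 10)` would list the representatives of clusters with at least 10 members (the file name is a placeholder). Streaming the rows keeps memory bounded by the number of representatives rather than the number of entries.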
Hi JJ,
As I understand it, you are looking for data that contain (1) the FASTA sequences of the representatives and (2) the cluster sizes, and you are wondering whether the IDs in file no. 1 are UniProt IDs.
First, we do not provide the raw sequences of the representatives. As you found, you can take the UniProt IDs of the representatives and retrieve their sequences through any UniProt API.
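As one possible way to do the retrieval, here is a minimal sketch using the UniProt REST endpoint `https://rest.uniprot.org/uniprotkb/{accession}.fasta` with only the Python standard library. It is an illustration, not the project's own pipeline; `fasta_url`, `fetch_fasta` and `parse_fasta` are hypothetical helper names. The parser assumes UniProt-style headers (`>sp|ACCESSION|NAME ...`) and keys records by accession.

```python
import urllib.request
from typing import Dict, List, Optional

UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{acc}.fasta"


def fasta_url(accession: str) -> str:
    """Build the UniProt REST URL for one accession's FASTA record."""
    return UNIPROT_FASTA_URL.format(acc=accession.strip())


def fetch_fasta(accession: str, timeout: float = 30.0) -> str:
    """Download one FASTA record (raises urllib.error.HTTPError if it does not resolve)."""
    with urllib.request.urlopen(fasta_url(accession), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {accession: sequence}.

    For UniProt headers like '>sp|P69905|HBA_HUMAN ...' the key is the
    accession (second '|' field); otherwise the first header word is used.
    """
    records: Dict[str, str] = {}
    key: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if key is not None:
                records[key] = "".join(chunks)
            header_id = line[1:].split()[0]
            parts = header_id.split("|")
            key = parts[1] if len(parts) >= 3 else header_id
            chunks = []
        else:
            chunks.append(line)
    if key is not None:
        records[key] = "".join(chunks)
    return records
```

One request per accession is fine for a few hundred representatives; for many more, UniProt's batch/streaming endpoints are likely the better choice. Some older accessions may no longer resolve in the current UniProt release, so it is worth catching `HTTPError` per ID.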
We provide the cluster information in file no. 2. The caveat is that it covers only the Foldseek clusters; if you want to include the sequence-cluster members, you have to compute that yourself.
Lastly, the IDs in the screenshot you attached are UniProt IDs.
Hope this helps!
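Computing the sequence-cluster members yourself essentially amounts to a join between two membership tables. The sketch below assumes (hypothetically) that you have one table of `(foldseek_rep, seq_rep)` pairs, where each Foldseek cluster member is itself a sequence-cluster representative, and a second table of `(seq_rep, member)` pairs for the sequence clusters. Neither the input layout nor the name `expand_clusters` comes from the dataset description; adapt them to the files you actually use.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def expand_clusters(
    struct_pairs: Iterable[Tuple[str, str]],  # (foldseek_rep, seq_rep), hypothetical input
    seq_pairs: Iterable[Tuple[str, str]],     # (seq_rep, member), hypothetical input
) -> Dict[str, Set[str]]:
    """Expand each Foldseek cluster with the members of its sequence clusters."""
    # Index sequence-cluster membership by representative.
    seq_members: Dict[str, Set[str]] = defaultdict(set)
    for seq_rep, member in seq_pairs:
        seq_members[seq_rep].add(member)

    # Each Foldseek cluster gets its sequence representatives plus their members.
    expanded: Dict[str, Set[str]] = defaultdict(set)
    for struct_rep, seq_rep in struct_pairs:
        expanded[struct_rep].add(seq_rep)
        expanded[struct_rep].update(seq_members.get(seq_rep, ()))
    return dict(expanded)
```

The expanded cluster size is then simply `len(members)` for each Foldseek representative. Using sets avoids double-counting a representative that also appears as a member of its own sequence cluster.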
Jingi Yeo
Thanks!
Hi,
First of all, thanks for your amazing work!
I want to use the AFDB clusters for some analysis. The fields I need are the REPRESENTATIVE FASTA sequence and the CLUSTER SIZE; it would also be nice to have the MSA file for each representative sequence. Which dataset should I download from the Foldseek server, and is there a detailed description of these datasets?
Looking forward to your reply.
Thank you.
JJ