I am currently using Foldseek for protein structure analysis. It is an impressive tool that has made exploring protein structures far more convenient in the post-AlphaFold era. I now want to determine the exact one-to-one correspondence between each protein's amino acid sequence and its 3Di structure sequence. For example, I recently downloaded the AlphaFold/Swiss-Prot database (from https://foldseek.steineggerlab.workers.dev/afdb_swissprot.tar.gz), and after extracting it I found two key files named afdb_swissprot and afdb_swissprot_ss.
As I understand it, the afdb_swissprot file contains the amino acid sequences and the afdb_swissprot_ss file contains the corresponding 3Di structure sequences. It also seems that the same line in both files represents the same protein.
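To make clear what I mean by "correspondence", here is a minimal Python sketch of how I would expect to pair the two files. It assumes (I have not confirmed this) that they follow the MMseqs2-style database layout: each entry is terminated by a NUL byte, and a companion `.index` file lists `key<TAB>offset<TAB>length` for every entry, with the length counting the NUL. If that holds, entries would be matched by key rather than by line. It also assumes a single, uncompressed data file per database. `parse_db`, `read_db` and `pair_aa_3di` are my own hypothetical helper names, not part of Foldseek.

```python
from pathlib import Path


def parse_db(data: bytes, index_text: str) -> dict[str, str]:
    """Split an MMseqs2-style data blob into {key: entry} using its .index text.

    Assumption: each index line is "key<TAB>offset<TAB>length", and `length`
    includes the NUL byte that terminates every entry.
    """
    entries: dict[str, str] = {}
    for line in index_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        key, offset, length = line.split("\t")[:3]
        start = int(offset)
        raw = data[start:start + int(length)]
        # Drop the trailing NUL terminator and the newline that precedes it.
        entries[key] = raw.rstrip(b"\x00").rstrip(b"\n").decode("utf-8")
    return entries


def read_db(path: str) -> dict[str, str]:
    """Load <path> and <path>.index from disk (single, uncompressed file assumed)."""
    data = Path(path).read_bytes()
    index_text = Path(path + ".index").read_text()
    return parse_db(data, index_text)


def pair_aa_3di(prefix: str) -> dict[str, tuple[str, str]]:
    """Pair amino-acid and 3Di entries that share the same key.

    `prefix` would be e.g. "afdb_swissprot"; the 3Di database is assumed to be
    "<prefix>_ss" with its own "<prefix>_ss.index".
    """
    aa = read_db(prefix)
    ss = read_db(prefix + "_ss")
    pairs: dict[str, tuple[str, str]] = {}
    for key, seq in aa.items():
        tdi = ss[key]
        # One 3Di letter per residue is expected, so the lengths should agree.
        if len(seq) != len(tdi):
            raise ValueError(f"length mismatch for key {key}")
        pairs[key] = (seq, tdi)
    return pairs
```

Usage would be something like `pairs = pair_aa_3di("afdb_swissprot")`, run from the extraction directory. Matching on keys instead of line numbers would keep the pairing correct even if the entries were not stored in the same order in both data files.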
Additionally, I would like to obtain the AlphaFold identifier or UniProt identifier for each protein.
Could you confirm whether my understanding of the file contents is accurate? If so, how can I obtain the AlphaFold or UniProt identifier for each protein in the database? Also, is there an associated metadata file that contains further details?
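For the identifiers, this is the kind of mapping I am hoping for, sketched in Python under two assumptions I have not verified: that the archive contains an `afdb_swissprot.lookup` file with `key<TAB>name<TAB>filekey` lines (as in MMseqs2-style databases), and that each name follows the AlphaFold DB pattern `AF-<UniProt accession>-F<fragment>-model_v<version>`. The helper names and the regular expression are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical pattern for AlphaFold DB entry names such as "AF-P69905-F1-model_v4".
AF_NAME = re.compile(r"^AF-([A-Z0-9]+)-F\d+")


def parse_lookup(text: str) -> dict[str, str]:
    """Map database key -> entry name from "key<TAB>name<TAB>filekey" lines."""
    mapping: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        mapping[fields[0]] = fields[1]
    return mapping


def uniprot_accession(name: str) -> Optional[str]:
    """Return the UniProt accession embedded in an AlphaFold-style name, or None."""
    match = AF_NAME.match(name)
    return match.group(1) if match else None


def load_ids(prefix: str) -> dict[str, tuple[str, Optional[str]]]:
    """Return {key: (alphafold_name, uniprot_accession)} read from <prefix>.lookup."""
    names = parse_lookup(Path(prefix + ".lookup").read_text())
    return {key: (name, uniprot_accession(name)) for key, name in names.items()}
```

If this works, the keys returned by `load_ids` could be joined with the amino acid and 3Di entries that share the same key.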
I appreciate your assistance in this matter and look forward to your guidance. Thank you for your time and consideration.
![image](https://github.com/steineggerlab/foldseek/assets/87804802/ad74da1e-9e64-45d6-99e0-b2da703f5466)