Closed by timghaly 8 months ago
My apologies, I just realised that the AFDB50 includes proteins that did not get included in any AFDB cluster after fragments and Foldseek cluster singletons were removed. I think that solves my issue.
Cheers, Tim
Thank you!
Thanks, Foldseek team, for this great tool and the amazing work clustering the AFDB. I am wondering where I can find out which AFDB cluster each of the AFDB50 rep sequences from the Foldseek database 'Alphafold/UniProt50-minimal' belongs to. The 1-AFDBClusters-entryId_repId_taxId.tsv.gz file from https://afdb-cluster.steineggerlab.workers.dev/ has the info that I'm after, but it seems that this has not been updated to reflect the increase in AFDB50 size. The number of member seqs in '1-AFDBClusters-entryId_repId_taxId.tsv.gz' is ~30 million, while there are ~50 million protein seqs in the Alphafold/UniProt50-minimal database. Do you have the cluster-member relationships for the remaining 20 million seqs?
Many thanks for your help!
Kindest regards, Tim
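
For anyone hitting the same mismatch, here is a rough sketch of how one might check which AFDB50 IDs appear in the cluster file and which do not. It assumes the TSV has no header row and that its columns follow the order suggested by the file name (entryId, repId, taxId); I have not verified this against the actual file. The function names and the example IDs are hypothetical, made up for illustration.

```python
import csv
import gzip


def load_cluster_members(tsv_gz_path):
    """Map each member entry ID to its cluster representative ID.

    Assumes tab-separated columns in the order entryId, repId, taxId
    (inferred from the file name, not confirmed) and no header row.
    """
    member_to_rep = {}
    with gzip.open(tsv_gz_path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            entry_id, rep_id = row[0], row[1]
            member_to_rep[entry_id] = rep_id
    return member_to_rep


def split_by_membership(ids, member_to_rep):
    """Separate IDs that have a cluster assignment from those that do not."""
    clustered = {i: member_to_rep[i] for i in ids if i in member_to_rep}
    unclustered = [i for i in ids if i not in member_to_rep]
    return clustered, unclustered
```

Usage would look something like `clustered, unclustered = split_by_membership(my_afdb50_ids, load_cluster_members("1-AFDBClusters-entryId_repId_taxId.tsv.gz"))`, where `my_afdb50_ids` is your own list of IDs. Going by the explanation in this thread, the unclustered IDs would be the AFDB50 proteins dropped as fragments or Foldseek cluster singletons. Note that holding ~30 million entries in a dict may need several gigabytes of memory.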