zeynepabali opened this issue 3 years ago
AFAIK, the PDB uses an MMseqs2-based workflow, but I don't know the details. @martin-steinegger added some features at the request of the PDB team; he might be able to put you in touch with the right people.
Thank you very much. I will try to get in contact with him.
Some time ago I was in contact with Zukang Feng (https://www.rcsb.org/pages/team) from the PDB. I am not sure exactly which parameters they use at the moment, so it might be good to contact him.
However, I remember that they replaced blastclust, which uses connected-component clustering. So you need to use --cluster-mode 1 in mmseqs:
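For readers unfamiliar with the term, connected-component clustering can be sketched in a few lines of Python. This is a minimal illustration, not MMseqs2's implementation: it assumes the hits have already been filtered by the identity and coverage thresholds, treats each hit as an undirected edge, and groups sequences with a union-find structure. The sequence IDs are made up. The key property is transitivity: two sequences land in the same cluster whenever a chain of hits links them, even if they do not hit each other directly. MMseqs2's default, --cluster-mode 0, is a greedy set cover instead.

```python
"""Sketch of connected-component clustering (the idea behind blastclust and
MMseqs2 --cluster-mode 1). Edges are assumed to be hits that already passed
the identity/coverage filters; the IDs used below are made up."""


def connected_components(nodes, edges):
    """Return the connected components as sorted lists of node IDs."""
    parent = {n: n for n in nodes}

    def find(x):
        # Walk up to the root, halving the path to keep the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two components joined by every edge (hit).
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Collect members by their root.
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    seqs = ["A", "B", "C", "D", "E"]
    # A-B and B-C pass the thresholds but A-C does not;
    # transitivity still puts A and C in the same cluster.
    hits = [("A", "B"), ("B", "C"), ("D", "E")]
    print(connected_components(seqs, hits))  # [['A', 'B', 'C'], ['D', 'E']]
```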
mmseqs cluster pdb_seq_pr pdb_seq_pr_clu_s8_maxseqs1000 tmp_clu7 --cov-mode 0 -c 0.90 --min-seq-id 0.3 -s 7 --max-seqs 1000 --cluster-mode 1 -a
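The mmseqs cluster command above expects an existing MMseqs2 sequence database (pdb_seq_pr) and writes its result in MMseqs2's own database format. A minimal sketch of the full round trip is createdb, then cluster with the same options, then createtsv to obtain a readable representative/member table. The input file pdb_seq.fasta and the output names below are hypothetical. The Python only builds and prints the argument lists; calling run() on them executes the steps and requires mmseqs on the PATH.

```python
import subprocess


def cluster_commands(fasta, db="pdb_seq_pr", clu="pdb_seq_pr_clu",
                     tmp="tmp_clu", tsv="pdb_seq_pr_clu.tsv",
                     min_seq_id=0.3, coverage=0.9, sensitivity=7,
                     max_seqs=1000):
    """Return the mmseqs invocations as argument lists (nothing is run here).

    All file/database names are hypothetical placeholders.
    """
    return [
        # 1. Convert the FASTA file into an MMseqs2 sequence database.
        ["mmseqs", "createdb", fasta, db],
        # 2. Connected-component clustering with the options quoted above.
        ["mmseqs", "cluster", db, clu, tmp,
         "--cov-mode", "0", "-c", str(coverage),
         "--min-seq-id", str(min_seq_id), "-s", str(sensitivity),
         "--max-seqs", str(max_seqs), "--cluster-mode", "1", "-a"],
        # 3. Dump the clustering as representative<TAB>member lines.
        ["mmseqs", "createtsv", db, db, clu, tsv],
    ]


def run(commands):
    """Execute each step in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the steps; pass the list to run() to actually execute them.
    for cmd in cluster_commands("pdb_seq.fasta"):
        print(" ".join(cmd))
```

Keeping each step as an argument list rather than one shell string avoids quoting issues and makes it easy to vary the thresholds, e.g. cluster_commands("pdb_seq.fasta", min_seq_id=0.5).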
Hello, have you maybe figured this out?
This is what RCSB PDB uses internally (with a few different thresholds for sequence identity):
mmseqs easy-cluster pdb_protein_sequence.fasta-A.gz session --min-seq-id 0.3 -c 0.9 -s 8 --max-seqs 1000 --cluster-mode 1
Hi, I am not sure if this is the right place to ask, but is there a set of options that recreates the clustering used in the PDB's weekly sequence clustering? For example, this one: https://cdn.rcsb.org/resources/sequence/clusters/bc-100.out
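To check how closely a local run reproduces the published clusters, one option is to compare the two clusterings cluster by cluster. The sketch below assumes that the bc-*.out files list one cluster per line with whitespace-separated member IDs, that your FASTA headers use the same entity identifiers, and that the MMseqs2 side is the representative<TAB>member table written by mmseqs createtsv (or the _cluster.tsv file from easy-cluster). The IDs in the example are hypothetical.

```python
from collections import defaultdict


def clusters_from_mmseqs_tsv(lines):
    """Group 'representative<TAB>member' lines into sets of members."""
    groups = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rep, member = line.split("\t")
        # The representative normally also appears as its own member;
        # adding it explicitly is harmless if it does.
        groups[rep].update((rep, member))
    return {frozenset(g) for g in groups.values()}


def clusters_from_rcsb_out(lines):
    """Assumed format: one cluster per line, members separated by whitespace."""
    return {frozenset(line.split()) for line in lines if line.strip()}


def compare(ours, theirs):
    """Count clusters that are identical in both clusterings, and the rest."""
    return {
        "identical": len(ours & theirs),
        "only_ours": len(ours - theirs),
        "only_theirs": len(theirs - ours),
    }


if __name__ == "__main__":
    # Hypothetical data; in practice pass open file handles instead.
    ours = clusters_from_mmseqs_tsv(
        ["AAAA_1\tAAAA_1", "AAAA_1\tBBBB_1", "CCCC_1\tCCCC_1"])
    theirs = clusters_from_rcsb_out(["AAAA_1 BBBB_1", "CCCC_1 DDDD_1"])
    print(compare(ours, theirs))
    # {'identical': 1, 'only_ours': 1, 'only_theirs': 1}
```

Both parsers accept any iterable of lines, so an open file handle works directly, e.g. clusters_from_rcsb_out(open("bc-100.out")).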