I want to find protein structures similar to my query database, filter results by TM-score above 0.6, and then find cluster representatives for each protein in my query database.
I followed:
foldseek createdb example/ targetDB
foldseek createdb example/ queryDB
foldseek search queryDB targetDB aln tmpFolder -a #is there a way to filter results here by e-value?
foldseek aln2tmscore queryDB targetDB aln aln_tmscore #is there a way to filter results here by tm-score?
foldseek createtsv queryDB targetDB aln_tmscore aln_tmscore.tsv
I was going to use the following for clustering:
foldseek createdb example/ db
foldseek search db db aln tmpFolder -c 0.8
foldseek clust db aln clu #is the db here the queryDB or targetDB?
foldseek createtsv db db clu clu.tsv
I see that this workflow runs a new alignment search and keeps only results with at least 80% coverage. This is closer to what I'm looking for, except that instead of a coverage cutoff I want an e-value/TM-score threshold.
Problems I'm running into:
Finding a way to filter alignment results by e-value/TM-score.
Whether the db in "foldseek clust db aln clu" is the queryDB or the targetDB.
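One workaround I'm considering is to post-filter the TSV after createtsv instead of filtering inside foldseek. This is only a rough sketch: it assumes the TM-score is the third tab-separated column of aln_tmscore.tsv (query, target, TM-score, then the translation and rotation values), which I have not verified for every version. The function name filter_by_tmscore and the toy file are made up for illustration, and the toy rows are not real results.

```shell
# Keep rows whose TM-score (ASSUMED to be column 3) is above a cutoff.
filter_by_tmscore() {
    # $1: TSV written by createtsv, $2: TM-score cutoff (defaults to 0.6)
    awk -F'\t' -v thr="${2:-0.6}" '$3 + 0 > thr + 0' "$1"
}

# Toy input with made-up identifiers and scores, only to show the behaviour.
printf 'q1\tt1\t0.85\nq1\tt2\t0.41\nq2\tt3\t0.72\n' > toy_tmscore.tsv
filter_by_tmscore toy_tmscore.tsv 0.6 > toy_tmscore.filtered.tsv
```

On my real data this would be "filter_by_tmscore aln_tmscore.tsv 0.6 > aln_tmscore.filtered.tsv". It only filters the text output, though, so it does not give me a filtered alignment database that I could pass on to clust.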