Open rcedgar opened 3 months ago
Update: I was able to work around the problem by removing alntmscore from the --format-output option. I'm guessing that computing the TM alignment is much slower than the S-W 3Di alignment and is not needed to calculate the E-value.
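A minimal sketch of the workaround described above: assemble the easy-search command line and drop alntmscore from the --format-output field list before running. The database, output, and tmp paths are hypothetical placeholders, and the helper name build_easy_search_cmd is made up for illustration; only the foldseek subcommand, option, and field names come from this thread.

```python
import shlex


def build_easy_search_cmd(query, target, out_file, tmp_dir,
                          fields=("query", "target", "evalue"),
                          drop=("alntmscore",)):
    """Return a foldseek easy-search argument list without the fields in `drop`."""
    kept = [f for f in fields if f not in drop]
    return [
        "foldseek", "easy-search", query, target, out_file, tmp_dir,
        "--format-output", ",".join(kept),
    ]


# Hypothetical paths; alntmscore is requested here but filtered out.
cmd = build_easy_search_cmd(
    "scop40_pdbs", "scop40_pdbs", "aln.tsv", "tmp",
    fields=("query", "target", "evalue", "alntmscore"),
)
print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.run, which avoids shell quoting issues with the comma-separated field list.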
Where did you get the SCOP40 or 35 files for createdb? I need to run SCOP against my own sets of protein structures, but I couldn't get the files for createdb.
I do not recommend using this; it's quite an old version. It makes sense to use the latest version for annotation or benchmarking: https://scop.berkeley.edu/
Noted, thanks. Will do for anything written up, but for preliminary work it's helpful that the expensive computations for DALI and TMalign are included in the downloads for the foldseek paper.
Thanks @rcedgar @martin-steinegger, got it. @martin-steinegger, it would be very kind of you to preassemble it and add it to foldseek like the other databases.
Hi @martin-steinegger, with --format-output "query,target,evalue" foldseek completes SCOP40 quickly, but the sensitivity is lower than reported in the paper. Presumably I need to tweak some options such as --max-seqs and --exhaustive-search, but I don't see the command line in Methods or Supp Data. What are the recommended options for comparative validation? Thanks!
We have all scripts for benchmarking here: https://github.com/steineggerlab/foldseek-analysis
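For a quick sanity check that doesn't depend on those scripts, here is a rough sketch of one way to score "query,target,evalue" output: per query, the fraction of same-family domains ranked before the first hit from a different family. This is a simplified stand-in and not the authors' benchmark code. The function name and the family_of mapping are hypothetical, and the paper's own definition of true and false positives may differ.

```python
from collections import defaultdict


def sensitivity_to_first_fp(hits, family_of):
    """Per-query fraction of same-family targets found before the first mismatch.

    hits: iterable of (query, target, evalue) tuples.
    family_of: dict mapping domain id -> family label (assumed input).
    Self-hits are ignored; queries with no other family member or no
    non-self hits are left out of the result.
    """
    family_sizes = defaultdict(int)
    for fam in family_of.values():
        family_sizes[fam] += 1

    by_query = defaultdict(list)
    for query, target, evalue in hits:
        if query != target:
            by_query[query].append((float(evalue), target))

    result = {}
    for query, rows in by_query.items():
        possible = family_sizes[family_of[query]] - 1
        if possible == 0:
            continue
        found = 0
        for _, target in sorted(rows):  # ascending E-value
            if family_of.get(target) != family_of[query]:
                break  # first false positive ends the count
            found += 1
        result[query] = found / possible
    return result


family_of = {"a1": "F1", "a2": "F1", "a3": "F1", "b1": "F2"}
hits = [("a1", "a1", 0.0), ("a1", "a2", 1e-10),
        ("a1", "b1", 1e-3), ("a1", "a3", 1e-2)]
scores = sensitivity_to_first_fp(hits, family_of)
```

In the toy data, a1 has two other family members and only one is ranked ahead of the first mismatch, so its score is 0.5.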
Much better! It seems accuracy is getting close to DALI now. Is there any explanation of the improvements in the algorithm?
Hello @rcedgar https://wwwuser.gwdg.de/~compbiol/foldseek
Can you please tell me the version of this SCOP40? Is it SCOPe 2.08?
Hello @12047019, sorry, I don't know. This is a question for the foldseek authors. I couldn't figure out the exact version myself; I had to use the scop_lookup.fix.tsv file in their repo to assign families to domains.
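A sketch of how such a lookup file could be turned into family, superfamily, and fold labels. The file layout is an assumption here: two tab-separated columns, domain id then a dotted SCOP classification string (class.fold.superfamily.family, e.g. a.1.1.2). Check the actual scop_lookup.fix.tsv before relying on this; the function name load_scop_lookup is made up for illustration.

```python
import csv
import io


def load_scop_lookup(handle):
    """Map domain id -> {'fold', 'superfamily', 'family'} labels.

    Assumes each row is: domain_id <TAB> class.fold.superfamily.family
    Rows that do not match that shape are skipped.
    """
    lookup = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue
        domain, sccs = row[0], row[1]
        parts = sccs.split(".")
        if len(parts) != 4:
            continue
        lookup[domain] = {
            "fold": ".".join(parts[:2]),
            "superfamily": ".".join(parts[:3]),
            "family": sccs,
        }
    return lookup


# Made-up rows in the assumed format, read from memory instead of a file.
example = io.StringIO("d1xxxa_\ta.1.1.2\nd2yyyb_\tb.40.4.5\nbad_row\n")
lookup = load_scop_lookup(example)
```

With a real file, pass an open file handle instead of the StringIO object.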
Clustering SCOPe 2.01 at 40% sequence identity yielded 11,211 non-redundant protein sequences (SCOPe40).
From the paper.
I'm trying to implement the SCOP40 test using the latest foldseek. The createdb command completes; the easy-search command runs for a while but then hangs indefinitely. Advice is welcome on the best way to implement this for measuring foldseek speed and accuracy. Thanks for any help!
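When timing a run that may hang, one option is to wrap the command in a wall-clock timeout so the benchmark fails fast instead of blocking. This is a generic sketch using Python's subprocess module, not anything specific to foldseek; the helper name run_with_timeout is hypothetical, and the demo runs the Python interpreter itself in place of a foldseek command.

```python
import subprocess
import sys
import time


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return (returncode, elapsed_seconds).

    returncode is None if the command was killed after timeout_s seconds.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
        return proc.returncode, time.monotonic() - start
    except subprocess.TimeoutExpired:
        return None, time.monotonic() - start


# Demo with stand-in commands: one finishes, one exceeds the timeout.
ok_code, ok_elapsed = run_with_timeout([sys.executable, "-c", "pass"], 30)
hung_code, hung_elapsed = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(5)"], 0.5)
```

A None return code separates a hang from an ordinary non-zero exit, and the elapsed time doubles as a coarse speed measurement.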