steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0
695 stars, 92 forks

How to create a database? #223

Open tomato-cmyk opened 6 months ago

tomato-cmyk commented 6 months ago

Expected Behavior

For foldseek createdb example/ targetDB, what kind of input should "example" be? I have many PDB files in a folder and want to create a database from them. Do I need to convert these PDB files into a list or some other file type first?

milot-mirdita commented 6 months ago

Just replace example/ with the folder containing your PDB files. Foldseek will go through the folder, find all PDB/mmCIF files, and turn them into a DB.
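As a rough pre-check before pointing createdb at a folder, the sketch below lists files that look like PDB or mmCIF structures and builds the createdb command from the thread. The suffix list (.pdb, .ent, .cif, .mmcif, optionally gzipped) and the recursive search are assumptions for illustration, not Foldseek's actual detection logic, and find_structure_files and createdb_command are hypothetical helper names. An empty result would be one plausible explanation for a "No structures found in given input" error.

```python
from __future__ import annotations

import shlex
from pathlib import Path

# Assumed structure-file suffixes; Foldseek's own detection may differ.
STRUCTURE_SUFFIXES = (".pdb", ".ent", ".cif", ".mmcif")


def find_structure_files(folder: str | Path) -> list[Path]:
    """Return files under `folder` (searched recursively, an assumption) whose
    names end in one of the assumed suffixes, optionally followed by .gz."""
    hits = []
    for path in Path(folder).rglob("*"):
        if not path.is_file():
            continue
        name = path.name.lower()
        if name.endswith(".gz"):
            name = name[:-3]  # look at the suffix underneath the compression
        if name.endswith(STRUCTURE_SUFFIXES):
            hits.append(path)
    return sorted(hits)


def createdb_command(input_dir: str, db_name: str) -> list[str]:
    """Build the `foldseek createdb <folder> <db>` call discussed in the thread."""
    return ["foldseek", "createdb", input_dir, db_name]


if __name__ == "__main__":
    folder = Path("example")  # replace with your own folder of PDB files
    files = find_structure_files(folder) if folder.is_dir() else []
    print(f"{len(files)} candidate structure files in {folder}/")
    if files:
        # Print the command rather than running it, so it can be reviewed first.
        print(shlex.join(createdb_command(str(folder), "targetDB")))
```

The helper only inspects file names; it does not parse the files, so a file with a structure suffix can still fail inside Foldseek.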

tomato-cmyk commented 6 months ago

Thanks a lot!

However, I ran into some new problems. The first is “No structures found in given input”.

The second is “No k-mer could be extracted for the database”.

The third is “Missing arguments - -score-threshold”.

tomato-cmyk commented 6 months ago

The second one is the most pressing. It now shows "No k-mer could be extracted for the database.......Maybe the sequences length is less than 14 residues."

My command is "foldseek easy-cluster un9.pdb pdb tmp -c 0.9".
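Since the error message points at chain length, one way to check whether un9.pdb is affected is to count residues per chain. The sketch below assumes a classic fixed-column PDB file (not mmCIF), reads only the first model, and counts distinct residue numbers in ATOM records; residues_per_chain and short_chains are hypothetical helpers, and the count is only a rough proxy for whatever Foldseek measures internally. The threshold of 14 is taken from the error message.

```python
from __future__ import annotations

import sys
from collections import defaultdict
from pathlib import Path

# Threshold taken from the quoted error message ("less than 14 residues").
MIN_RESIDUES = 14


def residues_per_chain(pdb_text: str) -> dict[str, int]:
    """Count distinct residues per chain in the ATOM records of PDB-format text.

    Only the first model is read; HETATM records (ligands, water) are ignored.
    """
    residues: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # stop after the first model
        if not line.startswith("ATOM  "):
            continue
        chain = line[21:22]                          # column 22: chain ID
        res_id = (line[22:26].strip(), line[26:27])  # columns 23-27: resSeq + insertion code
        residues[chain].add(res_id)
    return {chain: len(ids) for chain, ids in residues.items()}


def short_chains(pdb_text: str, min_len: int = MIN_RESIDUES) -> list[str]:
    """Return the IDs of chains with fewer than `min_len` residues."""
    counts = residues_per_chain(pdb_text)
    return sorted(c for c, n in counts.items() if n < min_len)


if __name__ == "__main__":
    for name in sys.argv[1:]:  # e.g. python check_length.py un9.pdb
        text = Path(name).read_text()
        print(name, residues_per_chain(text), "short chains:", short_chains(text))
```

If every chain in the input shows up as short, that would be consistent with the "No k-mer could be extracted" message, though other causes are not ruled out.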

danny305 commented 5 months ago

I have a question relating to createdb.

When you create a database, the <o:sequenceDB> CLI parameter appears to be a prefix for all the output files, which are dumped in the current directory. This really clutters my directory, and I would prefer to have all the files in their own directory.

How do I get it to write all the database files into their own directory? When I tried making it a path, it failed.

For example: foldseek createdb pdb_dir struct_db_dir/DB_prefix

milot-mirdita commented 5 months ago

unpackdb is the module you want.

danny305 commented 5 months ago

Can you provide an example command on how to use unpackdb?

Sorry, I'm still very new to mmseqs2 and foldseek.

milot-mirdita commented 5 months ago

I think I misunderstood what you want. Disregard unpackdb.

Just make sure struct_db_dir exists before calling createdb, as createdb does not create parent directories. The command you posted should work if you call mkdir first.
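The fix above amounts to "create the directory, then pass directory/prefix as the database name". A minimal sketch of that order of operations in Python, assuming foldseek is on the PATH; createdb_in_dir is a hypothetical wrapper, and the plain shell equivalent is simply running mkdir struct_db_dir before the createdb command.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def createdb_in_dir(pdb_dir: str, db_dir: str, prefix: str = "DB",
                    run: bool = True) -> list[str]:
    """Create `db_dir` first (createdb does not create parent directories,
    per the thread), then build and optionally run
    `foldseek createdb <pdb_dir> <db_dir>/<prefix>`."""
    Path(db_dir).mkdir(parents=True, exist_ok=True)  # the `mkdir` step
    cmd = ["foldseek", "createdb", pdb_dir, str(Path(db_dir) / prefix)]
    if run:
        # check=True raises CalledProcessError if foldseek exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Mirrors the example from the thread: all DB files land in struct_db_dir/.
    print(createdb_in_dir("pdb_dir", "struct_db_dir", "DB_prefix", run=False))
```

Passing run=False only creates the directory and returns the command, which is handy for checking the paths before invoking Foldseek.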