Open rakeshr10 opened 4 months ago
Hm, would you mind sharing an example that triggers this specific error? I have not encountered it so far. The only causes I can imagine are very unusual characters in your headers, or some IDs appearing more than once. You could check for both by mapping each sequence ID to a unique hash string and seeing whether the problem persists when you re-run the script with those unique headers.
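A minimal sketch of the header-renaming idea suggested above, in Python. The function name `rename_headers` and the ID scheme (`seq` + record index + `h` + a short MD5 digest) are hypothetical choices for illustration and are not part of the tool. The record index is included so that duplicate headers still receive distinct IDs.

```python
import hashlib


def rename_headers(lines):
    """Replace every FASTA header with a unique alphanumeric ID.

    Returns (new_lines, mapping), where mapping goes from the new ID
    back to the original header text (without the leading '>').
    """
    new_lines = []
    mapping = {}
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            original = line[1:]
            # Hash the record index together with the header, so two
            # identical headers still end up with different IDs.
            payload = f"{count}:{original}".encode("utf-8")
            digest = hashlib.md5(payload).hexdigest()[:12]
            new_id = f"seq{count}h{digest}"
            mapping[new_id] = original
            new_lines.append(f">{new_id}")
            count += 1
        else:
            # Sequence lines are passed through unchanged.
            new_lines.append(line)
    return new_lines, mapping
```

Because the new ID depends only on the record index and the original header, applying the same function to the amino-acid FASTA and to the 3Di FASTA should give matching IDs, assuming both files list the same records with the same headers in the same order. Keeping `mapping` makes it possible to translate results back to the original headers afterwards.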
@mheinzinger If this happens, does that mean the entire file will not be converted to a Foldseek database, or does it only affect a particular sequence in the file?
Does the tool expect the fasta headers to be in a specific format?
No, the tool does not expect the FASTA headers to be in a specific format, but if they contain very exotic characters, this might lead to weird or unforeseeable downstream effects. Therefore (mostly for debugging), I recommended replacing the headers with something that has to work (e.g. just a string of letters and numbers, nothing else) and checking whether the problem persists.
@mheinzinger I get this error on some of the FASTA files when I use the generate foldseek db script. Is it because there is a mismatch in the FASTA headers, or because the 3Di sequence was not predicted for a particular sequence?
Error: entry id in amino-acid FASTA file has no corresponding 3Di string
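To narrow down which entries trigger this error, one option is to compare the IDs in the amino-acid FASTA with those in the 3Di FASTA. The sketch below is a hypothetical diagnostic, not part of the tool: the helper names are made up, and it assumes the ID is the first whitespace-delimited token of each header, which may differ from how the script itself parses headers.

```python
from collections import Counter


def fasta_ids(lines):
    """Collect the ID (first token after '>') of every non-empty header."""
    return [
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    ]


def compare_ids(aa_lines, tdi_lines):
    """Return (missing, duplicates) for an amino-acid / 3Di FASTA pair.

    missing:    IDs present in the amino-acid FASTA but absent from the 3Di FASTA
    duplicates: IDs that occur more than once in the amino-acid FASTA
    """
    aa_ids = fasta_ids(aa_lines)
    tdi_ids = set(fasta_ids(tdi_lines))
    missing = [i for i in aa_ids if i not in tdi_ids]
    duplicates = [i for i, n in Counter(aa_ids).items() if n > 1]
    return missing, duplicates
```

With real files, the line lists can come from `open(path).readlines()`. A non-empty `missing` list points to sequences without a 3Di string under the same ID, and a non-empty `duplicates` list points to repeated IDs.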