Jigyasa3 opened this issue 3 weeks ago
Hi, thanks a lot for your interest in our tool :) I took a quick look and can confirm your issue: your sequence does return search hits on the Foldseek webserver with ProstT5-predicted 3Di (in fact, the E-values etc. indicate very good scores for, e.g., https://www.uniprot.org/uniprotkb/Q2ETE4/entry). I also compared the ProstT5 3Di prediction from the webserver to your attached 3Di file, and they are identical, so step 1 (get the predicted 3Di structure) also works fine.

You could manually check the database produced by step 2 (sorry, I ran out of time to debug this on my end) by comparing it to the expected format described in, e.g., the section "Sequence database format" of the MMseqs2 user guide. For step 3, I would recommend removing anything that might cause issues, even something as small as changing the output format (as you did with --format-mode 4) or the alignment type (--alignment-type 1). Hope this helps!
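To follow up on checking the step 2 database against the MMseqs2 "Sequence database format", here is a minimal Python sketch; it is not part of Foldseek and the function names are hypothetical. It assumes the layout described in the MMseqs2 user guide (a data file of null-terminated entries, a `.index` file with `key<TAB>offset<TAB>length` lines where the length includes the null byte, and a 4-byte `.dbtype` file), that the database is uncompressed, and that `generate_foldseek_db.py` writes amino acids to `<name>`, 3Di states to `<name>_ss` and headers to `<name>_h`, as `foldseek createdb` does.

```python
"""Sanity-check a Foldseek/MMseqs2-style sequence database (a sketch, not an official validator)."""
import os
import sys


def read_index(path):
    """Parse an MMseqs2 .index file: one 'key<TAB>offset<TAB>length' line per entry."""
    entries = []
    with open(path) as fh:
        for line_no, line in enumerate(fh, 1):
            if not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 3:
                raise ValueError(f"{path}:{line_no}: expected 3 tab-separated fields")
            entries.append((fields[0], int(fields[1]), int(fields[2])))
    return entries


def check_db(prefix):
    """Return (problems, lengths) for one DB; lengths maps key -> residue count."""
    problems, lengths = [], {}
    for suffix in ("", ".index", ".dbtype"):
        if not os.path.exists(prefix + suffix):
            problems.append(f"missing file: {prefix + suffix}")
    if problems:
        return problems, lengths
    # Assumption: .dbtype holds a single 4-byte integer.
    if os.path.getsize(prefix + ".dbtype") != 4:
        problems.append(f"{prefix}.dbtype should hold a single 4-byte integer")
    with open(prefix, "rb") as fh:
        data = fh.read()
    for key, offset, length in read_index(prefix + ".index"):
        if length < 1 or offset < 0 or offset + length > len(data):
            problems.append(f"{prefix}: entry {key} points outside the data file")
            continue
        entry = data[offset:offset + length]
        if not entry.endswith(b"\0"):
            problems.append(f"{prefix}: entry {key} is not null-terminated")
            continue
        # Entries are usually stored as "SEQUENCE\n\0"; count only the residues.
        lengths[key] = len(entry[:-1].rstrip(b"\n"))
    return problems, lengths


def check_foldseek_db(prefix):
    """Check <prefix> (amino acids), <prefix>_ss (3Di) and <prefix>_h (headers)."""
    aa_problems, aa_len = check_db(prefix)
    ss_problems, ss_len = check_db(prefix + "_ss")
    h_problems, _ = check_db(prefix + "_h")
    problems = aa_problems + ss_problems + h_problems
    # Every protein should have exactly one 3Di state per amino acid.
    for key in sorted(set(aa_len) | set(ss_len)):
        n_aa, n_ss = aa_len.get(key), ss_len.get(key)
        if n_aa is None or n_ss is None:
            problems.append(f"entry {key}: present in only one of {prefix} / {prefix}_ss")
        elif n_aa != n_ss:
            problems.append(f"entry {key}: {n_aa} amino acids vs {n_ss} 3Di states")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    issues = check_foldseek_db(sys.argv[1])
    print("\n".join(issues) if issues else "no problems found")
```

It could be run as, e.g., `python check_foldseek_db.py rep_protein1` (hypothetical script name) from the directory where step 2 wrote the database. A clean result does not prove the search will work, but a length mismatch between `<name>` and `<name>_ss` or a malformed index would be a likely cause of empty output.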
Hi everyone,
Thank you for an amazing tool! I am generating the predicted 3Di sequence of a protein of interest so that I can structurally annotate it against a Foldseek database. Here is the code I am using:
```bash
# get the predicted 3Di structure
python /groups/rubin/databases/foldseek/scripts/predict_3Di_encoderOnly.py -i ${file1} -o ${OUT_DIR}/predicted_3Di_${file1} --model ${DB_DIR}/
# DB_DIR contains the alphafold_uniprot50

# generate foldseek database
python /groups/rubin/databases/foldseek/scripts/generate_foldseek_db.py ${IN_DIR}/rep_protein1.faa ${OUT_DIR}/predicted_3Di_rep_protein1 rep_protein1

# run foldseek
foldseek easy-search ${IN_DIR}/rep_protein1 ${DB_DIR}/alphafold_uniprot ${OUT_DIR}/rep_protein1_protT5.txt tmp --format-mode 4 --alignment-type 1
```
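Before step 2 it may also be worth ruling out a mismatch between its two FASTA inputs. The step 1 output path is built from `${file1}`, while step 2 reads `${OUT_DIR}/predicted_3Di_rep_protein1`, so it is worth confirming they are the same file. The Python sketch below compares the amino-acid FASTA and the predicted 3Di FASTA record by record. Pairing records by the first word of each header is an assumption about how `generate_foldseek_db.py` matches them, and the function names are hypothetical.

```python
"""Compare an amino-acid FASTA with its predicted 3Di FASTA (a sketch under stated assumptions)."""
import sys


def read_fasta(path):
    """Return {first word of header: sequence} for a FASTA file."""
    records, name, chunks = {}, None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)
                # Assumption: records are paired by the first whitespace-delimited word.
                name, chunks = (line[1:].split() or [""])[0], []
            else:
                chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def compare(aa, ss):
    """List IDs missing from either file and proteins whose lengths disagree."""
    problems = []
    for name in sorted(set(aa) - set(ss)):
        problems.append(f"{name}: no 3Di record")
    for name in sorted(set(ss) - set(aa)):
        problems.append(f"{name}: no amino-acid record")
    for name in sorted(set(aa) & set(ss)):
        if len(aa[name]) != len(ss[name]):
            problems.append(f"{name}: {len(aa[name])} residues vs {len(ss[name])} 3Di states")
    return problems


if __name__ == "__main__" and len(sys.argv) > 2:
    issues = compare(read_fasta(sys.argv[1]), read_fasta(sys.argv[2]))
    print("\n".join(issues) if issues else "IDs and lengths match")
```

For example, `python compare_fasta.py ${IN_DIR}/rep_protein1.faa ${OUT_DIR}/predicted_3Di_rep_protein1` (hypothetical script name) would report any protein that lacks a 3Di record or whose 3Di string is not one state per residue.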
The first two steps do produce the predicted 3Di file and the Foldseek database, but the Foldseek output is empty. The same protein returns hits on the Foldseek web server, so I suspect I am doing something wrong in the first two steps. Any suggestions as to why this might be happening? I have attached the protein sequence and the 3Di file to reproduce the results.