YuanLiu-SJTU closed 2 months ago
states.txt: This file contains the positions of the clusters in the VQ-VAE latent space. They need to be copied into the centroids in https://github.com/steineggerlab/foldseek/blob/master/lib/3di/structureto3di.h. You might also want to change which state is used for residues with missing coordinates (INVALID_STATE).
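Copying the centroids by hand is error-prone, so a small helper can turn states.txt into C++ initializer rows to paste into the existing centroids definition. This is only a sketch: it assumes states.txt holds one centroid per line as whitespace- or comma-separated numbers, which may not match what your training run writes. The function name `format_centroid_rows` is hypothetical, and the sketch emits only the brace rows, so the declaration in structureto3di.h stays as it is.

```python
def format_centroid_rows(states_text):
    """Turn the text of states.txt into C++ initializer rows.

    Assumption: one centroid per line, values separated by whitespace
    or commas. Blank lines are skipped.
    """
    rows = []
    for line in states_text.splitlines():
        line = line.strip()
        if not line:
            continue
        rows.append([float(tok) for tok in line.replace(",", " ").split()])

    if not rows:
        raise ValueError("no centroids found")

    # Every centroid must have the same dimension as the first one.
    dim = len(rows[0])
    if any(len(row) != dim for row in rows):
        raise ValueError("centroids have inconsistent dimensions")

    # repr() keeps the shortest round-trip representation of each float.
    return ",\n".join(
        "    {" + ", ".join(repr(v) for v in row) + "}" for row in rows
    )


if __name__ == "__main__":
    import sys

    # Usage: python3 this_script.py states.txt
    with open(sys.argv[1]) as handle:
        print(format_centroid_rows(handle.read()))
```

If the number of centroids or their dimension differs from the original model, the array sizes in the header would presumably need to change as well.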
decoder.pt is not needed
encoder.pt
This is the PyTorch model file of the encoder. Convert it to a Keras model. Then kerasify (https://github.com/moof2k/kerasify) can be used to convert the Keras model into the kerasify file (https://github.com/steineggerlab/foldseek/blob/master/data/encoder_weights_3di.kerasify), which is then used by Foldseek.
I uploaded the script for these steps (https://github.com/steineggerlab/foldseek-analysis/blob/main/training/pt2kerasify.py), but it works only for simple fully connected models:
python3 pt2kerasify.py encoder.pt .../encoder_weights_3di.kerasify
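For a fully connected encoder, the main thing to get right in the PyTorch-to-Keras step is the weight layout. PyTorch's nn.Linear stores its weight as (out_features, in_features), while a Keras Dense layer stores its kernel as (input_dim, units), so each weight matrix has to be transposed. The pure-Python sketch below illustrates that one point on plain lists. It is not taken from pt2kerasify.py, the function names are hypothetical, and activations are left out.

```python
def torch_linear(x, weight, bias):
    """Forward pass in the PyTorch nn.Linear convention: y = W x + b.

    weight has shape (out_features, in_features).
    """
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weight, bias)
    ]


def keras_dense(x, kernel, bias):
    """Forward pass in the Keras Dense convention: y = x K + b.

    kernel has shape (input_dim, units).
    """
    return [
        sum(x[i] * kernel[i][j] for i in range(len(x))) + bias[j]
        for j in range(len(bias))
    ]


def torch_weight_to_keras_kernel(weight):
    """Transpose a (out, in) PyTorch weight into an (in, out) Keras kernel."""
    return [list(column) for column in zip(*weight)]


if __name__ == "__main__":
    weight = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # 2 outputs, 3 inputs
    bias = [0.5, -0.5]
    x = [1.0, 0.0, -1.0]

    kernel = torch_weight_to_keras_kernel(weight)
    # Both conventions give the same output once the weight is transposed.
    print(torch_linear(x, weight, bias))
    print(keras_dense(x, kernel, bias))
```

The bias vector needs no change. An encoder that uses anything beyond plain dense layers and standard activations would need more than this.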
Note that the substitution matrix (https://github.com/steineggerlab/foldseek/blob/master/data/mat3di.out) also needs to be changed.
I am very interested in your excellent work on Foldseek. I want to retrain it on my own dataset using your GitHub repository "foldseek-analysis". I checked the code and found that it saves three files: encoder.pt, decoder.pt, and states.txt. How can I use them with Foldseek's source code to compile a complete executable? Could you please provide some guidance? Thank you very much.