timkartar / DeepPBS

Geometric deep learning of protein–DNA binding specificity
BSD 3-Clause "New" or "Revised" License

Question regarding how to extract protein representation from your pretrain model #12

Closed Tizzzzy closed 1 month ago

Tizzzzy commented 1 month ago

Hi author, huge fan of your work. I am currently trying to apply your code to a downstream task. Specifically, I have a protein PDB file and want to extract the protein's latent representation using your pretrained model. Is this possible? If so, could you tell me which script I should run and which line of code stores the protein representation? Thank you so much for your time.

timkartar commented 1 month ago

Hi there!

Sounds like you are working on a cool project.

The protein vectors are stored at this line: https://github.com/timkartar/DeepPBS/blob/738a9d272057c10196a0f0766c0b522743c5403b/run/models/model_v2.py#L102. I would say, though, that these are probably most meaningful for the interface residues, not elsewhere.

Tizzzzy commented 1 month ago

Hi author, thank you for your reply and the information! However, after checking the README, I am still not sure how to run model_v2.py. Could you tell me which command I should use to run this script? Also, how do I pass my PDB file path and the pretrained model to it? Thank you so much for your time and help!

timkartar commented 1 month ago

predict.py does all of those things (it is called by process_and_predict.sh). You need to modify model_v2 to return your vector along with its usual output, and then catch that extra return value in predict.py.
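
For readers following along, here is a minimal sketch of the "return the vector alongside the output, then catch it in the caller" pattern described above. It is a plain-Python toy so that it runs without any dependencies. The class `ToyModel`, its methods, the `return_latent` flag, and the `predict` helper are all hypothetical names chosen for illustration and are not taken from the DeepPBS code. In the real repository the equivalent change would go in the forward pass of model_v2.py, with the matching unpacking in predict.py.

```python
# Toy illustration only: none of these names come from DeepPBS.

class ToyModel:
    """Hypothetical two-stage model: encode residues, then predict from the encoding."""

    def __init__(self, return_latent=False):
        # Optional flag so callers that expect a single output keep working.
        self.return_latent = return_latent

    def encode(self, residue_features):
        # Placeholder "encoder": produces one latent vector per residue.
        return [[2.0 * x for x in feats] for feats in residue_features]

    def predict_from_latent(self, latent):
        # Placeholder "head": produces one score per residue.
        return [sum(vec) for vec in latent]

    def forward(self, residue_features):
        latent = self.encode(residue_features)  # the intermediate vectors to keep
        output = self.predict_from_latent(latent)
        if self.return_latent:
            # Return the vectors alongside the usual prediction.
            return output, latent
        return output


def predict(model, residue_features):
    """Caller side: unpack two values instead of a single output."""
    output, latent = model.forward(residue_features)
    return output, latent


features = [[1.0, 2.0], [0.5, 0.5]]
output, latent = predict(ToyModel(return_latent=True), features)
```

Guarding the extra return value behind a flag is one possible design choice: any other code that calls the model and expects only the prediction continues to work unchanged.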