jertubiana / ScanNet

Apache License 2.0
115 stars, 28 forks

Question regarding the pretrained model and model representation #12

Open Tizzzzy opened 1 month ago

Tizzzzy commented 1 month ago

Hi author! Huge fan of your work. When I read the README, I didn't see a step for loading the pretrained model. For the command python predict_bindingsites.py 1brs_A --noMSA, will the pretrained model load automatically? Also, is it possible to input a protein PDB file and then extract the latent representation of that protein from the model? If so, could you please show me which line of code holds the latent representation of a protein? Thank you so much.

jertubiana commented 1 month ago

Hi, Thank you for your interest in our research and your warm words.

On 9 Sep 2024, at 18:34, Tizzzzzzy wrote:

Hi author! Huge fan of your work. When I read the README, I didn't see a step for loading the pretrained model. For the command python predict_bindingsites.py 1brs_A --noMSA, will the pretrained model load automatically?

Yes, it will load the pretrained model.

Also, is it possible to input a protein PDB file and then extract the latent representation of that protein from the model? If so, could you please show me which line of code holds the latent representation of a protein?

See the predict_features.py script, which outputs an embedding for each amino acid. To get a protein-level embedding, you can average across amino acids.
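A minimal sketch of the averaging step, assuming the per-residue embeddings are already available as a 2-D NumPy array of shape (num_residues, dim). The function name protein_embedding and the random placeholder data are hypothetical and not part of ScanNet; the sketch does not call predict_features.py itself.

```python
import numpy as np


def protein_embedding(residue_embeddings):
    """Mean-pool per-residue embeddings (num_residues, dim) into one (dim,) vector."""
    residue_embeddings = np.asarray(residue_embeddings, dtype=float)
    if residue_embeddings.ndim != 2 or residue_embeddings.shape[0] == 0:
        raise ValueError("expected a non-empty (num_residues, dim) array")
    # Average over the residue axis, leaving one value per embedding dimension.
    return residue_embeddings.mean(axis=0)


# Placeholder data standing in for the per-residue output (assumption, not real output).
rng = np.random.default_rng(0)
per_residue = rng.normal(size=(110, 96))
pooled = protein_embedding(per_residue)
print(pooled.shape)  # (96,)
```

Mean pooling gives every residue equal weight; other pooling choices (for example max pooling) are possible but are not discussed in this thread.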

Best, Jerome



Tizzzzy commented 1 month ago

Hi Jerome, I just want to double-check with you. In predict_features.py, there are two output formats: numpy and dictionary. When output_format == 'numpy', the embedding for each amino acid is stored in query_features. When output_format == 'dictionary', the embedding for each amino acid is stored in query_dictionary_features. Please correct me if I am wrong. Thank you.

jertubiana commented 1 month ago

Correct
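For downstream code that should accept either output format, a small helper can normalize both into one array. This is a sketch under a loud assumption: the thread does not describe the layout of the dictionary output, so the code assumes it maps a residue identifier to a 1-D embedding vector. The helper name to_array and the example keys are hypothetical; check the actual structure returned by predict_features.py before relying on this.

```python
import numpy as np


def to_array(features):
    """Return (keys, array) for array-like or dict-like per-residue embeddings.

    ASSUMPTION: a dict maps residue identifiers to 1-D vectors of equal length.
    """
    if isinstance(features, dict):
        keys = list(features.keys())
        # Stack the vectors in the dictionary's insertion order.
        array = np.stack([np.asarray(features[k], dtype=float) for k in keys])
        return keys, array
    array = np.asarray(features, dtype=float)
    # For the array format, use row indices as stand-in keys.
    return list(range(array.shape[0])), array


# Hypothetical inputs illustrating both formats.
keys_d, arr_d = to_array({"res_1": [0.1, 0.2], "res_2": [0.3, 0.4]})
keys_n, arr_n = to_array(np.zeros((3, 2)))
print(arr_d.shape, arr_n.shape)  # (2, 2) (3, 2)
```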


Tizzzzy commented 1 month ago

Hi Jerome, I still have some questions:

  1. Does predict_features.py also support a single protein? The original script handles multiple proteins, since list_queries contains two entries: predict_features(['1a3x_A','1brs_A'].... However, the code still runs if I pass only one protein. In that case, is the embedding for each amino acid still correct? (Right now the embedding shape is (num_amino_acid, 96).)
  2. When I print the embedding, some amino acids have an embedding of 0. Is that normal?

Thank you for your time
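To report exactly which residues are affected, the all-zero rows can be located with a short diagnostic. This is only a sketch: it assumes the embedding is a NumPy array of shape (num_amino_acid, 96) as described above, it uses placeholder data, and the helper names zero_rows and masked_mean are hypothetical. It does not say whether zero rows are expected behaviour, and whether to exclude them from a protein-level average is a judgment call.

```python
import numpy as np


def zero_rows(embeddings, atol=0.0):
    """Indices of residues whose whole embedding vector is zero (within atol)."""
    embeddings = np.asarray(embeddings, dtype=float)
    return np.flatnonzero(np.all(np.abs(embeddings) <= atol, axis=1))


def masked_mean(embeddings, atol=0.0):
    """Average over residues, skipping all-zero rows."""
    embeddings = np.asarray(embeddings, dtype=float)
    keep = np.ones(embeddings.shape[0], dtype=bool)
    keep[zero_rows(embeddings, atol)] = False
    if not keep.any():
        raise ValueError("every row is zero; nothing to average")
    return embeddings[keep].mean(axis=0)


# Placeholder data: five residues, two of them all-zero (assumption, not real output).
demo = np.ones((5, 96))
demo[[1, 3]] = 0.0
print(zero_rows(demo))           # [1 3]
print(masked_mean(demo).shape)   # (96,)
```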