uw-ipd / RoseTTAFold2NA

RoseTTAFold2 protein/nucleic acid complex prediction
MIT License

Trying to model homodimer results in two monomers in same space. #5

Open ramjet10 opened 1 year ago

ramjet10 commented 1 year ago

I am trying to model a protein homodimer complexed with dsDNA. I entered the two identical protein monomer chains as separate files, and the two DNA strands as two separate files, with a command similar to this: ../run_RF2NA.sh t000 Protein1.fa Protein2.fa D:DNA-box_F.fa D:DNA-box_R.fa, where Protein1 and Protein2 are the identical sequence in two separate files, also named protein1 and protein2 in the FASTA headers. However, the output model has the two copies of the protein superimposed on top of each other as near-identical models, rather than as separate chains in a homodimer complex. Am I doing something wrong here, or is this a limitation of RoseTTAFold?
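For reference, a minimal sketch of the input layout described above (the sequences and file names here are placeholders; the run command follows the form used later in this thread, where the protein FASTA is passed twice, once per chain, and `D:` marks a DNA chain):

```shell
# Placeholder homodimer + dsDNA inputs; sequences are illustrative only.
cat > protein1.fa <<'EOF'
>protein1
MKVLAAGINT
EOF

# Same protein sequence again, under a second chain name.
cat > protein2.fa <<'EOF'
>protein2
MKVLAAGINT
EOF

cat > DNA-box_F.fa <<'EOF'
>DNA-box_F
ATGCATGCAT
EOF

# Reverse strand (this toy sequence is its own reverse complement).
cat > DNA-box_R.fa <<'EOF'
>DNA-box_R
ATGCATGCAT
EOF

# The invocation reported in this thread (echoed here rather than executed).
echo ../run_RF2NA.sh t000 protein1.fa protein2.fa D:DNA-box_F.fa D:DNA-box_R.fa
```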

SamuelSchwab commented 1 year ago

Same problem when predicting the homodimer of P19267 (which has dimer and DNA binding PDB structures) complexed with dsDNA: RF2NA_P19267_dimer_DNA

I use this command to run the prediction (I have also attached the fasta files: fasta_files_hmfb.zip): ../run_RF2NA.sh hmfb_RF hmfb.fa hmfb.fa D:DNA_1.fa D:DNA_2.fa

I get one error during the prediction related to PyTorch: /home/schwabs/.local/lib/python3.8/site-packages/e3nn/o3/_spherical_harmonics.py:82: UserWarning: FALLBACK path has been taken inside: compileCudaFusionGroup. This is an indication that codegen Failed for some reason. To debug try disable codegen fallback path via setting the env variable "export PYTORCH_NVFUSER_DISABLE=fallback" To report the issue, try enable logging via setting the envvariable " export PYTORCH_JIT_LOG_LEVEL=manager.cpp" (Triggered internally at ../torch/csrc/jit/codegen/cuda/manager.cpp:244.) sh = _spherical_harmonics(self._lmax, x[..., 0], x[..., 1], x[..., 2])
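For anyone who wants to chase this warning down, the two environment variables it mentions can be set before launching the run (these come verbatim from the warning text itself; whether they resolve anything is a separate question):

```shell
# Disable the nvFuser fallback path so the underlying codegen failure
# surfaces as a hard error instead of silently falling back (per the warning).
export PYTORCH_NVFUSER_DISABLE=fallback

# Enable JIT logging for the CUDA fuser manager, useful when reporting
# the issue upstream (also suggested by the warning).
export PYTORCH_JIT_LOG_LEVEL=manager.cpp

echo "NVFUSER_DISABLE=$PYTORCH_NVFUSER_DISABLE JIT_LOG=$PYTORCH_JIT_LOG_LEVEL"
```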

ramjet10 commented 1 year ago

I get that same error, but when I disabled fallback as suggested it turned out to be a GPU memory error, so I adjusted the script to force CPU usage and no longer get an error when running on CPU.
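One common way to force CPU execution without editing the model code is to hide the GPUs from PyTorch before launching the script. This is a general CUDA/PyTorch mechanism, not anything RF2NA-specific, and the invocation below reuses the placeholder file names from earlier in the thread:

```shell
# With no visible CUDA devices, torch.cuda.is_available() returns False
# and the pipeline falls back to CPU.
export CUDA_VISIBLE_DEVICES=""

# Echoed rather than executed; substitute your real inputs.
echo ../run_RF2NA.sh t000 protein1.fa protein2.fa D:DNA-box_F.fa D:DNA-box_R.fa
```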


mpacella88 commented 1 year ago

I also see that warning but am able to get around it by running on CPU. I'm curious if it's actually causing a problem with the prediction on GPU. CPU runs of course take an order of magnitude longer.

SamuelSchwab commented 1 year ago

Running on CPU results in the same problem: RF2NA_P19267_dimer_DNA_CPU

fdimaio commented 1 year ago

Hello,

Thank you for the observations. This is something we saw during early training that seemed to go away after fine-tuning, but clearly it did not. I am almost certain we can fix this, but it will take some network retraining. I hope to post updated weights next week.

ramjet10 commented 1 year ago

I don't suppose there is a way to input an existing dimer model pdb file with the DNA in the meantime?

mpacella88 commented 1 year ago

> I get one error during the prediction related to PyTorch: /home/schwabs/.local/lib/python3.8/site-packages/e3nn/o3/_spherical_harmonics.py:82: UserWarning: FALLBACK path has been taken inside: compileCudaFusionGroup. This is an indication that codegen Failed for some reason.

@fdimaio Do you happen to know if this PyTorch warning indicates a problem with our installation or if this is also something you also encountered? Running on cpu removes the warning but it's obviously much slower. Thanks!

fdimaio commented 1 year ago

> @fdimaio Do you happen to know if this PyTorch warning indicates a problem with our installation or if this is also something you also encountered? Running on cpu removes the warning but it's obviously much slower. Thanks!

I will look into this and get back to you.

mf-rug commented 1 year ago

@fdimaio Any updates on this issue? It's currently making my structure predictions impossible unfortunately.

fdimaio commented 1 year ago

> @fdimaio Any updates on this issue? It's currently making my structure predictions impossible unfortunately.

A new model is training now where homodimers are explicitly modelled in training. Not sure of the timetable unfortunately.

fdimaio commented 1 year ago

> A new model is training now where homodimers are explicitly modelled in training. Not sure of the timetable unfortunately.

I started finetuning today. Hopefully, we will be able to post a branch with fixes by the end of this week.

bifxcore commented 1 year ago

Was this issue fixed? I have the same problem: my protein homodimer is not formed. In my case each monomer appears to be complexed separately with the RNA, and the extensive interface between the protein monomers is lost. The structure of the protein dimer is in the PDB and was released 2016-10-12, so it should be in the templates file.

anar-rzayev commented 1 month ago

I know this issue is quite outdated, but after running with the FASTA files @SamuelSchwab provided, I got the following results: hmfb

I am not very good at judging whether the homodimer was formed or not, but after following the installation setup, the GPU pLDDT results are as follows:

Running on GPU
           plddt    best
RECYCLE  0   0.675  -1.000
RECYCLE  1   0.730   0.675
RECYCLE  2   0.734   0.730
RECYCLE  3   0.732   0.734
RECYCLE  4   0.736   0.734
RECYCLE  5   0.735   0.736
RECYCLE  6   0.729   0.736
RECYCLE  7   0.728   0.736
RECYCLE  8   0.733   0.736
RECYCLE  9   0.735   0.736
Done

anar-rzayev commented 1 month ago

On the other hand, I ran it again and the results were as follows:

Running on GPU
           plddt    best
RECYCLE  0   0.640  -1.000
RECYCLE  1   0.720   0.640
RECYCLE  2   0.731   0.720
RECYCLE  3   0.731   0.731
RECYCLE  4   0.732   0.731
RECYCLE  5   0.740   0.732
RECYCLE  6   0.742   0.740
RECYCLE  7   0.739   0.742
RECYCLE  8   0.738   0.742
RECYCLE  9   0.740   0.742
Done
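As a sanity check on logs like the two above, note that the `best` column is just the running maximum of `plddt` over the previous recycles, initialized to -1.000. A small awk sketch over a saved log fragment (file name and contents are illustrative):

```shell
# A few lines of a recycle log, as printed by the run above.
cat > recycle.log <<'EOF'
RECYCLE  0   0.640  -1.000
RECYCLE  1   0.720   0.640
RECYCLE  2   0.731   0.720
EOF

# For each recycle, compare the reported 'best' ($4) against the running
# max of plddt ($3) over all earlier recycles.
awk '$1=="RECYCLE" {
  printf "recycle %d: reported best %.3f, expected %.3f\n", $2, $4, best
  if ($3 > best) best = $3   # update running max with this recycle
}' best=-1.000 recycle.log
```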
