Open MKCarter opened 1 week ago
Hello Mike,
Thank you for your interest and feedback on our project.
Regarding the dependencies, they may vary depending on the running environment. The code was trained and tested within a Docker container using the image nvcr.io/nvidia/pytorch:23.12-py3. The installation error you encountered might be related to some version mismatch issues. I will check this further.
As for the dummy water position, it is mainly used to ensure successful pre-loading of the protein structure and doesn't affect inference. For reference, I used the following formats for simplicity:
<pdb_id>_water.pdb:
HETATM 1 O HOH A 1 0.000 0.000 0.000 1.00 0.00 O
TER 2 HOH A 1
END
<pdb_id>_water.mol2:
@<TRIPOS>MOLECULE
../Superwater/case_study/5F1K/5F1K_water.pdb
1 0 0 0 0
SMALL
GASTEIGER
@<TRIPOS>ATOM
1 O 0.0000 0.0000 0.0000 O.3 1 HOH1 0.0000
@<TRIPOS>BOND
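For anyone who wants to generate these two placeholder files rather than copy them by hand, here is a minimal Python sketch. It is not part of the project: the function names and the output directory are hypothetical, and the PDB record is written with standard fixed-width columns, which is an assumption because the record quoted above appears with collapsed spacing. The mol2 molecule name is set to the PDB file name instead of the full path shown above.

```python
from __future__ import annotations

from pathlib import Path


def dummy_water_pdb() -> str:
    """Return a single water oxygen at the origin as PDB text."""
    # Assumption: standard fixed-width PDB columns (78 characters per atom record).
    atom = (
        "HETATM" + f"{1:>5}" + " " + " O  " + " " + "HOH" + " " + "A"
        + f"{1:>4}" + " " + "   "
        + f"{0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}"
        + f"{1.0:>6.2f}{0.0:>6.2f}" + " " * 10 + f"{'O':>2}"
    )
    ter = "TER   " + f"{2:>5}" + " " * 6 + "HOH" + " " + "A" + f"{1:>4}"
    return "\n".join([atom, ter, "END"]) + "\n"


def dummy_water_mol2(name: str) -> str:
    """Return a one-atom mol2 block mirroring the layout quoted in the thread."""
    lines = [
        "@<TRIPOS>MOLECULE",
        name,
        "1 0 0 0 0",  # one atom, no bonds
        "SMALL",
        "GASTEIGER",
        "@<TRIPOS>ATOM",
        "1 O 0.0000 0.0000 0.0000 O.3 1 HOH1 0.0000",
        "@<TRIPOS>BOND",
    ]
    return "\n".join(lines) + "\n"


def write_dummy_waters(pdb_id: str, out_dir: Path) -> tuple[Path, Path]:
    """Write <pdb_id>_water.pdb and <pdb_id>_water.mol2 into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    pdb_path = out_dir / f"{pdb_id}_water.pdb"
    mol2_path = out_dir / f"{pdb_id}_water.mol2"
    pdb_path.write_text(dummy_water_pdb())
    mol2_path.write_text(dummy_water_mol2(pdb_path.name))
    return pdb_path, mol2_path
```

Calling write_dummy_waters("5F1K", Path("case_study/5F1K")) would create both files in that directory; the directory name here is only an example.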
Thank you again for your interest. Please let me know if you have any further questions.
I see, thanks for the heads up on the water files. I have re-run with your suggestion and it doesn't affect the output, which is good to know for future runs.
Yeah, for me, installing rdkit using pip install rdkit-pypi installs rdkit-pypi 2022.9.5, which is likely using an old numpy version. Reinstalling with pip install rdkit installs rdkit 2024.3.6, which seems to work just fine.
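A quick way to see which of the two distributions is present in a given environment is to query the installed package metadata. The sketch below uses only the Python standard library; the helper names are hypothetical, and it only reports versions without judging compatibility.

```python
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report_rdkit_install() -> dict:
    """Collect versions of the rdkit distributions and numpy, if installed."""
    names = ("rdkit", "rdkit-pypi", "numpy")
    return {name: installed_version(name) for name in names}


if __name__ == "__main__":
    for name, version in report_rdkit_install().items():
        print(f"{name}: {version or 'not installed'}")
```

If both rdkit and rdkit-pypi show up as installed, uninstalling both and reinstalling only rdkit is probably the cleanest fix, since both provide the same importable module.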
Thanks,
Mike
Hi,
Thanks for this code, it is very interesting.
To install and run I had to make a few modifications to the install instructions:
Firstly install torch and other torch packages along with additional requirements:
My requirements.txt looks like this:
I had to modify rdkit-pypi to rdkit - as rdkit-pypi will throw the following error:
Updating to rdkit resolves this.
Once the install was correct, I managed to run an example:
In terms of the dummy water positions, I used the output from a GalaxyWater-CNN calculation.
I imagine this could be integrated into SuperWater if producing initial water sites is an ongoing issue. At the moment I have this in a separate conda env. To install, follow the instructions below:
Then to run, you can run something like this:
python GWCNN_gpu.py input.pdb output
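If you want to call this from a script, for example to process several structures in a row, a thin wrapper around the same command line is enough. This is only a sketch: the function names are hypothetical, it assumes GWCNN_gpu.py is in the working directory and that the active interpreter belongs to the GalaxyWater-CNN environment, and it passes exactly the two positional arguments shown above.

```python
import subprocess
import sys
from pathlib import Path


def gwcnn_command(pdb_path, out_prefix, script="GWCNN_gpu.py"):
    """Build the argument list for: python GWCNN_gpu.py <input.pdb> <output>."""
    return [sys.executable, str(script), str(pdb_path), str(out_prefix)]


def run_gwcnn(pdb_path, out_prefix, script="GWCNN_gpu.py"):
    """Run GalaxyWater-CNN on one structure, failing early if the input is missing."""
    pdb_path = Path(pdb_path)
    if not pdb_path.is_file():
        raise FileNotFoundError(pdb_path)
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(gwcnn_command(pdb_path, out_prefix, script), check=True)
```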
I did also check the memory requirements for running on my machine, and it seems to use around 10GB of GPU memory for a protein with around 300 residues.
The predictions were very fast, with less than 1 minute of runtime.
I hope your exams go well, and thanks for sharing this code!
Thanks,
Mike