Subject: Performance Issues with Virtual Screening Algorithm

Dear Xujun,

Thank you for the significant contributions your team has made to the development of virtual screening algorithms, and congratulations on your achievements in this field.

I am writing to ask for your help with a performance issue I encountered while using the program: it runs relatively slowly, with Demo2 taking approximately 20 minutes to complete. Notably, CPU, GPU, and RAM utilization all stay low during execution, and occasional SSD writes appear to be the main bottleneck.

Here are the details of my system configuration:

CPU: Intel Core i7-13700K
GPU: NVIDIA GeForce RTX 4070
Operating system: Windows Subsystem for Linux 2 (WSL2)

I wonder whether some software configuration or setting could be causing this performance problem. I have tried various ways to speed up execution but have not yet found a satisfactory solution.

Any insights or guidance you could offer on this issue would be greatly appreciated.

Thank you for your time and consideration. I look forward to your response.

Sincerely, Xinci Shang

Dear Xinci,

Thank you for reaching out with your concerns about KarmaDock's performance. Your feedback is invaluable to us. To address your concerns:

KarmaDock offers two workflows for virtual screening. Demo2 on GitHub is scripted for users' convenience: it generates molecular graphs on the fly and performs molecular docking on them as they are produced. This avoids the time spent writing graphs to disk and reading them back, but the total run time then includes both graph generation (CPU-intensive) and molecular docking (GPU-intensive). Depending on your system configuration, you may need to adjust the parameters to balance the workload across disk I/O, CPU, and GPU.

A more practical approach is to first use the CPU to generate the molecular graphs (excluding proteins), which is relatively quick, and then, once graph generation is complete, use the GPU for model inference (molecular docking). Separating the two stages improves both CPU and GPU utilization and significantly reduces the overall processing time.
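
The two-stage idea can be sketched as follows. This is a minimal illustration under stated assumptions, not KarmaDock's implementation: `LigandGraph`, `build_ligand_graph`, `dock_batch`, and `screen` are hypothetical names, the "graph" is a toy record, and the scores are dummy values. What it shows is the structure: all CPU-bound featurization finishes (optionally in a process pool) before batched GPU inference begins, so inference is never left waiting on graph construction.

```python
from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class LigandGraph:
    """Hypothetical placeholder for a featurized ligand graph."""
    smiles: str
    num_atoms: int  # crude stand-in feature: number of letters in the SMILES


def build_ligand_graph(smiles: str) -> LigandGraph:
    # CPU-bound step (hypothetical): real code would parse and featurize here.
    return LigandGraph(smiles, sum(ch.isalpha() for ch in smiles))


def dock_batch(graphs: List[LigandGraph]) -> List[float]:
    # GPU-bound step (hypothetical): one batched forward pass; dummy scores.
    return [float(g.num_atoms) for g in graphs]


def batched(items: Sequence[T], size: int) -> Iterator[List[T]]:
    # Yield consecutive fixed-size batches; the last one may be shorter.
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def screen(
    smiles_list: Sequence[str],
    map_fn: Callable[..., Iterable[LigandGraph]] = map,
    batch_size: int = 64,
) -> List[float]:
    # Stage 1 (CPU): build every ligand graph before any inference starts.
    # Pass a process pool's .map as map_fn to spread this over all cores.
    graphs = list(map_fn(build_ligand_graph, smiles_list))
    # Stage 2 (GPU): consume the ready-made graphs in fixed-size batches.
    scores: List[float] = []
    for batch in batched(graphs, batch_size):
        scores.extend(dock_batch(batch))
    return scores


if __name__ == "__main__":
    library = ["CCO", "c1ccccc1O", "CC(=O)N"]
    with ProcessPoolExecutor() as pool:
        print(screen(library, map_fn=pool.map))
```

Taking the mapping function as a parameter keeps the sketch testable with the built-in `map`, while the `__main__` block passes `ProcessPoolExecutor.map` to use all CPU cores. In a real pipeline, the stage 1 graphs could also be saved to disk so that stage 2 can be rerun without regenerating them.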

I will update the code to provide a workflow for this second method. Please keep an eye out for these updates in the coming days.

Thank you for your patience and understanding.

Warm regards,

Xujun

Dear Xujun,

I hope this message finds you well. I would like to express my sincere gratitude for your continued support and your patient responses.

Since my previous message, I have run further experiments and made some interesting observations. On Ubuntu 22.04, docking is faster than on WSL, at approximately 0.09 seconds per molecule. Looking at resource utilization, I noticed that one CPU core is consistently at 100% while the others remain idle, and the GPU is busy less than half of the time.

To address this, I wrote a Python script that distributes the tasks using multiple threads. Running four docking programs in parallel brings GPU usage close to 100%, and the docking speed is now approximately 0.02 seconds per molecule, which closely matches the results reported in your paper. It therefore appears that better multi-threaded processing during molecular graph generation on the CPU is the key to further speedups.
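
A dispatcher of this kind might look like the sketch below. It assumes the ligand library can be split into independent shards and that each docking run is launched as its own process; the actual KarmaDock command line is not given in this thread, so the command used here is a hypothetical stand-in, and `split_into_shards` and `run_in_parallel` are illustrative names.

```python
import subprocess
import sys
from typing import List, Sequence


def split_into_shards(items: Sequence[str], n_shards: int) -> List[List[str]]:
    """Split items into n_shards contiguous chunks whose sizes differ by at most 1."""
    if n_shards < 1:
        raise ValueError("n_shards must be at least 1")
    base, extra = divmod(len(items), n_shards)
    shards: List[List[str]] = []
    start = 0
    for i in range(n_shards):
        # The first `extra` shards take one additional item each.
        end = start + base + (1 if i < extra else 0)
        shards.append(list(items[start:end]))
        start = end
    return shards


def run_in_parallel(commands: List[List[str]]) -> List[int]:
    """Start every command at once, wait for all, return exit codes in order."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    ligands = [f"ligand_{i}" for i in range(10)]
    shards = split_into_shards(ligands, n_shards=4)
    # Hypothetical: a real run would write each shard to its own input file
    # and launch the docking script on it; a trivial command stands in here.
    commands = [[sys.executable, "-c", f"print('docking {len(s)} ligands')"] for s in shards]
    print(run_in_parallel(commands))
```

Because each docking run is a separate OS process, the CPU-bound graph generation in each one gets its own core, which is consistent with the higher GPU utilization observed. For scale, on a hypothetical library of 1,000,000 molecules, 0.09 s per molecule comes to 90,000 s (about 25 hours), while 0.02 s per molecule comes to 20,000 s (about 5.6 hours).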

Additionally, during large-scale screening the program always writes out every generated pose file, which consumes unnecessary disk space and causes extra SSD wear. Could the program be configured to save only a top percentage of the results to disk? That would make large-scale, reliable docking screens much easier.

I have also encountered occasional error messages during runtime, such as "Molecule does not have explicit Hs. Consider calling AddHs()." These errors seem to originate from RDKit. Does this error affect the docking process?

Once again, I want to express my sincere appreciation for your valuable work and your patient responses. Your contributions to the field are truly remarkable.

Best regards, Xinci

Dear Xinci,

Thank you for reaching out and for your detailed observations and feedback on KarmaDock's performance.

Your work exploring and optimizing KarmaDock's docking speed is highly appreciated. Contributions and feedback like yours are what allow us to keep improving the system.
Since you highlighted the importance of multi-threaded processing, I'd like to mention that we have introduced a second workflow for virtual screening. It first generates molecular graphs on the CPU and then performs docking and scoring on the GPU, which should streamline the process further.
Regarding disk space consumption, the program no longer always writes out every pose file. Users can now control which molecules are saved to disk by providing a score_threshold: only molecules with a karma_score above this threshold are saved, which avoids unnecessary disk wear.
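
The selection rule described above can be sketched as follows. This is not KarmaDock's code: `select_for_saving` and `threshold_for_top_fraction` are hypothetical helpers, and the sketch assumes, from the wording "above this threshold", that a higher karma_score is better. The second helper shows one possible way to turn the "top percentage" asked about earlier into a threshold, assuming the scores from a first pass are available.

```python
import math
from typing import Dict, List, Sequence


def select_for_saving(scores: Dict[str, float], score_threshold: float) -> List[str]:
    """Return names scoring strictly above score_threshold, best first."""
    kept = [name for name, score in scores.items() if score > score_threshold]
    return sorted(kept, key=lambda name: scores[name], reverse=True)


def threshold_for_top_fraction(scores: Sequence[float], fraction: float) -> float:
    """Return a threshold that keeps roughly the best `fraction` of scores.

    Ties at the cut-off score are all kept, so slightly more may pass.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scores, reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))
    # Step just below the k-th best score so a strict "> threshold" test keeps it.
    return math.nextafter(ranked[k - 1], -math.inf)


if __name__ == "__main__":
    scores = {"lig_a": 0.9, "lig_b": 0.2, "lig_c": 0.5, "lig_d": 0.7}
    print(select_for_saving(scores, score_threshold=0.5))
    cutoff = threshold_for_top_fraction(list(scores.values()), fraction=0.25)
    print(select_for_saving(scores, cutoff))
```

With the example scores, the first call keeps `lig_a` and `lig_d` (a score equal to the threshold is not "above" it), and a 25% cut-off keeps only `lig_a`.
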
Regarding the messages you encountered, such as "Molecule does not have explicit Hs. Consider calling AddHs()", these can indeed prevent conformations from being saved. Adding hydrogen atoms to conformations before saving them is good practice. We have fixed this in our latest update, so you should no longer encounter it.

Best regards,

Xujun

schrojunzhang / KarmaDock, issue #3