Closed 545487677 closed 4 months ago
Thank you for raising the issue! The compute_score function is generally one of the most time-consuming parts of the generative process since it's the part that actually involves running a machine learning model to predict the score of the proposed molecule. This can involve both computing molecular features (RDKit fingerprints) and running the ML model (either a random forest or a Chemprop graph neural network). On a per molecule basis, it should only take maybe 1-2 seconds, but given the large number of molecules that are proposed and scored, the time does add up.
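Since each scoring call is expensive, one common way to keep the total from growing is to avoid scoring the same molecule twice. The following is only a minimal sketch, assuming the same SMILES strings can recur during a run and that scoring is deterministic; whether the project already caches scores is not stated here. The names make_cached_scorer and slow_score are hypothetical and are not part of the project's API.

```python
from functools import lru_cache
from typing import Callable, Optional


def make_cached_scorer(
    scoring_fn: Callable[[str], float],
    maxsize: Optional[int] = None,
) -> Callable[[str], float]:
    """Wrap a per-molecule scoring function so each distinct SMILES is scored once.

    Hypothetical helper: assumes scoring_fn is deterministic for a given SMILES.
    """

    @lru_cache(maxsize=maxsize)
    def cached(smiles: str) -> float:
        return scoring_fn(smiles)

    return cached


# Stand-in for an expensive model call (assumption, not the real scorer).
calls = []


def slow_score(smiles: str) -> float:
    calls.append(smiles)  # record every time the underlying scorer really runs
    return float(len(smiles))


cached_score = make_cached_scorer(slow_score)
scores = [cached_score(s) for s in ["CCO", "CCO", "c1ccccc1"]]
```

In this toy run the underlying scorer is invoked only for the two distinct strings, while the repeated one is served from the cache. The benefit in practice depends entirely on how often molecules repeat.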
To answer your questions:
Out of curiosity, what model type are you using and how long is the compute_score method taking? I also want to make sure it's not taking an unreasonably long time due to a different issue.
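To answer the timing question with numbers, one option is to wrap the scoring function and record how long each call takes. This is a rough sketch, not the project's own tooling: timed and dummy_score are hypothetical names, and dummy_score merely stands in for the real model-backed scorer.

```python
import time
from typing import Callable, List


def timed(scoring_fn: Callable[[str], float], timings: List[float]) -> Callable[[str], float]:
    """Return a wrapper that appends the duration of each scoring call to `timings`."""

    def wrapper(smiles: str) -> float:
        start = time.perf_counter()
        try:
            return scoring_fn(smiles)
        finally:
            # Record elapsed wall-clock seconds even if scoring raises.
            timings.append(time.perf_counter() - start)

    return wrapper


# Stand-in scorer (assumption): replace with the real scoring function.
def dummy_score(smiles: str) -> float:
    return float(len(smiles))


timings: List[float] = []
timed_score = timed(dummy_score, timings)
results = [timed_score(s) for s in ["CCO", "CCN"]]

mean_seconds = sum(timings) / len(timings)
print(f"{len(timings)} calls, mean {mean_seconds:.6f} s per molecule")
```

Comparing the measured mean against the 1-2 seconds per molecule mentioned above would show whether the slowdown comes from scoring itself or from something else.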
Thank you for the information! I used the Chemprop model, and generating molecules took around 10 hours.
Okay, that sounds about right in terms of timing. Please let me know if you have any other questions!
Hi, great work on implementing this method! The logic and structure are very clear and concise. However, I have encountered significant performance issues with the compute_score method in our code. Computing the scores for molecules is taking a long time.
    @classmethod
    def compute_score(cls, molecules: tuple[str], scoring_fn: Callable[[str], float]) -> float:
        """Computes the score of the molecules.

    @cached_property
    def P(self) -> float:
        """The property score of this Node. (Note: The value is cached, so it assumes the Node is immutable.)"""
        return self.compute_score(molecules=self.molecules, scoring_fn=self.scoring_fn)
Problem: The execution time for the compute_score method is very long, especially when calculating scores for a large number of molecules.
Questions: