Open PatWalters opened 1 year ago
Hi @PatWalters. Many thanks for identifying the problematic descriptors!
While I believe these molecules are certainly outside of the applicability domain, I feel it is up to the user to be aware of this rather than us returning nan values.
I'll check whether removing these descriptors impacts performance in any significant way, and remove them from the default featurizer/model if it does not.
Hi @josejimenezluna. Could you please provide an update on the effects of removing these specific descriptors from the default featurizer/model? I am interested in knowing if there have been any recent findings or observations regarding the impact this has on the performance of the analysis or model.
This is a follow-up to the previous issue I reported. It turns out that the problem isn't with molecules that have multiple fragments. The problem is that a few of the RDKit descriptors return nan when they encounter atom types that are not parameterized. The impacted descriptors are listed below. I'm willing to bet you could remove these from the descriptors you're currently using without impacting performance. Then again, these molecules are probably outside your applicability domain, and MolSkillScorer.score should return nan.
Here are the problematic descriptors:
BCUT2D_MWHI
BCUT2D_MWLOW
BCUT2D_CHGHI
BCUT2D_CHGLO
BCUT2D_LOGPHI
BCUT2D_LOGPLOW
BCUT2D_MRHI
BCUT2D_MRLOW
MaxPartialCharge
MinPartialCharge
MaxAbsPartialCharge
MinAbsPartialCharge
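
For anyone who wants to check their own molecules, here is a minimal sketch of how the nan-returning descriptors could be detected and dropped. It is not the MolSkill featurizer code. The helper names (`nan_descriptors`, `filter_descriptors`, `rdkit_descriptor_list`) are hypothetical and introduced only for illustration. The one RDKit call it relies on is `Descriptors.descList`, the list of (name, function) pairs, and it is imported lazily so the helpers can also be exercised without RDKit.

```python
import math

# Descriptor names listed in this issue as returning nan for
# atom types that are not parameterized.
PROBLEMATIC = frozenset({
    "BCUT2D_MWHI", "BCUT2D_MWLOW", "BCUT2D_CHGHI", "BCUT2D_CHGLO",
    "BCUT2D_LOGPHI", "BCUT2D_LOGPLOW", "BCUT2D_MRHI", "BCUT2D_MRLOW",
    "MaxPartialCharge", "MinPartialCharge",
    "MaxAbsPartialCharge", "MinAbsPartialCharge",
})


def nan_descriptors(mol, desc_list):
    """Return the names of descriptors in desc_list that evaluate to nan for mol.

    desc_list is a sequence of (name, function) pairs, the same shape
    as RDKit's Descriptors.descList.
    """
    bad = []
    for name, fn in desc_list:
        value = fn(mol)
        if isinstance(value, float) and math.isnan(value):
            bad.append(name)
    return bad


def filter_descriptors(desc_list, excluded=PROBLEMATIC):
    """Return the (name, function) pairs whose name is not in excluded."""
    return [(name, fn) for name, fn in desc_list if name not in excluded]


def rdkit_descriptor_list():
    """Return RDKit's (name, function) descriptor pairs. Requires RDKit."""
    from rdkit.Chem import Descriptors  # imported lazily on purpose
    return list(Descriptors.descList)
```

With RDKit installed, `nan_descriptors(Chem.MolFromSmiles(smiles), rdkit_descriptor_list())` should list the offending descriptors for a given SMILES string, and `filter_descriptors(rdkit_descriptor_list())` gives a descriptor set without the twelve names above. Whether dropping them changes model performance is the open question in this thread.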