mordred-descriptor / mordred

a molecular descriptor calculator
http://mordred-descriptor.github.io/documentation/master/
BSD 3-Clause "New" or "Revised" License
355 stars 95 forks source link

Add FractionCSP3, LogS and HeteroAtomsCount #28

Closed WhisperWind22 closed 6 years ago

WhisperWind22 commented 6 years ago

FractionCSP3 and LogS

WhisperWind22 commented 6 years ago

I tried to correct all the comments. Thanks for review.

WhisperWind22 commented 6 years ago

Hello! Still need some changes?

philopon commented 6 years ago

I'm sorry for my late reply :cry: Your PR is almost perfect!

Could you tell me the source of test example of LogS (article, book, or other software etc.) ?

I'll merge your PR after I fix CI.

WhisperWind22 commented 6 years ago

I calculated through http://www.swissadme.ch/index.php

philopon commented 6 years ago

I'm sorry for my late reply. I investigated LogS algorithms. ClogP is used In original ESOL algorithm and XLogP3 is used in alternative version of ESOL. Your LogS code seems incomplete to calculate XLogP3 (XLogP has 83 atom types and 6 extra features; if you have the article of your alglrithm, please let me know). So, we cannot merge your current LogS code. However, I think FractionCSP3 and HeteroAtomsCount is mergeable.

Could you delete LogS code from this PR? (if you accept it, I'll merge this PR soon)

I understand the importance of LogP and LogS descriptor, thus I think to implement XLogP3 (or other LogP algorithm and LogS algorithm) in future.

philopon commented 6 years ago

I read ESOL article [1] (it says original ESOL uses CLogP), SwissADME article [2] (it says ESOL in SwissADME uses XLogP3), and XLogP3 article [3] (it says XLogP3 uses 83 atom types and 6 extra features).

[1] https://doi.org/10.1021/ci034243x [2] https://www.nature.com/articles/srep42717 [3] https://doi.org/10.1021/ci700257y

WhisperWind22 commented 6 years ago

The formula for calculating LogS is taken from the silicos-it (filter-it) library: http://silicos-it.be.s3-website-eu-west-1.amazonaws.com/software/filter-it/1.0.2/filter-it.html#logs

You could also check http://www.swissadme.ch - they by default use filter-it library. So the code that I implemented is based on the source code of the filter-it library. So the results of swissadme output have to be the same as in my code.

philopon commented 6 years ago

I got it :smile:

After some fixing, all values in mordred test molecules are matched to values of filter-it program (25198cb9f12a5959e62e0a52a90c0ae90df37780). Now, values of SwissADME is not reproduced. however, I think this implementation is suitable:

  1. Filter-it is referred by SwissADME
  2. From numerical example:

benzene -0.22435 ([ch1]) * 6 - sqrt(78.114) + 0.89823 = -1.3643

In SwissADME, it seems that [ch0] * 6 are matched (it can reproduce when SMARTs matching to molecules that have explicit hydrogens in RDKit).

And, I think many LogS prediction models are reported. Thus, current name is too general. Could I rename LogS to FilterItLogS (00544fb19062c5a15945bd95f499fb827204d64c)? I'll merge soon when you allow it.

WhisperWind22 commented 6 years ago

Yes, of course.

WhisperWind22 commented 6 years ago

I added 2 more descriptors. One for Plane of Best Fit (https://github.com/rdkit/rdkit/tree/master/Contrib/PBF, https://pubs.acs.org/doi/abs/10.1021/ci300293f), the other for calculating the ratio acyclic to cyclic SP3.

philopon commented 6 years ago

I am sorry to have kept you waiting. FilterItLogS, FractionCSP3 and nHetero descriptors were merged into master.

your new commit a75c9aba62e4c4de3804e148d3454665154a42d5 seems good. However, it is not merged in this PR, sorry... Would you mind split PR SP3A2C and PBF?

Splitting PR per the kind of descriptor makes review easy.

Cheers,