yoshida-lab / XenonPy

XenonPy is a Python Software for Materials Informatics
http://xenonpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License
135 stars 57 forks source link

SMC data frame compatibility and update Fingerprints #217

Closed stewu5 closed 3 years ago

stewu5 commented 3 years ago

close #166 close #177

Add pd.DataFrame compatibility to all classes. Add three new fingerprints. Add counting type support to fingerprints. Fix fp_size to n_bits in RDKitFP. Update labels of DescriptorFeature to the actual rdkit feature names.

codecov[bot] commented 3 years ago

Codecov Report

Merging #217 (47fbe20) into master (f5c5090) will decrease coverage by 0.29%. The diff coverage is 23.82%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #217      +/-   ##
==========================================
- Coverage   34.22%   33.93%   -0.30%     
==========================================
  Files          95       96       +1     
  Lines        6915     7154     +239     
==========================================
+ Hits         2367     2428      +61     
- Misses       4548     4726     +178     
Impacted Files Coverage Δ
xenonpy/datatools/splitter.py 0.00% <0.00%> (ø)
xenonpy/descriptor/base.py 0.00% <0.00%> (ø)
xenonpy/descriptor/compositions.py 0.00% <0.00%> (ø)
xenonpy/descriptor/fingerprint.py 0.00% <0.00%> (ø)
xenonpy/descriptor/frozen_featurizer.py 0.00% <0.00%> (ø)
xenonpy/descriptor/structure.py 0.00% <0.00%> (ø)
xenonpy/inverse/base.py 0.00% <0.00%> (ø)
xenonpy/inverse/iqspr/__init__.py 0.00% <0.00%> (ø)
xenonpy/inverse/iqspr/iqspr.py 0.00% <0.00%> (ø)
xenonpy/inverse/iqspr/iqspr4df.py 0.00% <0.00%> (ø)
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update f5c5090...47fbe20. Read the comment docs.

stewu5 commented 3 years ago
  1. In the BaseFeaturizer.transform, I think we'd better add a checker in case not 0 to check if the input type is a dataframe when the featurizer use the parallel we provided.

I agree. If you want me to do it, can you open a new issue after merging this branch to the master?

  1. You renamed the tests/datatools/test_splitter.py to test_data_select.py. Because some fixes has been added in this file recently, the rename lead to a conflict. Please name back this file, if you still want to rename it, we can do it after the merging.

I did not rename it. Maybe it was you? I did not touch that part at all (why would I?).

  1. The BaseDescriptor has many modifications, We will face some new errors for a while 😭

Is there any error? I think the change of BaseDescriptor is not so major. Let me know what are you concerns, or you can leave new issues first and I will review it.

  1. You add a new class BaseLogLikelihoodSet based on the BaseDescriptor, I think the codes are OK, let give more attention to it when we begin to use it

This is not new. This is from the last update. I only updated this class the same way I updated BaseDescriptor.

  1. Delete any file under tests/model_dir/. These files are generated by the unit test and should be deleted automatically when the test is over. I think the reason why they were left here is you ran some test in PyCharm with just a click, the post clean script doesn't work in that case.

Do I have to do it?

  1. Seems lack of test #217 (comment)

Hum... I saw that the 'custom' choice of return_output in BaseFeaturizer is missing some tests. But other than that, the tests are kind of implicitly done. Maybe I will add more tests later in v0.5.1, but do you think it is ok to leave it as is? The overall change is only -0.30%?

TsumiNa commented 3 years ago

I did not rename it. Maybe it was you? I did not touch that part at all (why would I?).

Oh! I remembered that, yes I did. So please rename this file to test_spliter.py. I think that will resolve the conflict.

Is there any error? I think the change of BaseDescriptor is not so major. Let me know what are you concerns, or you can leave new issues first and I will review it.

No, for now, this is OK. I said that just based on my experiences, ok for now, error in the future, always 😺

This is not new. This is from the last update. I only updated this class the same way I updated BaseDescriptor.

Ok, that's great!

  1. Delete any file under tests/model_dir/. These files are generated by the unit test and should be deleted automatically when the test is over. I think the reason why they were left here is you ran some test in PyCharm with just a click, the post clean script doesn't work in that case.

Do I have to do it?

Yes, you have to. files under tests/model_dir/not part of the source code, but has been added to git and committed. This means git will trace these files all the time, but it's not necessary. Please use git rm -r --cached tests/model_dir/ to untrack them, and delete the files on your machine, then make a new commit for the push. This link is a simple detail about git rm cache

  1. Seems lack of test #217 (comment)

Hum... I saw that the 'custom' choice of return_output in BaseFeaturizer is missing some tests. But other than that, the tests are kind of implicitly done. Maybe I will add more tests later in v0.5.1, but do you think it is ok to leave it as is? The overall change is only -0.30%?

Yes the overall changes always small because we already have lots of tests. That is not really a critical problem, just for memory.

  1. In the BaseFeaturizer.transform, I think we'd better add a checker in case not 0 to check if the input type is a dataframe when the featurizer use the parallel we provided.

I agree. If you want me to do it, can you open a new issue after merging this branch to the master?

Ok, I will open a new issue for that.