manuel-calzolari / sklearn-genetic

Genetic feature selection module for scikit-learn
https://sklearn-genetic.readthedocs.io
GNU Lesser General Public License v3.0
324 stars 77 forks source link

Fixing decorator error and doing some basic cleanup #41

Closed detrin closed 1 year ago

detrin commented 1 year ago
  1. Commits are not squashed.
  2. I rearranged the tests and added your example from docs.
  3. Removed the decorators and instead of that I check the methods directly in wrapper methods.
  4. I fixed parsing of version.
  5. Formatted with black.
  6. Fixed the np.bool change since numpy==1.20.0.
detrin commented 1 year ago

@manuel-calzolari Could you have a look at this PR please?

manuel-calzolari commented 1 year ago

Hi @detrin , thanks for the PR, I will have time to review it over the weekend. However, I can already say that the change in step 4 (the change to version parsing) is not okay, as there was a specific reason why I had done it that way.

detrin commented 1 year ago

Hi @detrin , thanks for the PR, I will have time to review it over the weekend. However, I can already say that the change in step 4 (the change to version parsing) is not okay, as there was a specific reason why I had done it that way.

Okay, could you please explain the reason behind that?

Please when you have time leave me list of things you want to change or change them yourself.

manuel-calzolari commented 1 year ago

Hi @detrin, first of all thank you for your contribution!

Okay, could you please explain the reason behind that?

The problem is that if you import __version__ using the Python import, the GeneticSelectionCV class is also imported through the line from .gscv import GeneticSelectionCV found in the __init__.py file. By importing that class, the imports in the gscv.py file are in turn executed. Since the latter imports reference libraries such as sklearn and deap, in practice installing the package through the setup.py file would fail if all the dependencies had not already been installed in the environment (the version is read before installing the requirements). To get around this problem there may be different solutions and one of the ways usually used (see for example hyperopt https://github.com/hyperopt/hyperopt/blob/master/setup.py#L6 or other libraries) is to parse the __init__.py file with a regex instead of using the Python import.

About the compatibility issues with recent versions of sklearn and numpy, I fixed them with commit 12ee9b2. Compared to the changes proposed in this PR, I preferred to adopt slightly different solutions. For the problem with numpy, I preferred to change the type to bool instead of np.bool_ to remain consistent with the change applied to deap (the library I use for the genetic algorithm), see for example https://github.com/DEAP/deap/commit/0453060f6d9fce4e712f0f3479733118b2655bb4. As for sklearn, I preferred to adopt the modification applied to scikit-learn itself, see https://github.com/scikit-learn/scikit-learn/commit/f12987de98152659f9279b635ccf9c150597274a (this necessitated updating the minimum required version of sklearn, but it seems reasonable).

Formatting the code with black is definitely a good idea (the code currently has some inconsistencies inherited from changes in the conventions used by scikit-learn over the years, such as in the use of single or double quotes), however, at this point it makes sense to do so upon completion of other changes I need to make to the library.

detrin commented 1 year ago

The problem is that if you import __version__ using the Python import, the GeneticSelectionCV class is also imported through the line from .gscv import GeneticSelectionCV found in the __init__.py file. By importing that class, the imports in the gscv.py file are in turn executed. Since the latter imports reference libraries such as sklearn and deap, in practice installing the package through the setup.py file would fail if all the dependencies had not already been installed in the environment (the version is read before installing the requirements). To get around this problem there may be different solutions and one of the ways usually used (see for example hyperopt https://github.com/hyperopt/hyperopt/blob/master/setup.py#L6 or other libraries) is to parse the __init__.py file with a regex instead of using the Python import.

Thanks, I have learned something new. I would assume that the dependencies are already installed is desired state.

About the compatibility issues with recent versions of sklearn and numpy, I fixed them with commit 12ee9b2. Compared to the changes proposed in this PR, I preferred to adopt slightly different solutions. For the problem with numpy, I preferred to change the type to bool instead of np.bool_ to remain consistent with the change applied to deap (the library I use for the genetic algorithm), see for example DEAP/deap@0453060. As for sklearn, I preferred to adopt the modification applied to scikit-learn itself, see scikit-learn/scikit-learn@f12987d (this necessitated updating the minimum required version of sklearn, but it seems reasonable).

Regarding the decorator from scikit-learn, I don't see how this would be good idea in general. I would guess that similar issue will arise again in year or more as the API might change again for it. I understand the rest, on the other hand I feel like the effort I put into this PR was for nothing.