mthh / jenkspy

Compute Natural Breaks in Python (Fisher-Jenks algorithm)
https://pypi.python.org/pypi/jenkspy
MIT License
215 stars 28 forks source link

Add JenksNaturalBreaks class for fit and predict #11

Closed yasirroni closed 3 years ago

yasirroni commented 3 years ago

What do you think about adding JenksNaturalBreaks class as a classification method?

If you agreed with this PR, I might help write the test.

Note:

  1. I'm not familiar with C nor Cython.
  2. The from . import jenks return import error on my computer. How does it work when there is no jenks class nor jenks.py?
yasirroni commented 3 years ago

example usage:

x = [0,1,2,3,4,5,6,7,8,9,10,11]
myjenks = JenksNaturalBreaks(nb_class=4)
try:
    print(myjenks.labels_)
    print(myjenks.groups_)
    print(myjenks.inner_breaks_)
except:
    pass
myjenks.fit(x)
try:
    print(myjenks.labels_)
    print(myjenks.groups_)
    print(myjenks.inner_breaks_)
except:
    pass
print(myjenks.predict(15))
print(myjenks.predict([2.5, 3.5, 6.5]))
mthh commented 3 years ago

Thanks for the PR.

Regarding your note, the from . import jenks works because there is a "jenks" shared library (with .so/.dylib/.dll as extension, depending on if your OS is Gnu/Linux, MacOS or Windows) in the same folder. It doesn't appears like so in the source because it is produced when building / installing the library (and because it is already contained in the wheels dowloaded from PyPi when doing pip install jenkspy).

In your dynamic interpreter the dot notation wont works (because you need to be in a package to use it) but once you have built the cython extension (see below) you can go in the jenkspy folder and import it.

To build the extension inplace and produce the file corresponding to the shared library, in the root of this repository, in your shell :

python3 setup.py build_ext --inplace

Then in your python interpreter :

os.chdir('jenkspy') # this is the 'jenkspy' subfolder, not the root of this repo
import jenks # import the 'jenks' shared library produced 'inplace'
jenks._jenks_breaks # The `_jenks_breaks` function exposed from the Cython/C code

Getting back to your PR, it looks good to me and i'm OK to merge it (some users might be more comfortable with the interface you propose, what seems closer to scikit-learn if I'm not mistaken ?)

Could you also :

Once it's done I would happily merge it ! Thanks again !

yasirroni commented 3 years ago

To build the extension inplace and produce the file corresponding to the shared library, in the root of this repository, in your shell :

python3 setup.py build_ext --inplace

It returns error in my VS Code:

error: Microsoft Visual C++ 14.0 is required. Get it with "Build Tools for Visual Studio": https://visualstudio.microsoft.com/downloads/

I'll write the test and test it using a blindfold (without running the code) then, or try a workaround such as directly modifying the library.

mthh commented 3 years ago

I haven't been using windows recently so I don't remember exactly how to create a dev environment to compile Cython extensions, sorry !

Did you try to install the components requested by the error message? There seems to be some messages on the web about this kind of problems (for example https://stackoverflow.com/questions/47253638/error-microsoft-visual-c-14-0-is-required-when-installing-python-package or https://groups.google.com/g/cython-users/c/qujm95cfWRc?pli=1) and there is also some more general info. about compiling Cython extensions on windows (https://github.com/cython/cython/wiki/CythonExtensionsOnWindows).

Regarding your tests, note that you can benefit from the testing done by the Continuous Integration tooling that is in place (I guess both AppVeyor and Travis are triggered by the pull requests, even if I don't currently see the travis green check under this PR).

yasirroni commented 3 years ago

Did you try to install the components requested by the error message? There seems to be some messages on the web about this kind of problems (for example https://stackoverflow.com/questions/47253638/error-microsoft-visual-c-14-0-is-required-when-installing-python-package or https://groups.google.com/g/cython-users/c/qujm95cfWRc?pli=1) and there is also some more general info. about compiling Cython extensions on windows (https://github.com/cython/cython/wiki/CythonExtensionsOnWindows).

I ended up just modify the installed library and test it from there. It passes my test on my local computer. Hope it's also passes yours.

yasirroni commented 3 years ago

I don't know why, but whenever I PR (this is the second time) on readme.rst, it will fail the CI. Do you know why? I will git reset this PR and make the readme update and the test update separated for the mean time.

yasirroni commented 3 years ago

ERROR: test_warnings (main.JenksClassTestCase)

Traceback (most recent call last): File "tests\test_jenks.py", line 26, in setUp self.res5 = np.array([0, 2, 0, 3, 3, 3, 0, 2, 2, 0, 2, 0, 3, 1, 0, 3, 0, 2, 1, 0, 3, 0, 1, 1, 2, 0, 1, 3, 3, 0, 2, 0, 2, 1, 2, 0, 0, 0, 0, 0, 1, 3, 0, 0, 3, 2, 1, 3, 1, 0]) AttributeError: 'NoneType' object has no attribute 'array'

It happens because of the test environment:

try:
    import numpy as np
except ImportError:
    np = None
yasirroni commented 3 years ago

ERROR: test_warnings (main.JenksClassTestCase)

Traceback (most recent call last): File "tests\test_jenks.py", line 26, in setUp self.res5 = np.array([0, 2, 0, 3, 3, 3, 0, 2, 2, 0, 2, 0, 3, 1, 0, 3, 0, 2, 1, 0, 3, 0, 1, 1, 2, 0, 1, 3, 3, 0, 2, 0, 2, 1, 2, 0, 0, 0, 0, 0, 1, 3, 0, 0, 3, 2, 1, 3, 1, 0]) AttributeError: 'NoneType' object has no attribute 'array'

It happens because of the test environment:

try:
  import numpy as np
except ImportError:
  np = None

By default, the wrapper interface needs NumPy library, but it seems that jenkspy didn't need NumPy. Should we add NumPy dependency or should we skip all tests related to NumPy array (use your if np style)?

Alternatively, we can make all output of the interface in Python built-in data type, such as a list. But, it will make a different behaviour compared to scikit-learn.

mthh commented 3 years ago

Sorry for the delay in responding. You are right, these errors are due to the absence of numpy in the test environment.

Until now, numpy was an optional dependency of "jenkspy". I would have liked to keep it that way (because jenkspy's core functionality doesn't require numpy, which is an heavy dependency).

Do you know if there is a way to expose your class to users only if they have numpy installed? I'm going to finvestigate / test some stuff in that direction this weekend.

yasirroni commented 3 years ago

So, you mean like:

  1. If have numpy, can use JenksNaturalBreaks class

  2. If not, only able to use the core functions

_ The solutions is either make it as different package or return error message when they don't have numpy but want to use the class interface.

Or declare to the user that it behaves differently from scikit-learn and return list as output.

On Sat, 17 Oct 2020, 21:33 Matthieu Viry, notifications@github.com wrote:

Sorry for the delay in responding. You are right, these errors are due to the absence of numpy in the test environment.

Until now, numpy was an optional dependency of "jenkspy". I would have liked to keep it that way (because jenkspy's core functionality doesn't require numpy which is an important dependency).

Do you know if there is a way to expose your class to users only if they have numpy installed? I'm going to find out / test some stuff this weekend.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/mthh/jenkspy/pull/11#issuecomment-711021147, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALTUAKD3BZ5YVXBPHEEQOMDSLGTJZANCNFSM4SLFU2BQ .

mthh commented 3 years ago

Yes that's what I meant.

I would be completely fine with your solution "returning an error message if the user don't have numpy but try to use the JenksNaturalBreaks class", but since this is your contribution, if you find it too annoying and prefer to create a separate package, don't hesitate.

Sorry I didn't notice the inclusion of numpy (or especially the fact that it's an optional dependency of jenkspy) immediately upon the opening of your PR.

yasirroni commented 3 years ago

Ready to merge, Sir!

mthh commented 3 years ago

Many thanks ! I will publish a new version en PyPI today.

mthh commented 3 years ago

Released as 0.2.0.