oist-ncbc / spykesim

Extended edit similarity measurement for high dimensional discrete-time series signal (e.g., multi-unit spike-train).
https://pypi.org/project/spykesim
MIT License
20 stars 3 forks source link

Add zerospikes_penalty argument #26

Open KeitaW opened 4 years ago

KeitaW commented 4 years ago

This PR introduces a boolean argument to clocal_exp_editsim and clocal_exp_editsim_withbp that indicates whether you want to ignore the non-spiking columns during the edit similarity calculation. See https://github.com/KeitaW/spykesim/issues/23 for the details.

KeitaW commented 4 years ago

Test is somehow failing will fix soon.

milenamc commented 8 months ago

Running on a clean virtualenv based on Python 3.10.2:

Cython==0.29.37
exceptiongroup==1.2.0
h5py==3.10.0
hdbscan==0.8.33
iniconfig==2.0.0
joblib==1.3.2
nose==1.3.7
numpy==1.26.4
packaging==23.2
pandas==2.2.0
pluggy==1.4.0
pytest==8.0.0
python-dateutil==2.8.2
pytz==2024.1
scikit-learn==1.4.0
scipy==1.12.0
six==1.16.0
spykesim==1.3.0
threadpoolctl==3.2.0
tomli==2.0.1
tqdm==4.66.1
tzdata==2023.4

After updating deprecated numpy type (np.int changed to np.int_), Pytest passed with around 1000 warnings. Almost all of them are deprecation warnings related to gensimmat and generate_signature_matrix_cpu_single:

DeprecationWarning: NumPy will stop allowing conversion of out-of-bound Python integers to integer arrays.  The conversion of -8774792623331384397 to uint32 will fail in the future.
  For the old behavior, usually:
      np.array(value).astype(dtype)
  will give the desired result (the cast overflows).
    simmat_lsh = es.gensimmat(self.binmat, window, window, numband=1, bandwidth=50, minhash=True)
DeprecationWarning: NumPy will stop allowing conversion of out-of-bound Python integers to integer arrays.  The conversion of 1364461791113035166 to uint32 will fail in the future.
  For the old behavior, usually:
      np.array(value).astype(dtype)
  will give the desired result (the cast overflows).
    sigmat = minhash.generate_signature_matrix_cpu_single(numhash, numband, bandwidth, self.b)

Which persist even after changing dtype=np.uint32 to .astype(uint32) in minhash.pyx. However, given the high value of the integers in the error messages, I wonder if there is a better way to type cast and/or operate them.


Older local version of spykesim still running on 3.7.3 but with similar modifications as introduced by this branch also passes the tests with 78 warnings.

Considering how Python 3.7 is now unsupported, I believe it would be good to confirm that this library can work on a more recent version by working on the necessary type modifications.

milenamc commented 8 months ago

Reinstalled this branch from scratch, this time using Python 3.7.3.

Cython==0.29.37
exceptiongroup==1.2.0
h5py==3.8.0
hdbscan==0.8.33
importlib-metadata==6.7.0
iniconfig==2.0.0
joblib==1.3.2
nose==1.3.7
numpy==1.21.6
packaging==23.2
pandas==1.3.5
pluggy==1.2.0
pytest==7.4.4
python-dateutil==2.8.2
pytz==2024.1
scikit-learn==1.0.2
scipy==1.7.3
six==1.16.0
spykesim==1.3.0
threadpoolctl==3.1.0
tomli==2.0.1
tqdm==4.66.2
typing-extensions==4.7.1
zipp==3.15.0

After pip installing pandas, nose and hdbscan, pytest passed with a single warning on Linux (pytest-7.4.4, pluggy-1.2.0):

DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    from imp import find_module, load_module, acquire_lock, release_lock

Deprecation warnings and issues related to newer versions of Python and numpy still need to be treated (probably by another pull request), but it seems that the modifications introduced here are working as intended.