scikit-learn-contrib / scikit-dimension

A Python package for intrinsic dimension estimation
https://scikit-dimension.readthedocs.io
BSD 3-Clause "New" or "Revised" License
79 stars 16 forks source link

Why is participation ratio rounded to an integer? #12

Closed RylanSchaeffer closed 2 years ago

RylanSchaeffer commented 2 years ago

I just noticed that participation ratio is always an integer. Why is that?

https://github.com/scikit-learn-contrib/scikit-dimension/blob/ea1a4954fdb6ce6e8b36bd9597044cd4040149ab/skdim/id/_PCA.py#L189

j-bac commented 2 years ago

Yes it is rounded. If you think it is more useful or standard not to round I can make the change

RylanSchaeffer commented 2 years ago

I've never heard of it being rounded, nor had two of my labmates. We also tried to find whether anyone recommends rounding it in a paper, but weren't successful.

Overall, it's not a big deal. We were just perplexed why we were getting integer values and thought we were misusing the library. Maybe change it?

j-bac commented 2 years ago

Just updated, thanks for the suggestion. Will release on pypi after some other updates

RylanSchaeffer commented 2 years ago

@j-bac any updates on the new release?

j-bac commented 2 years ago

my bad, I did make a new release but seems it didn't include the change. Just added v0.3.1 on pypi

RylanSchaeffer commented 2 years ago

I'm afraid that change introduced an error. Because de is now now longer an integer, this indexing breaks:

https://github.com/scikit-learn-contrib/scikit-dimension/blob/ea1a4954fdb6ce6e8b36bd9597044cd4040149ab/skdim/id/_PCA.py#L192

RylanSchaeffer commented 2 years ago

The error:

python3.7/site-packages/skdim/id/_PCA.py", line 192, in _participation_ratio
    return de, gaps[de - 1]
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
j-bac commented 2 years ago

Thanks, just fixed this (0.3.2). I'm returning all gaps now under self.gap_