hsmaan / balanced-clustering

Reworked clustering metrics for assessing performance in imbalanced settings
MIT License
14 stars 3 forks source link

Cleanup package for dependency in scib-metrics #5

Open adamgayoso opened 2 years ago

adamgayoso commented 2 years ago

Hello, I'm interested in using these metrics in the package we are building:

https://github.com/YosefLab/scib-metrics

which contains reimplementations of the scib metrics mostly using jax. I think it would be cool to include the balanced ari/ami here, but this package currently seems to contain superfluous and restricted dependencies (e.g., jupyterlab, and tight restrictions on numpy etc). Furthermore, there is some cython code that can make things complicated. However, it does look straightforward to reimplement that one function using numba for example.

Is there any interest in including these two changes (removing unnecessary dependencies and porting to use numba) here?

adamgayoso commented 2 years ago

If you wanted, you could also refactor this package to use https://github.com/scverse/cookiecutter-scverse

hsmaan commented 2 years ago

Hi @adamgayoso, I'm glad to see that your team has an interest in adding these metrics. We'd be happy to refactor the code to try to include the balanced ARI and AMI in scib-metrics.

I've removed the restricted and unnecessary dependencies, as well as the notebooks which use them in the scib_metrics branch. For the cython code, I'm a little less sure as I do not have experience with numba. If I can work with a member of your team on that aspect, that would help move things along a bit quicker. Otherwise, I can look into things and try to port it.

hsmaan commented 2 years ago

Reimplementation for cython code in numba and cleanup is complete. Testing numerical outputs and comparison between cython and numba implementations to be completed next, and then should be ready to add to scib_metrics.

adamgayoso commented 2 years ago

Great! It might also be nice if you ran the black python formatter on the codebase, but this is a bit nitpicky :)

Also adding all the functions to the readme!

hsmaan commented 2 years ago

Not nitpicky at all! Done using black/flake8 (kept max line length at 88).

For the functions, do you mean adding examples of the other functions to the README? These are currently in the first notebook. Not sure what sort of structure you had in mind, but can definitely move some to the README.

adamgayoso commented 2 years ago

Yes examples at a minimum.

If you wanted to take things further you could restructure the repo by using this:

https://github.com/scverse/cookiecutter-scverse

But I think what you have here is fine; at a minimum I would use pre-commit

https://github.com/scverse/scvi-tools/blob/main/.pre-commit-config.yaml

hsmaan commented 1 year ago

In scib_metrics_num_stab_testing, the comparison between numba and cython implementations shows all tests are passing.

hsmaan commented 1 year ago

Readme updated with full example + added return_results function in main

hsmaan commented 1 year ago

Hi @adamgayoso , apologies for the long wait but I think things should be good to go now. I've opened up a PR for the scib_metrics implementation - comparison between numba and cython implementation checks out (#7). I've added pre-commit and happy to pull some more things from the template down the line, but figured this might be good enough for now. Let me know, thanks!