Open lykos98 opened 6 months ago
Wow!! Thank you @lykos98, I had a look at your dadaC repo, at the benchmarks, and at you your PR. It all seems excellent work to me. I was considering asking you to contribute the entire dadaC codebase in dadapy, but probably is better to keep them separate as you originally designed. I would ask you to make the tests pass, then I will have a closer look as soon as I have time before merging but it should not take much
Goodafternoon, I fixed the issue on the tests and run them in my fork, everything seems fine. The issue was linked to a leftover, I overlooked in the diff.
I think that ideally the codebase of dadac should be included but when I tried to perform that kind of integration it was not easy to deal with the inclusion of the .so file needed for dadac to run. In the next future moreover, I think I will also ship dadac as a proper package with precompiled binaries.
Thank you! F
Attention: Patch coverage is 90.74074%
with 5 lines
in your changes are missing coverage. Please review.
Project coverage is 81.17%. Comparing base (
f0b3fdd
) to head (d065ae1
).
Files | Patch % | Lines |
---|---|---|
dadapy/clustering.py | 90.74% | 5 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Proposed changes
Optimization of ADP procedure and explicit parallelization of many tasks through openmp.
Types of changes
In the last year (or so) I have been working on my implementation of ADP due to the fact the original one, provided in dadapy was not able to scale well for my use case. Together with the help of @alexdepremia and @lucatornatore I developed a very small library in C (dadaC) which implemented the ADP procedure with the following optimizations:
The implementation has been tested against the one in dadapy and preserves a 1 to 1 match of the results in terms of cluster labels, border indices and densities. A detailed discussion, tests, and benchmarks can be found in the repository of my implementation https://github.com/lykos98/dadaC .
Integration in the library requires minimal modifications of the code to allow the implementation in C to interface with yours, actually it requires to add dadaC as dependency, since dadaC requires to build a small .so file that is then used by an interface in python built with ctypes. I kindly ask you if this can be in line with your development, and I am open to modify my implementation to comply with your conventions if needed.
Thank you for your attention, let me know your opinion. F