rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0
4.2k stars 525 forks source link

[FEA] Break out cython bindings into `.pxd` file for cuML algorithms #3277

Open divyegala opened 3 years ago

divyegala commented 3 years ago

Right now, we have both cython bindings and cython code in the same .pyx files. This limits readability as some of our cython code has become large, both for development and review purposes. We should separate the bindings into a .pxd file to improve conciseness

dantegd commented 3 years ago

@divyegala this was the way we had it originally and it led to a lot of extra files that had little re-use in all of our folders with very little added benefit, so I would be strongly against going back to that.

I think our current approach of separate pxd files for wrapped functions that get re-used and single .pyx wrapped files for ones that don't is just better from a pragmatical as well as a smaller lines of code and number of files.

Maybe case by case it can be done, for example random forest is quite a codebase, but for things like kmeans, dbscan, linear models and many others this just introduces complexity without any upside I can see right now unless I'm missing something.

divyegala commented 3 years ago

@dantegd We could likely hide the .pxd files in something like decomposition/detail/pca.pxd as taking PCA's example and following C++'s detail convention. This would hide away PCA binding API to libcuml++ which is barely ever updated anyway, while hiding away 100 lines of 731 lines of pca.pyx that currently exist, allowing updates to be specific to the cython code and not having to scroll down the file every time.

In which case, yes, a case-by-case approach could likely suffice for us. Linear models are broken down quite well with a lot of mixins and re-usability and I still see advantage in hiding away API usage, leaving behind very small, modular code.

Also makes it easier for developers and reviewers to identify when there's an API change in the bindings, or just updates to the cython call.

dantegd commented 3 years ago

@dantegd We could likely hide the .pxd files in something like decomposition/detail/pca.pxd as taking PCA's example and following C++'s detail convention. This would hide away PCA binding API to libcuml++ which is barely ever updated anyway, while hiding away 100 lines of 731 lines of pca.pyx that currently exist, allowing updates to be specific to the cython code and not having to scroll down the file every time.

That is exactly what I was referring to when mentioning that this was how things were originally. This duplicates a lot of files if done blindly, which is just changing one trade off for another. In PCA itself I see pretty much no benefit of doing that.

Also makes it easier for developers and reviewers to identify when there's an API change in the bindings, or just updates to the cython call.

I don't see how it makes it any easier at all, it is the exact same change, just showed in two files or one, so I don't see the upside? It is just as evident where the change is, either the cdef extern or on the python call.

github-actions[bot] commented 3 years ago

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

github-actions[bot] commented 3 years ago

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.