graspologic-org / graspologic

Python package for graph statistics
https://graspologic-org.github.io/graspologic/
MIT License
776 stars 143 forks source link

POT & hyppo Dependencies Discussion #563

Closed alyakin314 closed 3 years ago

alyakin314 commented 3 years ago

@jovo suggested that we make the two of our required dependencies that aren't a part of the general data science stack, namely hyppo and POT, optional. the idea here is that hyppo is only used in a single class (latent distribution test), whereas POT is only employed in seedless procrustes, and the only other class that imports it (funny enough, also latent distribution test). an optional dependency would be something that is not installed when one pip installs graspologic. instead we would check for it when trying to import one of these models, and, possibly via a try-except raise a warning asking to install it. this is up for discussion, because there are upsides and downsides of doing so.

there is also a different route we can take of just removing it completely:

for POT, the only reason we import it is to solve one linear assignment problem, using one algorithm (i have considered trying to use other solvers, but never got to it). it is a rather simpler algorithm that can be implemented in several lines of code. i didn't look at POT too closely, but i was under the impression that calls cython, so it should be faster than a python implementation that i had before. i believe the difference in profiler was ~5-10%, but i was using small graphs, and maybe it would scale better. but maybe this is not worth having another dependency.

for hyppo, the idea is that we could try to pr all we need to scipy. in particular, the only three hyppo tests we really are using with LDT are hsic, dcorr and mgc. the latter is in scipy already, and if we can pr the other two, especially their 2-sample versions - we can drop hyppo's dependecy.

relevant cc's: @bdpedigo , @j1c , @dwaynepryce , @sampan501

alyakin314 commented 3 years ago

personal opinion: optional dependencies always frustrated me in other packages, because after i installed and imported the package - i don't want to be going back to terminal, and install more things. so, i'd prefer to avoid them, especially, since there aren't any other issues with those, like memory or compatibility concerns. happy to discuss the route to avoid them at all.

alyakin314 commented 3 years ago

Current consensus, as i understood it: do not do optional dependencies, but do consider ditching them:

bdpedigo commented 3 years ago

note that more dependencies will be added soon

bdpedigo commented 3 years ago

note: we now also rely on

daxpryce commented 3 years ago

surprisingly I think just umap-learn, for now - graspologic-native, but that's just our rust native module so it won't have any transitive deps of its own (which is good)

bdpedigo commented 3 years ago

closing cause this doesnt seem a debate at this point, happy to repopen if someone disagrees