First of all, congrats on this initiative; so far this seems like a great optimization package!
I noticed that PySpark and Dask are mandatory dependencies of this project. Since a user is likely to use only one of them, or neither, this makes the package as a whole unnecessarily heavy, which can be a burden when distributing it as part of a larger application.
It might be better to make the various parallelization frameworks optional dependencies. This can be done with the 'extras_require' argument in setup.py (https://setuptools.readthedocs.io/en/latest/setuptools.html#declaring-extras-optional-features-with-their-own-dependencies). It would probably also entail moving some import statements around, so that PySpark and Dask are only imported when the corresponding backend is actually used.
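A minimal sketch of both parts, assuming the backends are installed from PyPI as `pyspark` and `dask`. The extra names (`spark`, `dask`, `all`), the `require_backend` helper, and the `<package>` placeholder are hypothetical and not taken from this project. The dict is what would be passed to `setup(..., extras_require=...)`, and the helper defers each import to call time:

```python
import importlib
from types import ModuleType

# Hypothetical extras mapping. In setup.py this would be passed as
# setup(..., extras_require=EXTRAS_REQUIRE), so that `pip install <package>[spark]`
# pulls in PySpark only for users who ask for it.
EXTRAS_REQUIRE = {
    "spark": ["pyspark"],
    "dask": ["dask"],
}
# Convenience extra that installs every optional backend.
EXTRAS_REQUIRE["all"] = sorted({dep for deps in EXTRAS_REQUIRE.values() for dep in deps})


def require_backend(module_name: str, extra: str) -> ModuleType:
    """Import an optional parallelization backend only when it is needed.

    Because the import happens at call time instead of at module top level,
    the core package stays importable without PySpark or Dask installed.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Point the user at the (hypothetical) extra that provides the backend.
        raise ImportError(
            f"{module_name!r} is required for this backend; "
            f"install it with: pip install <package>[{extra}]"
        ) from exc
```

Backend code would then call, for example, `require_backend("pyspark", "spark")` inside the function that starts a Spark job, rather than importing `pyspark` at the top of the module.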