pyjanitor-devs / pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor
https://pyjanitor-devs.github.io/pyjanitor
MIT License

Huge Requirements.txt for pyjanitor #793

Open anks7190 opened 3 years ago

anks7190 commented 3 years ago

Hi,

I am using some basic pyjanitor functions, such as clean_names() and collapse_levels(), in code that I want to productionise. There are limits on the size of the production code base. Currently, if I look at the requirements.txt for just "pyjanitor", it's huge, and I don't think my code needs all of those dependencies. How can I remove the unnecessary ones? Or what's the best approach to productionising code that uses the basic pyjanitor functions above?

To get the requirements file, I ran:

python -m venv test_env
pip install pyjanitor
pip freeze > pyjanitor_req.txt

pyjanitor_req.txt
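If you already have a full `pip freeze` dump like the one attached above, one way to act on the question is to filter it down to a hand-picked subset while keeping the pinned versions. This is a minimal sketch, assuming plain `name==version` lines; `trim_requirements`, `normalize` and the `keep` set are hypothetical names, and which packages you actually need still has to be confirmed by testing your code.

```python
import re


def normalize(name: str) -> str:
    # PEP 503-style normalization, so "pandas_flavor" matches "pandas-flavor".
    return re.sub(r"[-_.]+", "-", name).lower()


def trim_requirements(freeze_text: str, keep: set) -> list:
    """Return the pinned lines from `pip freeze` output whose package is in `keep`.

    Hypothetical helper: only plain `name==version` lines are considered;
    blank lines, comments, editable installs and URL requirements are skipped.
    """
    wanted = {normalize(name) for name in keep}
    kept = []
    for line in freeze_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, _version = line.partition("==")
        if normalize(name) in wanted:
            kept.append(line)
    return kept


freeze = """\
numpy==1.25.2
pandas==2.0.3
pandas-flavor==0.6.0
pyjanitor==0.25.0
scipy==1.11.1
"""

print("\n".join(trim_requirements(freeze, {"pandas", "pandas_flavor", "pyjanitor"})))
```

Note that a trimmed file only controls what you list: when pip installs it normally, the dependencies of the kept packages (numpy for pandas, for instance) are still resolved and installed unless you also pass `--no-deps`.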

ericmjl commented 3 years ago

Hi @anks7190, it's lovely to hear that you're using pyjanitor, and potentially in prod too!

Yes, there is a way to install pyjanitor without the deps. If you're willing to manage the dependencies manually, the way to do so with pip is as follows:

pip install --no-deps pyjanitor
pip install <the select subset of deps necessary; I think it's mostly going to be pandas and pandas-flavor>

You might have to experiment a bit, as this isn't something we've encountered before. On the roadmap, I do hope to split out the codebase so that it's less of a monolithic project, but the shared infrastructure (docs, for example) makes it convenient to have everything in one place. Keep your eyes peeled.
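Part of that experimenting can be scripted. A minimal sketch, assuming pyjanitor was installed with `--no-deps`: it tries an import and reports the name of the first module Python cannot find. `first_missing_dependency` is a hypothetical helper, not part of pyjanitor. The reported name is an import name (for example `pandas_flavor`), which does not always match the pip distribution name (`pandas-flavor`), and dependencies imported lazily inside functions will not show up at import time.

```python
import importlib
from typing import Optional


def first_missing_dependency(module_name: str) -> Optional[str]:
    """Try to import `module_name`; return the name of the first missing module, or None."""
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # exc.name is the module that could not be found. It may be a
        # dependency of `module_name` rather than `module_name` itself.
        return exc.name
    return None
```

In practice you would call `first_missing_dependency("janitor")`, install whatever it reports, and repeat until it returns `None`.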

Jython1415 commented 1 year ago

I'm new here but I saw this was marked as a "good first issue" so I decided to take a look.

I think this has already been resolved.

$ python -m venv .venv
$ source .venv/bin/activate
$ pip install pyjanitor
$ pip freeze
multipledispatch==1.0.0
natsort==8.4.0
numpy==1.25.2
packaging==23.1
pandas==2.0.3
pandas-flavor==0.6.0
pyjanitor==0.25.0
python-dateutil==2.8.2
pytz==2023.3
scipy==1.11.1
six==1.16.0
tzdata==2023.3
xarray==2023.7.0

Is there any more simplification of the dependencies that needs to be done?
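A quick way to count the packages in a listing like the one above is to count its pinned lines. This sketch assumes every installed package appears as one `name==version` line, which holds for this listing but not for editable or URL-based installs; `count_pinned_packages` is a hypothetical name.

```python
def count_pinned_packages(freeze_text: str) -> int:
    """Count `name==version` lines in `pip freeze` output, ignoring blanks and comments."""
    return sum(
        1
        for line in freeze_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#") and "==" in line
    )


# The listing from the comment above (pyjanitor 0.25.0 in a fresh venv).
FREEZE = """\
multipledispatch==1.0.0
natsort==8.4.0
numpy==1.25.2
packaging==23.1
pandas==2.0.3
pandas-flavor==0.6.0
pyjanitor==0.25.0
python-dateutil==2.8.2
pytz==2023.3
scipy==1.11.1
six==1.16.0
tzdata==2023.3
xarray==2023.7.0
"""

print(count_pinned_packages(FREEZE))  # 13
```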

samukweku commented 12 months ago

Hi @Jython1415 this is still available to work on. Care to do a PR?

Jython1415 commented 11 months ago

@samukweku I don't think any changes need to be made. This issue was first opened in 2021 and at the time installing pyjanitor would install 133 total packages. When I checked in August (see above comment) the count was down to 13.

I think this issue can be closed as resolved.