Closed acxz closed 2 years ago
@JoshData friendly ping
@JoshData friendly ping
Hey,
Sorry I'm actually confused by the changes. If one follows the README steps, I think you will get a copy of the library installed in the usual Python packages location, but then it says to run "./run" which will run the version that has been cloned. Or will it? Will it get the modules in the working directory or will it go to the installed package?
If we following the README as is, ./run
will execute the version that has been cloned. Typing just run
will execute the version that has been installed.
I don't think this is a good idea...
Do you mean to say that the documentation should be changed to run
or that run
should not be a command that should be called from anywhere?
Just to clarify ./run
will only work if you are in the cloned directory.
The documentation needs to present a clear way to use the package, so it has to either use an installed package or not install it. (It shouldn't install a package but then not use it.) It shouldn't create two local copies of the package either (the git clone and then the one installed by pip). And "run" is a bad name for a globally-installed script.
The documentation needs to present a clear way to use the package, so it has to either use an installed package or not install it. (It shouldn't install a package but then not use it.)
Thats sensible, I'll change the local ./run
s to run
s.
It shouldn't create two local copies of the package either (the git clone and then the one installed by pip).
This is a harder one to tackle. As the only way to install the package is to clone it first, unless you are using a package manager. There is no way around this unless you symlink the two with pip install -e .
which is commonly done when developing on python code. I think this should stay the way it is.
And "run" is a bad name for a globally-installed script.
Technically it will only work in the virtual env the package is installed, but yeah. Totally agree. Do you have a proposition for a better name?
@JoshData any further thoughts?
Or should we go ahead and just scrap the idea of having congress as a python package?
Is this PR solving a real problem that you are having, or is this just for fun?
I think I have outlined the real world problems this resolves in the OP.
For most users however, this brings us one step closer to resolve #188, i.e. able to install unitedstates/congress with a pip install united-states
as well as being able to pull dependencies properly instead of using requirements.txt.
I would very much like for this PR to be merged in as I believe it is useful. I just suggested the second point as ultimately I don't want to keep pushing this unless you believe there is merit in this PR, since I should respect your wishes as being the current maintainer of this project.
@JoshData, @acxz I also think that having this as a Python package is helpful. As a general principle, I've moved to having all of my Python scripts as packages so that I can call their modules without remembering the paths or setting environment variables.
In this case I think @JoshData has a valid concern about namespacing. Using run
or runs
is just not clear outside of this context. The command should be something like usc-run
. More broadly, for me, the greatest value of a package is not the command-line, but being able to from uscongress.bills import get_bills_to_process
. This would require a more significant change to create a unique namespace. I think this can be done by changing the 'tasks' directory to 'uscongress'.
This may be disruptive to current users, but I think it provides more value than being able to call the run
command from anywhere; if that were my main problem I'd add an alias to .bash_profile
.
If @JoshData is open to the more significant change above, I would suggest a few updates (in both the code and documentation):
uscongress
) to call the scripts as modules (e.g. from uscongress.bills import get_bills_to_process
). usc-run
. I think it shoud be possible to still use './run' from within the directory so that any users' cronjob
s or other current uses are not affected. This is, to me, less confusing than ./run
vs run
.The command should be something like usc-run.
usc-run
is a good suggestion, I'll go ahead and update the PR to reflect that.
More broadly, for me, the greatest value of a package is not the command-line, but being able to from uscongress.bills import get_bills_to_process.
That was actually on my to do once the setup.py is integrated in this PR. Wanted to take it a step at a time ;)
Points 1 and 2 should be addressed in a follow up PR. I'll go ahead and address Point 3 in this PR.
Thanks for your input @aih!
hmm turns out I can't alias the run
script with setup.py
's scripts
field. I would have to use console_scripts
of entry_points
, but that requires a python module rather than a script, which means I would have to change the structure such that run is a module. Hence requiring points 1 and 2. @aih if you think you can accomplish installing the run
script under a usc-run
with minimal changes, send a PR over at my fork and this branch.
Otherwise, personally I think this PR should be merged in as it is, and we can address @aih's concerns in a later (possibly more disruptive, in terms of file structure, PR). Just getting a setup.py
in this repo's pipeline is a good atomic PR imo.
@JoshData if you could let us know if you are still interested in this feature that would be appreciated.
Definitely useful to have this set as a package:
I agree with @JoshData for the PR itself; I don't think it's a good idea to include the run script, or a "command" script at all. The modules separated as is is good enough for people to simply import (e.g. the vote.py file is meant for vote scraping, and that's it, just like firebase admin); it'll just need extra documentation. +1 on making this a package though!
Handle the namespace issue for the command-line. Any global installation should use something clearly distinctive, like usc-run. I think it shoud be possible to still use './run' from within the directory so that any users' cronjobs or other current uses are not affected. This is, to me, less confusing than ./run vs run.
With the latest commit this is taken care of.
To really make the python methods and modules importable, I tried to create the setup.py file so that the directory structure could be kept the same. However, after spending quite a bit of time I wasn't able to get it to work. I decided to follow python packaging convention (this involves moving files into a congress/
subfolder) and imports now work properly. Due to some heavy changes I also squashed the commits for easy readability.
With this @aih's concerns are addressed and with a pip install .
, congress
can be installed where you can call the scripts and load/run methods located within the tasks/
files.
Associated doc changes have been added as well.
I would urge anyone interested in this endeavor to test out the package for their particular use case. Please feel to critique and state any concerns!
This seems OK to me. Would be great if someone else could try it out next.
Would be great if someone else could try it out next.
pinging @stevesdawg @Andrew-Chen-Wang @aih
If you guys can test it out for your use case and report back if it works or doesn't please do so.
Thx for the review!
I would prefer if the package was versioned by date.
That actually makes a lot of sense. I'll add the change depending on @JoshData's opinion.
This is great work, and I will test it over the next couple of weeks. My main use case is to: 1. Download bulk data (from ProPublica, here), then pip install, and then update the data with usc-run bills
. @acxz is there any reason this would not work?
is there any reason this would not work?
None at all
@aih things thatre worth testing seems like the CLI command, general importing of the package, and a single use case
The test plan I ran for these changes, from inside the repo root:
pip install .
usc-run votes --congress=117 --session=2022 --force=True
usc-run bills
to process bills I've already downloaded.These worked normally as expected! 🎉
One side-effect however, when I call usc-run
from somewhere other than the repo root, it creates the data/
and cache/
dirs in the directory where I called it. I feel like this is undesirable, and it should always download to some default location unless a download destination is provided.
What are people's thoughts on this?
One side-effect however, when I call
usc-run
from somewhere other than the repo root, it creates thedata/
andcache/
dirs in the directory where I called it. I feel like this is undesirable, and it should always download to some default location unless a download destination is provided.What are people's thoughts on this?
I agree with you, but I think that can be a subsequent PR and need not block merge. Right now, you can only really run from within the repo.
If you made a system install and implemented something where the folders are created at "top level", you wouldn't see the folders at all anymore since they'd be next to or even inside the package content (ie wherever the environment is). I think this behavior is desirable.
Command line importing looks good 👍🏾
Python 3.6.12 |Anaconda, Inc.| (default, Sep 8 2020, 23:10:56)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import congress
>>> congress
<module 'congress' from '/home/shrivathsav/anaconda3/envs/congress_py36/lib/python3.6/site-packages/congress/__init__.py'>
>>> from congress import tasks
>>> tasks
<module 'congress.tasks' from '/home/shrivathsav/anaconda3/envs/congress_py36/lib/python3.6/site-packages/congress/tasks/__init__.py'>
>>> congress.tasks
<module 'congress.tasks' from '/home/shrivathsav/anaconda3/envs/congress_py36/lib/python3.6/site-packages/congress/tasks/__init__.py'>
>>> from congress.tasks import bills
>>> bills
<module 'congress.tasks.bills' from '/home/shrivathsav/anaconda3/envs/congress_py36/lib/python3.6/site-packages/congress/tasks/bills.py'>
>>> from congress.tasks import votes
>>> votes
<module 'congress.tasks.votes' from '/home/shrivathsav/anaconda3/envs/congress_py36/lib/python3.6/site-packages/congress/tasks/votes.py'>
>>> votes.vote_ids_for_house
<function vote_ids_for_house at 0x7fa0bcf91c80>
This looks good to me.
@JoshData it has been some time, since this PR has been tested by folks here. LGTM?
Thanks all!
@JoshData I'm back haha, (This might be my last one in a while)
This patchset allows users to install congress as a python package. This offers several benefits:
dependencies: can be directly specified in the setup.py instead of having a requirements file. This ensures that users who want to use congress will have the requirements instead running an extra command.
exposing entry points: the main
run
andtest/run
files can be run anywhere as long as your python environment is the same one where you installed congress in, providing more convenience to users say in writing their own shell scripts without hardcoding the congress directory. Of course you can still use./run
andpython run
as before, so documentation does not need to be changed.Makes it easier for package managers to work with. While this does not directly reslve #188 if someone wants to, one can easily upload congress to PyPi now, saving users the step to even clone the repo. Distributions that currently package this codebase, such as Arch Linux, currently manually install the scripts to the usr/bin folders. (https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=congress-git) Now they can use built in python build instructions such as
python setup.py install
.If this is merged, this can fixes #192