ward-wise / data-analysis

Data analysis on Chicago infrastructure and infrastructure spending
MIT License

Refactor and cleanup project structure #34

Closed suryadutta closed 3 months ago

suryadutta commented 3 months ago

Description

When I first started working on this project, I had a difficult time getting to a place where I could run all the scripts and tests in an easily reproducible way.

This PR is an attempt at a refactor that will make it easier for new members of this team to get started with these scripts. The changes in this PR include:

This seemed like a lot already, so I'm limiting this PR to just these changes. I'll open some follow-up PRs for configuring tests, adding things like linting, etc.

Please weigh in with any thoughts / concerns / feedback!

Notes for reviewers

smacmullan commented 3 months ago

I'm getting this error when I run the make scripts (running on Windows). Will investigate more tomorrow. The file named in the error changes each time I run it (dotenv, normalizer, f2py, etc.).

WARNING: Failed to write executable - trying to use .deleteme logic
ERROR: Could not install packages due to an OSError: [WinError 2] The system cannot find the file specified: 'C:\Python312\Scripts\f2py.exe' -> 'C:\Python312\Scripts\f2py.exe.deleteme'

suryadutta commented 3 months ago
> WARNING: Failed to write executable - trying to use .deleteme logic
> ERROR: Could not install packages due to an OSError: [WinError 2] The system cannot find the file specified: 'C:\Python312\Scripts\f2py.exe' -> 'C:\Python312\Scripts\f2py.exe.deleteme'

According to StackOverflow, this may be due to insufficient permissions for your user.

I'm not super familiar with developing with Python on Windows, but I think running the shell as an admin will get you past this.

Also, can you describe your Python environment in Windows? Are you using venv or conda to manage your environment?

smacmullan commented 3 months ago

I'm running Python 3.12 on Windows 11. No virtual environment setup at the moment. Running it in an admin terminal mostly worked. It got through the processing scripts, but ran into an issue parsing the app_token from the .env file. I'm in the process of troubleshooting. "'app_token' is not recognized as an internal or external command, operable program or batch file."

If you know any good resources explaining makefiles, could you add them to the README? What's the best way for a user to run individual scripts? And if someone is updating scripts or adding new ones, what do they need to know to make that work with the new setup? (maybe this for explaining makefiles? https://makefiletutorial.com/)

smacmullan commented 3 months ago

Definitely a PowerShell issue. I'm running from Git Bash (with admin permissions) and it's working fine.

kollerbud commented 3 months ago

I think we can remove the header altogether. I have been querying the Chicago data portal without a header (unknowingly) and haven't noticed any issues.

suryadutta commented 3 months ago

> I think we can remove the header altogether. I have been querying the Chicago data portal without a header (unknowingly) and haven't noticed any issues.

@kollerbud while the secret_token doesn't seem to be needed, the app_token is definitely needed in the header.

While working on expanding test coverage, I refactored a test here into an integration test, which fails unless the app_token given is correct: https://github.com/ward-wise/data-analysis/blob/51c1a0287f2355718c1c620211da9036f1d0b657/tests/chicago_participatory_urbanism/test_location_geocoding_api.py#L18-L104

I can add more specific error handling for requests sent through these APIs in another PR.

suryadutta commented 3 months ago

> It got through the processing scripts, but ran into an issue parsing the app_token from the .env file. I'm in the process of troubleshooting. "'app_token' is not recognized as an internal or external command, operable program or batch file."

@smacmullan hmm, this is probably because this line is not a valid way to set environment variables in a Windows shell: https://github.com/ward-wise/data-analysis/blob/17714388afce4d4f12237276f6bc140036c4de22/Makefile#L13

In general, Makefiles are a build tool commonly used in repositories to create a consistent and reproducible dev / build / CI environment for the macOS and Linux ecosystems. I've really only worked in these environments before, so if supporting a Windows dev environment is needed (and alternatives like Cygwin are not used), I'll need to dig a bit more into Windows env stuff and figure out some workarounds.
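
One shell-independent option is to read the token inside Python instead of relying on the Makefile to export it. The snippet below is a minimal sketch, assuming the .env file holds plain KEY=VALUE lines; `load_env_file` and `get_app_token` are hypothetical helper names for illustration, not functions from this repository.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict.

    Assumption: no multi-line values, no `export` prefixes, no interpolation.
    """
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_app_token(path=".env"):
    """Prefer a real environment variable, then fall back to the .env file."""
    return os.environ.get("app_token") or load_env_file(path).get("app_token")
```

Because the parsing happens in Python, the same code path runs under PowerShell, cmd, Git Bash, macOS, and Linux. The python-dotenv package covers the same ground with more complete parsing, if the project prefers a dependency over a hand-rolled parser.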

suryadutta commented 3 months ago

> I'm running Python 3.12 on Windows 11. No virtual environment setup at the moment.

I would highly recommend configuring some kind of virtual environment when working on a specific project. For Windows, I think Mamba is the easiest and most performant option (it's an offshoot of conda and uses conda-forge by default).

See here for a link to the Windows installer: https://github.com/conda-forge/miniforge?tab=readme-ov-file#download

kollerbud commented 3 months ago

@suryadutta can you try to run the class locally? It is working for me even with an empty api_header.

suryadutta commented 3 months ago

huh @kollerbud you're totally right! when I take the APP_TOKEN out of the header, it does work as expected. My apologies!

but here's the weird part - if I keep it in and give it a bad value, like FAKE, I get a 403 (Forbidden) error.

something super strange with the way this is working... I'll take it out for now to unblock @smacmullan, but might come back and investigate what's going on. Maybe some kind of IP or device caching on Chicago's side? no idea.
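
The behavior described above (no token works, a bad token gets a 403) suggests only sending the token when one is actually configured. A minimal sketch of that idea follows. `build_headers` is a hypothetical helper, and `X-App-Token` is the header name used by Socrata-based open data portals; whether this project's code uses that exact header name is an assumption.

```python
def build_headers(app_token=None):
    """Return request headers, adding X-App-Token only for a non-empty token.

    Assumption: an absent header is accepted by the portal, while a present
    but invalid token is rejected, as observed in this thread.
    """
    headers = {}
    if app_token and app_token.strip():
        headers["X-App-Token"] = app_token.strip()
    return headers
```

The result can be passed as the `headers` argument of whatever HTTP client the scripts use, so contributors without a token send no header at all rather than an empty or placeholder value.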

suryadutta commented 3 months ago

> It got through the processing scripts, but ran into an issue parsing the app_token from the .env file. I'm in the process of troubleshooting. "'app_token' is not recognized as an internal or external command, operable program or batch file."

@smacmullan you should be unblocked from this error now, thanks to @kollerbud

smacmullan commented 3 months ago

Everything's working on my device. We can design the Makefile around Linux/macOS setups if needed. It's simple enough to run the commands in a Git Bash terminal and have everything work.