Closed — takposha closed this pull request 2 years ago
Added a config file for adjusting variables. Updated the README with steps for config file management. Adjusted the Docker files to read the .env file and disabled sleep infinity.
Hello @takposha! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
mpr-research-data.py
Line 74:120: E501 line too long (141 > 119 characters)
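An E501 like the one above is usually fixed by wrapping the long line inside parentheses. A hypothetical illustration (this is not the actual line 74 from mpr-research-data.py):

```python
# Hypothetical fix for an E501: implicit string concatenation inside
# parentheses keeps every physical line under the 119-character limit.
long_message = (
    "Failed to upload Course Data for "
    "495677 - Math 216 WN 2022.tsv to GCP."
)
print(long_message)
```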
That should be the last commit. VS Code's PEP 8 formatting doesn't catch everything pep8speaks does, but that should be sorted now.
@pep8speaks suggest diff
Added the improved GCP key implementation.
Added logging to console using the logging library.
I thought what you had before, with the GCP key JSON embedded in the .env file, was a good approach. I think of that JSON key as a single entity: we don't know what all of its contents will be, and Google may change them at some point. If we break it down into its components now, it may stop working in the future. It seems better to keep it as a single JSON string.
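Treating the key as one JSON string means the app only parses it at the boundary and never hard-codes its fields. A stdlib-only sketch of that idea (the variable name GCP_KEY_JSON and the sample contents are assumptions, not the PR's actual names):

```python
import json
import os

# Hypothetical env-var name and sample value; the PR's actual
# variable and key contents may differ.
os.environ.setdefault(
    'GCP_KEY_JSON',
    '{"type": "service_account", "project_id": "example-project"}',
)

# Parse the whole key as one JSON entity; no assumptions are made
# about which fields Google includes, now or in the future.
key_info = json.loads(os.environ['GCP_KEY_JSON'])
print(key_info['project_id'])

# A client-library call such as
#   service_account.Credentials.from_service_account_info(key_info)
# would then consume the parsed dict as a single unit.
```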
My change reintroduced the json module, so I organized the imports, too. It's fairly common to group imports as core Python modules first, then third-party modules, and lastly local project modules.
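That convention, sketched with stdlib imports and placeholder names for the third-party and local groups (the commented names are illustrative, not the PR's actual dependencies):

```python
# Group 1: Python standard library
import json
import logging
import os

# Group 2: third-party packages (names are illustrative)
# import pandas
# from google.cloud import storage

# Group 3: local project modules (name is illustrative)
# import db_queries
```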
So, the good news is that when I ran the application that I checked out of your branch, it worked well.
Aside from the GCP key change I made, I made a couple of changes to my .env file, too. I changed NUMBER_OF_MONTHS = 1 to run a shorter test, and I changed GCLOUD_BUCKET = 'mpr-research-data-uploads-lsloan_test' to write to a different bucket for my test, so it wouldn't disturb what was already there.
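As a .env fragment, those two overrides might look like this (the exact formatting is an assumption; only the names and values come from the comment above):

```shell
# Hypothetical .env overrides for a short local test run
NUMBER_OF_MONTHS=1
GCLOUD_BUCKET='mpr-research-data-uploads-lsloan_test'
```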
I ran the app before I created the new bucket, just to see what would happen. I got results like this:
mpr-research-data | 2022-05-10T14:44:52+0000 INFO [mpr-research-data.py:170] - Slicing: 495677 - Math 216 WN 2022.tsv
mpr-research-data | 2022-05-10T14:44:52+0000 INFO [mpr-research-data.py:181] - Saving to GCP: 495677 - Math 216 WN 2022.tsv
mpr-research-data | 2022-05-10T14:44:56+0000 ERROR [mpr-research-data.py:186] - Error Message: 404 POST https://storage.googleapis.com/upload/storage/v1/b/mpr-research-data-uploads-lsloan_test/o?uploadType=multipart: {
mpr-research-data | "error": {
mpr-research-data | "code": 404,
mpr-research-data | "message": "The specified bucket does not exist.",
mpr-research-data | "errors": [
mpr-research-data | {
mpr-research-data | "message": "The specified bucket does not exist.",
mpr-research-data | "domain": "global",
mpr-research-data | "reason": "notFound"
mpr-research-data | }
mpr-research-data | ]
mpr-research-data | }
mpr-research-data | }
mpr-research-data | : ('Request failed with status code', 404, 'Expected one of', <HTTPStatus.OK: 200>)
mpr-research-data | 2022-05-10T14:44:56+0000 ERROR [mpr-research-data.py:187] - Failed to upload Course Data for 495677 - Math 216 WN 2022.tsv to GCP.
Which is good. It reported that once for each of the courses that met my 1-month criteria.
So, I created the required bucket, ran it again, and all was good.
Currently working/Completed: dbToBucketScript.py can access the DB, retrieve Course IDs, then retrieve the corresponding Course data and send it to a GCP bucket. Error messages and comments have been added throughout the script, so it should be easier to tell if and why something fails. The SQL queries can be modified using user-provided variables. There is a basic README file now; it needs to be updated for the config parameters.
Not working/needs to be done: I'm not sure how a config file should be set up for a Docker application. I have listed all the variables that should be configurable in the Python script files, so it should just be a matter of moving them into a config file; I'll use a .env file for this. I don't know how frequent the data calls are, or whether the files get updated, but having some way to check whether a TSV file already exists in the GCP bucket and is up to date could help avoid unnecessary data pushes. This shouldn't matter if it's only a small amount of data daily.
Resolves #1 Resolves #2 Resolves #7