Closed rouille closed 6 months ago
Thanks Ben! One question: if we wanted to access some of the manual data files in OGE from other repos (for example data/manual/ba_reference.csv), would there be a way to do that with how this is currently packaged? Would these data files have to live in src? Or could you just run oge.load_data.load_ba_reference() from the other repo and it would load the table?
When you install the package, the data and config folders (and the files within) are available and can be accessed in .virtualenvs/package/lib/python3.11/site-packages/data/ in my case. Also, any module and the functions therein can access the data (I was able to run the pipeline). So:
>>> from oge.load_data import load_ba_reference
>>> ba = load_ba_reference()
>>> ba
ba_code ba_name ba_category timezone_reporting_eia930 timezone_local us_ba activation_date retirement_date ba_name_ferc activation_date_ferc retirement_date_ferc source ba_number
0 AEBN AESC, LLC - AEBN NaN NaN NaN Yes 2004-01-01 2006-07-01 NaN NaN NaN FERC 1
1 AEC PowerSouth Energy Cooperative NaN US/Central US/Central Yes 2004-01-01 2021-09-01 Alabama Electric Cooperative, Inc. NaN NaN EIA & FERC 2
2 AECI Associated Electric Cooperative, Inc. NaN US/Central US/Central Yes 2004-01-01 NaT NaN NaN NaN EIA & FERC 3
3 AEGL AESC, LLC - Gleason NaN NaN NaN Yes 2004-01-01 2006-07-01 NaN NaN NaN FERC 4
4 AELC AESC, LLC - Lincoln Center NaN NaN NaN Yes 2004-01-01 2006-07-01 NaN NaN NaN FERC 5
.. ... ... ... ... ... ... ... ... ... ... ... ... ...
235 TXMS No Balancing Authority - Texas miscellaneous US/Central US/Central Yes NaT NaT NaN NaN NaN NaN 948
236 UTMS No Balancing Authority - Utah miscellaneous US/Mountain US/Mountain Yes NaT NaT NaN NaN NaN NaN 949
237 WAMS No Balancing Authority - Washington miscellaneous US/Pacific US/Pacific Yes NaT NaT NaN NaN NaN NaN 953
238 WIMS No Balancing Authority - Wisconsin miscellaneous US/Central US/Central Yes NaT NaT NaN NaN NaN NaN 955
239 NaN No Balancing Authority miscellaneous NaN NaN Yes NaT NaT NaN NaN NaN NaN 999
[240 rows x 13 columns]
and this can be done from anywhere.
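For anyone who wants the files themselves from another repo rather than the loaded table, here is a minimal sketch of how the installed data folder could be located. It assumes the layout described above, where data sits next to the oge package in site-packages. The helper sibling_data_dir is hypothetical and not part of oge.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def sibling_data_dir(package_name: str, folder: str = "data") -> Optional[Path]:
    """Return <site-packages>/<folder> for an installed package, or None.

    Assumes the folder is installed next to the package directory,
    which is the layout reported in this thread (an assumption for
    any other install).
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        return None  # package not installed (or a namespace package)
    package_dir = Path(spec.origin).resolve().parent  # e.g. site-packages/oge
    return package_dir.parent / folder  # e.g. site-packages/data


# Prints None when oge is not installed in the current environment.
print(sibling_data_dir("oge"))
```

The returned path is only a candidate location, so callers should still check that it exists before reading from it.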
This looks good. Before I approve/merge, I have a couple of questions for you:
1. Now that we've changed the import statements to use absolute references (e.g. `oge.load_data`), will these import statements continue to work if you just clone the repo locally and run functions/notebooks without installing the package? Or will you have to install the package to keep using it locally?
I am currently running the data pipeline as a script, python data_pipeline.py --year 2021, from the src/oge folder. So far so good.
2. Do we need to update the import statements in all of the notebooks so that they continue to work?
Thanks for pointing this out. I need to update the import statements in the notebook.
3. What next steps need to happen before oge can be installed in another repo? You mentioned something about PyPI? Will this branch need to be merged into main before oge can be packaged?
I need to double check that oge is not taken in the registry. I already checked and it was available. Then, we can upload the package to PyPI. You should create an account there. This can be done from any branch, I believe, but I recommend that in the future we create a GitHub workflow (deploy.yml) that automatically uploads the package each time a new release is created.
Next Steps
So I successfully installed oge in a virtual environment, although I'm running into an issue with the filepaths.
For me, oge is installed at
C:\Users\greg.miller\.pyenv\pyenv-win\versions\3.11.4\Lib\site-packages\oge
and the data folder is installed at
C:\Users\greg.miller\.pyenv\pyenv-win\versions\3.11.4\Lib\site-packages\data
This means that relative to the oge directory, the data folder is ../data. However, in filepaths, the top_folder is ../../../, which means that if I run a command to load data, it is looking in C:/Users/greg.miller/.pyenv/pyenv-win/versions/3.11.4/Lib/data and raising a FileNotFoundError.
It seems like a simple fix would be to just change "../../../" in filepaths.top_folder() to "../", but I'm not sure why you didn't run into this error on your machine @Ben?
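To make the path arithmetic concrete, here is a small sketch that resolves the relative references against the installed locations quoted above. The body of filepaths.top_folder() is not shown in this thread, so the anchor is an assumption: resolving from the module file (oge/filepaths.py) is used here only because it reproduces the reported lookup path. The helper resolve_from is hypothetical.

```python
import posixpath


def resolve_from(anchor: str, relative: str, rel: str = "") -> str:
    """Join anchor, a relative reference, and rel, then collapse the '..' parts."""
    return posixpath.normpath(posixpath.join(anchor, relative, rel))


# Installed locations reported above, written with forward slashes.
site_packages = (
    "C:/Users/greg.miller/.pyenv/pyenv-win/versions/3.11.4/Lib/site-packages"
)
package_dir = site_packages + "/oge"
module_file = package_dir + "/filepaths.py"  # assumed anchor, see lead-in

# Three levels up from the module file overshoots site-packages.
from_file_3 = resolve_from(module_file, "../../../", "data")
# Two levels up from the module file lands in site-packages.
from_file_2 = resolve_from(module_file, "../../", "data")
# One level up from the package directory also lands in site-packages.
from_dir_1 = resolve_from(package_dir, "../", "data")

for label, path in [
    ("file + ../../../", from_file_3),
    ("file + ../../", from_file_2),
    ("dir  + ../", from_dir_1),
]:
    print(f"{label:18} -> {path}")
```

If the assumption about the anchor holds, the number of levels to remove depends on whether the reference starts from the file or from its directory, so it is worth checking against the actual function body before settling on "../".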
I might also propose that if this data folder is going to live in site-packages, we may want to rename the directory to oge_data so that it is more obvious that it is related to this project and easier to find. We may also want to just move the data folder to src/oge/data (see how pudl does this: https://github.com/catalyst-cooperative/pudl/tree/main/src/pudl/package_data).
I'm also noticing that eia930.py includes the following line:
os.environ["GRIDEMISSIONS_CONFIG_FILE_PATH"] = top_folder("config/gridemissions.json")
This config folder lives outside of src, so I am also getting a FileNotFoundError when I try to run from oge import eia930.
Proposed fixes:
1. Change the filepaths.top_folder() relative reference to point at src/oge (would need to test that this doesn't break anything)
2. Move the data folder and config folder to src/oge
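As a rough illustration of the two fixes together, the sketch below anchors paths to the directory that contains a module file, so the same relative path works in a clone and in site-packages once data lives inside the package. The name package_folder is hypothetical, and the temporary directory only mimics the proposed src/oge/data layout.

```python
import tempfile
from pathlib import Path


def package_folder(module_file: str, rel: str = "") -> Path:
    """Return a path relative to the directory containing module_file."""
    return Path(module_file).resolve().parent / rel


# Build a throwaway layout that mimics oge/ with data/ inside it.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "oge"
    (pkg / "data" / "manual").mkdir(parents=True)
    module_file = pkg / "filepaths.py"
    module_file.write_text("# placeholder module\n")
    (pkg / "data" / "manual" / "ba_reference.csv").write_text("ba_code,ba_name\n")

    # Inside the real package this would be package_folder(__file__, ...).
    found = package_folder(str(module_file), "data/manual/ba_reference.csv")
    found_exists = found.exists()
    print(found_exists)
```

Because the anchor is the package directory itself, no "../" hops are needed, which removes the dependence on where the package ends up being installed.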
Purpose
Package project so it can be imported from anywhere.
What the code is doing
No code
Testing
Run whole pipeline successfully
Where to look
The most important file is pyproject.toml, which contains all the configuration for building the package, such as the build backend, the metadata specification, and the inclusion of data files in the source distribution.
Usage Example/Visuals
Create an empty virtual environment with the Python version set to 3.11. In the virtual environment:
You should get the following prints:
and a newly created dist directory with two files therein: oge-0.2.2.tar.gz and oge-0.2.2-py3-none-any.whl. Then, in the root directory:
that will install the oge package along with all its dependencies (can be seen with pip list). You can run the whole pipeline from anywhere within the virtual environment via:
The data can be accessed in the package directory, e.g., .virtualenvs/package/lib/python3.11/site-packages/data/ in my case.
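One quick way to confirm that the install worked, sketched here with the standard library only: ask the environment which version of the distribution it sees. The helper installed_version is hypothetical. The distribution name oge and the version 0.2.2 come from the archive names above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("oge")
if version is None:
    print("oge is not installed in this environment")
else:
    print(f"oge {version} is installed")
```

In an environment where the wheel built above has been installed, this would be expected to report 0.2.2.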
Review estimate
30min
Future work
Upload the distribution archives to PyPI. The first step could be to do it on TestPyPI, which is a separate instance of the package index intended for testing and experimentation.
Checklist
black