singularity-energy / open-grid-emissions

Tools for producing high-quality hourly generation and emissions data for U.S. electric grids
MIT License

Give user flexibility for installing dependencies #313

Closed rouille closed 10 months ago

rouille commented 10 months ago

Purpose

Allow users to install dependencies either from a requirements.txt file with `pip install -r requirements.txt` or from the Pipfile.lock file with `pipenv sync`.
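
Since the two files are meant to describe the same environment, a quick check that their pins agree can be useful. The following is a minimal sketch, not part of this PR: it assumes requirements.txt contains plain `name==version` lines and that Pipfile.lock follows pipenv's JSON layout with `default` and `develop` sections. The function names and the sample pins are hypothetical.

```python
import json
import re


def normalize(name):
    """Normalize a package name as in PEP 503 (lowercase; -, _ and . are equivalent)."""
    return re.sub(r"[-_.]+", "-", name.strip()).lower()


def parse_requirements(text):
    """Map package name -> pinned version for `name==version` lines."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line or line.startswith("-"):
            continue  # skip blanks, options such as -r/-e, and unpinned entries
        name, rest = line.split("==", 1)
        # Keep only the version token; ignore environment markers after ";".
        pins[normalize(name)] = rest.split(";", 1)[0].split()[0]
    return pins


def parse_pipfile_lock(text, sections=("default", "develop")):
    """Map package name -> pinned version from the contents of a Pipfile.lock."""
    lock = json.loads(text)
    pins = {}
    for section in sections:
        for name, entry in lock.get(section, {}).items():
            version = entry.get("version", "")
            if version.startswith("=="):  # VCS entries carry no version pin
                pins[normalize(name)] = version[2:]
    return pins


def find_mismatches(requirements, locked):
    """Return {name: (requirements_version, locked_version)} where the two differ."""
    names = requirements.keys() | locked.keys()
    return {
        name: (requirements.get(name), locked.get(name))
        for name in sorted(names)
        if requirements.get(name) != locked.get(name)
    }


# Made-up sample data for illustration only.
sample_requirements = "pandas==1.5.3\nnumpy==1.24.4\n"
sample_lock = json.dumps(
    {"default": {"pandas": {"version": "==1.5.3"}, "numpy": {"version": "==1.24.3"}}}
)
print(find_mismatches(parse_requirements(sample_requirements),
                      parse_pipfile_lock(sample_lock)))
# {'numpy': ('1.24.4', '1.24.3')}
```

To run it against the real files, pass in the text of each file, for example `pathlib.Path("requirements.txt").read_text()`. An empty result means every pin matches.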

What the code is doing

No code

Testing

Ran `pipenv sync`, then `pipenv shell` to activate the environment:

[~/Singularity/open-grid-emissions] (ben/format) brdo$ pipenv sync
Creating a virtualenv for this project...
Pipfile: /Users/brdo/Singularity/open-grid-emissions/Pipfile
Using /Users/brdo/.pyenv/versions/3.10.4/bin/python3 (3.10.4) to create virtualenv...
⠸ Creating virtual environment...created virtual environment CPython3.10.4.final.0-64 in 282ms
  creator CPython3Posix(dest=/Users/brdo/.virtualenvs/open-grid-emissions-zm3GQQDc, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/Users/brdo/Library/Application Support/virtualenv)
    added seed packages: pip==23.3.1, setuptools==68.2.2, wheel==0.41.3
  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator

✔ Successfully created virtual environment!
Virtualenv location: /Users/brdo/.virtualenvs/open-grid-emissions-zm3GQQDc
Installing dependencies from Pipfile.lock (2a0f29)...
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
All dependencies are now up-to-date!
[~/Singularity/open-grid-emissions] (ben/format) brdo$ pipenv shell
Launching subshell in virtual environment...
[~/Singularity/open-grid-emissions] (ben/format) brdo$  . /Users/brdo/.virtualenvs/open-grid-emissions-zm3GQQDc/bin/activate

Then successfully ran python data_pipeline.py --year 2021:

(open-grid-emissions) [~/Singularity/open-grid-emissions/src] (ben/dependencies) brdo$ python data_pipeline.py --year 2021
2023-11-28 13:54:54 [INFO] oge.data_pipeline:67 

Running with the following options:
  * year = 2021
  * shape_individual_plants = True
  * small = False
  * flat = False
  * skip_outputs = False

2023-11-28 13:54:54 [INFO] oge.data_pipeline:112 Running data pipeline for year 2021
2023-11-28 13:54:54 [INFO] oge.data_pipeline:117 1. Downloading data
2023-11-28 13:54:54 [INFO] oge.download_data:102 PUDL version already downloaded
2023-11-28 13:54:54 [INFO] oge.download_data:44 egrid2018_data_v2.xlsx already downloaded, skipping.
2023-11-28 13:54:54 [INFO] oge.download_data:44 egrid2019_data.xlsx already downloaded, skipping.
2023-11-28 13:54:54 [INFO] oge.download_data:44 egrid2020_data.xlsx already downloaded, skipping.
2023-11-28 13:54:54 [INFO] oge.download_data:44 eGRID2021_data.xlsx already downloaded, skipping.
2023-11-28 13:54:54 [INFO] oge.download_data:44 EIA930_BALANCE_2021_Jan_Jun.csv already downloaded, skipping.
2023-11-28 13:54:54 [INFO] oge.download_data:44 EIA930_BALANCE_2021_Jul_Dec.csv already downloaded, skipping.
2023-11-28 13:54:54 [INFO] oge.download_data:44 EIA930_INTERCHANGE_2021_Jan_Jun.csv already downloaded, skipping.
2023-11-28 13:54:54 [INFO] oge.download_data:44 EIA930_INTERCHANGE_2021_Jul_Dec.csv already downloaded, skipping.
2023-11-28 13:54:54 [INFO] oge.download_data:44 EIA930_BALANCE_2020_Jan_Jun.csv already downloaded, skipping.
2023-11-28 13:54:54 [INFO] oge.download_data:44 EIA930_BALANCE_2020_Jul_Dec.csv already downloaded, skipping.
2023-11-28 13:54:54 [INFO] oge.download_data:44 EIA930_INTERCHANGE_2020_Jan_Jun.csv already downloaded, skipping.
2023-11-28 13:54:54 [INFO] oge.download_data:44 EIA930_INTERCHANGE_2020_Jul_Dec.csv already downloaded, skipping.
2023-11-28 13:54:54 [INFO] oge.download_data:44 epa_eia_crosswalk.csv already downloaded, skipping.
2023-11-28 13:54:54 [INFO] oge.download_data:44 eia8602021 already downloaded, skipping.
2023-11-28 13:54:54 [INFO] oge.download_data:44 f923_2021 already downloaded, skipping.
2023-11-28 13:54:54 [INFO] oge.data_pipeline:149 2. Identifying subplant IDs
2023-11-28 13:54:54 [INFO] oge.data_cleaning:57 loading CEMS ids
2023-11-28 13:55:06 [INFO] oge.data_cleaning:61 identifying unique subplants
2023-11-28 13:55:07 [    INFO] catalystcoop.pudl.transform.eia861:456 Started with 323 missing BA Codes out of 13488 records (2.39%)
2023-11-28 13:55:08 [    INFO] catalystcoop.pudl.transform.eia861:480 Ended with 323 missing BA Codes out of 13488 records (2.39%)
2023-11-28 13:55:08 [    INFO] catalystcoop.pudl.output.eia860:177 97.6% of plant records have consistently reported BA Codes
2023-11-28 13:55:08 [    INFO] catalystcoop.pudl.output.eia860:227 Before any filling treatment has been applied. 2.4% of records have no BA codes
2023-11-28 13:55:08 [    INFO] catalystcoop.pudl.output.eia860:227 Backfilling and consistent value is the same. Filled w/ most consistent BA code. 2.4% of records have no BA codes
2023-11-28 13:55:08 [    INFO] catalystcoop.pudl.output.eia860:227 SWPP is most consistent value. Filled w/ oldest BA code. 2.4% of records have no BA codes
2023-11-28 13:55:08 [    INFO] catalystcoop.pudl.output.eia860:227 NWMT is most consistent value. Filled w/ oldest BA code. 2.4% of records have no BA codes
2023-11-28 13:55:08 [    INFO] catalystcoop.pudl.output.eia860:227 Two or more years of oldest BA code. Filled w/ oldest BA code. 2.4% of records have no BA codes
2023-11-28 13:55:11 [    INFO] catalystcoop.pudl.transform.eia861:456 Started with 323 missing BA Codes out of 13488 records (2.39%)
2023-11-28 13:55:12 [    INFO] catalystcoop.pudl.transform.eia861:480 Ended with 323 missing BA Codes out of 13488 records (2.39%)
2023-11-28 13:55:12 [    INFO] catalystcoop.pudl.output.eia860:177 97.6% of plant records have consistently reported BA Codes
2023-11-28 13:55:12 [    INFO] catalystcoop.pudl.output.eia860:227 Before any filling treatment has been applied. 2.4% of records have no BA codes
2023-11-28 13:55:12 [    INFO] catalystcoop.pudl.output.eia860:227 Backfilling and consistent value is the same. Filled w/ most consistent BA code. 2.4% of records have no BA codes
2023-11-28 13:55:12 [    INFO] catalystcoop.pudl.output.eia860:227 SWPP is most consistent value. Filled w/ oldest BA code. 2.4% of records have no BA codes
2023-11-28 13:55:12 [    INFO] catalystcoop.pudl.output.eia860:227 NWMT is most consistent value. Filled w/ oldest BA code. 2.4% of records have no BA codes
2023-11-28 13:55:12 [    INFO] catalystcoop.pudl.output.eia860:227 Two or more years of oldest BA code. Filled w/ oldest BA code. 2.4% of records have no BA codes
2023-11-28 13:56:10 [    INFO] catalystcoop.pudl.transform.eia861:456 Started with 2108 missing BA Codes out of 68945 records (3.06%)
2023-11-28 13:56:11 [    INFO] catalystcoop.pudl.transform.eia861:480 Ended with 1981 missing BA Codes out of 68945 records (2.87%)
2023-11-28 13:56:12 [    INFO] catalystcoop.pudl.output.eia860:177 96.7% of plant records have consistently reported BA Codes
2023-11-28 13:56:12 [    INFO] catalystcoop.pudl.output.eia860:227 Before any filling treatment has been applied. 3.1% of records have no BA codes
2023-11-28 13:56:12 [    INFO] catalystcoop.pudl.output.eia860:227 Backfilling and consistent value is the same. Filled w/ most consistent BA code. 2.9% of records have no BA codes
2023-11-28 13:56:12 [    INFO] catalystcoop.pudl.output.eia860:227 SWPP is most consistent value. Filled w/ oldest BA code. 2.9% of records have no BA codes
2023-11-28 13:56:12 [    INFO] catalystcoop.pudl.output.eia860:227 NWMT is most consistent value. Filled w/ oldest BA code. 2.9% of records have no BA codes
2023-11-28 13:56:12 [    INFO] catalystcoop.pudl.output.eia860:227 Two or more years of oldest BA code. Filled w/ oldest BA code. 2.9% of records have no BA codes
2023-11-28 13:56:15 [    INFO] catalystcoop.pudl.output.eia860:509 Filling technology type
2023-11-28 13:56:16 [    INFO] catalystcoop.pudl.output.eia860:597 Filled technology_type coverage now at 100.0%
2023-11-28 13:56:18 [    INFO] catalystcoop.pudl.transform.eia861:456 Started with 323 missing BA Codes out of 13488 records (2.39%)
2023-11-28 13:56:19 [    INFO] catalystcoop.pudl.transform.eia861:480 Ended with 323 missing BA Codes out of 13488 records (2.39%)
2023-11-28 13:56:19 [    INFO] catalystcoop.pudl.output.eia860:177 97.6% of plant records have consistently reported BA Codes
2023-11-28 13:56:19 [    INFO] catalystcoop.pudl.output.eia860:227 Before any filling treatment has been applied. 2.4% of records have no BA codes
2023-11-28 13:56:19 [    INFO] catalystcoop.pudl.output.eia860:227 Backfilling and consistent value is the same. Filled w/ most consistent BA code. 2.4% of records have no BA codes
2023-11-28 13:56:19 [    INFO] catalystcoop.pudl.output.eia860:227 SWPP is most consistent value. Filled w/ oldest BA code. 2.4% of records have no BA codes
2023-11-28 13:56:19 [    INFO] catalystcoop.pudl.output.eia860:227 NWMT is most consistent value. Filled w/ oldest BA code. 2.4% of records have no BA codes
2023-11-28 13:56:19 [    INFO] catalystcoop.pudl.output.eia860:227 Two or more years of oldest BA code. Filled w/ oldest BA code. 2.4% of records have no BA codes
2023-11-28 13:56:20 [    INFO] catalystcoop.pudl.output.eia860:509 Filling technology type
2023-11-28 13:56:20 [    INFO] catalystcoop.pudl.output.eia860:597 Filled technology_type coverage now at 100.0%
2023-11-28 13:56:20 [WARNING] oge.validation:233 There are 276 subplants that only contain one part of a combined cycle system.
Subplants that represent combined cycle generation should contain both CA and CT parts.
2023-11-28 13:56:20 [WARNING] oge.validation:236 

...

2023-11-28 15:35:00 [INFO] oge.output_data:131 Exporting GCPD to data/results/2021//carbon_accounting/hourly/
2023-11-28 15:35:00 [INFO] oge.validation:149 Checking that fuel and emissions values are positive...  
2023-11-28 15:35:00 [INFO] oge.validation:175 OK
2023-11-28 15:35:00 [INFO] oge.validation:181 Checking that no values are missing...  
2023-11-28 15:35:00 [INFO] oge.validation:198 OK
2023-11-28 15:35:00 [INFO] oge.output_data:131 Exporting GCPD to data/results/2021//carbon_accounting/monthly/
2023-11-28 15:35:00 [INFO] oge.validation:149 Checking that fuel and emissions values are positive...  
2023-11-28 15:35:00 [INFO] oge.validation:175 OK
2023-11-28 15:35:00 [INFO] oge.validation:181 Checking that no values are missing...  
2023-11-28 15:35:00 [INFO] oge.validation:198 OK
2023-11-28 15:35:00 [INFO] oge.output_data:131 Exporting GCPD to data/results/2021//carbon_accounting/annual/
2023-11-28 15:35:00 [INFO] oge.validation:149 Checking that fuel and emissions values are positive...  
2023-11-28 15:35:00 [INFO] oge.validation:175 OK
2023-11-28 15:35:00 [INFO] oge.validation:181 Checking that no values are missing...  
2023-11-28 15:35:00 [INFO] oge.validation:198 OK
2023-11-28 15:35:01 [INFO] oge.output_data:131 Exporting ISNE to data/results/2021//carbon_accounting/hourly/
2023-11-28 15:35:01 [INFO] oge.validation:149 Checking that fuel and emissions values are positive...  
2023-11-28 15:35:01 [INFO] oge.validation:175 OK
2023-11-28 15:35:01 [INFO] oge.validation:181 Checking that no values are missing...  
2023-11-28 15:35:01 [INFO] oge.validation:198 OK
2023-11-28 15:35:01 [INFO] oge.output_data:131 Exporting ISNE to data/results/2021//carbon_accounting/monthly/
2023-11-28 15:35:01 [INFO] oge.validation:149 Checking that fuel and emissions values are positive...  
2023-11-28 15:35:01 [INFO] oge.validation:175 OK
2023-11-28 15:35:01 [INFO] oge.validation:181 Checking that no values are missing...  
2023-11-28 15:35:01 [INFO] oge.validation:198 OK
2023-11-28 15:35:01 [INFO] oge.output_data:131 Exporting ISNE to data/results/2021//carbon_accounting/annual/
2023-11-28 15:35:01 [INFO] oge.validation:149 Checking that fuel and emissions values are positive...  
2023-11-28 15:35:01 [INFO] oge.validation:175 OK
2023-11-28 15:35:01 [INFO] oge.validation:181 Checking that no values are missing...  
2023-11-28 15:35:01 [INFO] oge.validation:198 OK
2023-11-28 15:35:01 [INFO] oge.output_data:131 Exporting AECI to data/results/2021//carbon_accounting/hourly/
2023-11-28 15:35:01 [INFO] oge.validation:149 Checking that fuel and emissions values are positive...  
2023-11-28 15:35:01 [INFO] oge.validation:175 OK
2023-11-28 15:35:01 [INFO] oge.validation:181 Checking that no values are missing...  
2023-11-28 15:35:01 [INFO] oge.validation:198 OK
2023-11-28 15:35:01 [INFO] oge.output_data:131 Exporting AECI to data/results/2021//carbon_accounting/monthly/
2023-11-28 15:35:01 [INFO] oge.validation:149 Checking that fuel and emissions values are positive...  
2023-11-28 15:35:01 [INFO] oge.validation:175 OK
2023-11-28 15:35:01 [INFO] oge.validation:181 Checking that no values are missing...  
2023-11-28 15:35:01 [INFO] oge.validation:198 OK
2023-11-28 15:35:01 [INFO] oge.output_data:131 Exporting AECI to data/results/2021//carbon_accounting/annual/
2023-11-28 15:35:01 [INFO] oge.validation:149 Checking that fuel and emissions values are positive...  
2023-11-28 15:35:01 [INFO] oge.validation:175 OK
2023-11-28 15:35:01 [INFO] oge.validation:181 Checking that no values are missing...  
2023-11-28 15:35:01 [INFO] oge.validation:198 OK
(open-grid-emissions) [~/Singularity/open-grid-emissions/src] (ben/dependencies) brdo$ 

It takes a very long time!
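
For a rough sense of scale, the wall-clock duration can be read off the first and last timestamps of the log above. A minimal sketch (the helper name is hypothetical), assuming every log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp:

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def elapsed_between(first_line, last_line):
    """Return the timedelta between the timestamps that start two log lines."""
    # The timestamp is the first 19 characters, e.g. "2023-11-28 13:54:54".
    start = datetime.strptime(first_line[:19], LOG_TIME_FORMAT)
    end = datetime.strptime(last_line[:19], LOG_TIME_FORMAT)
    return end - start


# First and last log lines of the run shown above.
first = "2023-11-28 13:54:54 [INFO] oge.data_pipeline:67"
last = "2023-11-28 15:35:01 [INFO] oge.validation:198 OK"
print(elapsed_between(first, last))  # 1:40:07
```

For this run, the span is about 1 hour 40 minutes.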

Where to look

The src/load module needed to be modified to point to the correct 2021 EIA923 Schedule 8 Annual Environmental Information file.

Usage Example/Visuals

Here is the output from `pip list`:

(open-grid-emissions) [~/Singularity/open-grid-emissions] (ben/dependencies) brdo$ pip list
Package                          Version
-------------------------------- -----------------------
addfips                          0.4.0
aiohttp                          3.9.1
aiosignal                        1.3.1
anyio                            4.1.0
appnope                          0.1.3
arelle-release                   2.3.4
argon2-cffi                      23.1.0
argon2-cffi-bindings             21.2.0
arrow                            1.3.0
asttokens                        2.4.1
async-lru                        2.0.4
async-timeout                    4.0.3
attrs                            23.1.0
Babel                            2.13.1
beautifulsoup4                   4.12.2
bleach                           6.1.0
boto3                            1.33.2
botocore                         1.33.2
cached-property                  1.5.2
cachetools                       5.3.2
catalystcoop.dbfread             3.0.0
catalystcoop.ferc-xbrl-extractor 0.8.1
catalystcoop.pudl                0.6.1.dev1665+g81a7a513
certifi                          2023.11.17
cffi                             1.16.0
chardet                          5.2.0
charset-normalizer               3.3.2
clarabel                         0.6.0
click                            8.1.7
click-plugins                    1.1.1
cligj                            0.7.2
cloudpickle                      3.0.0
colorama                         0.4.6
coloredlogs                      15.0.1
comm                             0.2.0
contourpy                        1.2.0
cvxopt                           1.3.2
cvxpy                            1.4.1
cycler                           0.12.1
dask                             2023.1.0
datapackage                      1.15.2
debugpy                          1.8.0
decorator                        5.1.1
defusedxml                       0.7.1
dnspython                        2.4.2
ecos                             2.0.12
email-validator                  2.1.0.post1
et-xmlfile                       1.1.0
exceptiongroup                   1.2.0
executing                        2.0.1
fastjsonschema                   2.19.0
fiona                            1.9.5
fonttools                        4.45.1
fqdn                             1.5.1
frictionless                     4.40.11
frozenlist                       1.4.0
fsspec                           2023.1.0
gcsfs                            2023.1.0
geopandas                        0.12.2
google-api-core                  2.14.0
google-auth                      2.23.4
google-auth-oauthlib             1.1.0
google-cloud-core                2.3.3
google-cloud-storage             2.13.0
google-crc32c                    1.5.0
google-resumable-media           2.6.0
googleapis-common-protos         1.61.0
gridemissions                    0.1.0
h3                               3.7.6
humanfriendly                    10.0
idna                             3.6
ijson                            3.2.3
iniconfig                        2.0.0
ipykernel                        6.27.1
ipython                          8.18.1
isodate                          0.6.1
isoduration                      20.11.0
jedi                             0.19.1
Jinja2                           3.1.2
jmespath                         1.0.1
joblib                           1.3.2
json5                            0.9.14
jsonlines                        4.0.0
jsonpointer                      2.4
jsonschema                       4.20.0
jsonschema-specifications        2023.11.1
jupyter_client                   8.6.0
jupyter_core                     5.5.0
jupyter-events                   0.9.0
jupyter-lsp                      2.2.1
jupyter_server                   2.11.1
jupyter_server_terminals         0.4.4
jupyterlab                       4.0.9
jupyterlab_pygments              0.3.0
jupyterlab_server                2.25.2
kiwisolver                       1.4.5
linear-tsv                       1.1.0
locket                           1.0.0
lxml                             4.9.3
markdown-it-py                   3.0.0
marko                            2.0.2
MarkupSafe                       2.1.3
matplotlib                       3.6.3
matplotlib-inline                0.1.6
mdurl                            0.1.2
mistune                          3.0.2
multidict                        6.0.4
nbclient                         0.9.0
nbconvert                        7.11.0
nbformat                         5.9.2
nest-asyncio                     1.5.8
networkx                         3.0
notebook                         7.0.6
notebook_shim                    0.2.3
numpy                            1.24.4
oauthlib                         3.2.2
openpyxl                         3.1.2
osqp                             0.6.3
overrides                        7.4.0
packaging                        23.2
pandas                           1.5.3
pandocfilters                    1.5.0
parso                            0.8.3
partd                            1.4.1
patsy                            0.5.3
petl                             1.7.14
pexpect                          4.9.0
Pillow                           10.1.0
pip                              23.3.1
platformdirs                     4.0.0
plotly                           5.18.0
pluggy                           1.3.0
prometheus-client                0.19.0
prompt-toolkit                   3.0.41
protobuf                         4.25.1
psutil                           5.9.6
ptyprocess                       0.7.0
pure-eval                        0.2.2
pyarrow                          10.0.1
pyasn1                           0.5.1
pyasn1-modules                   0.3.0
pybind11                         2.11.1
pycparser                        2.21
pydantic                         1.10.13
Pygments                         2.17.2
pyparsing                        3.1.1
pyproj                           3.6.1
pytest                           7.4.3
python-dateutil                  2.8.2
python-json-logger               2.0.7
python-slugify                   8.0.1
python-snappy                    0.6.1
pytz                             2023.3.post1
PyYAML                           6.0.1
pyzmq                            25.1.1
qdldl                            0.1.7.post0
referencing                      0.31.0
regex                            2023.10.3
requests                         2.31.0
requests-oauthlib                1.3.1
rfc3339-validator                0.1.4
rfc3986                          2.0.0
rfc3986-validator                0.1.1
rich                             13.7.0
rpds-py                          0.13.1
rsa                              4.9
ruff                             0.1.6
s3transfer                       0.8.1
scikit-learn                     1.2.2
scipy                            1.10.1
scs                              3.2.4.post1
seaborn                          0.13.0
Send2Trash                       1.8.2
setuptools                       69.0.2
shapely                          2.0.2
shellingham                      1.5.4
simpleeval                       0.9.13
six                              1.16.0
sniffio                          1.3.0
soupsieve                        2.5
SQLAlchemy                       1.4.50
stack-data                       0.6.3
statsmodels                      0.14.0
stringcase                       1.2.0
tableschema                      1.20.2
tabulate                         0.9.0
tabulator                        1.53.5
tenacity                         8.2.3
terminado                        0.18.0
text-unidecode                   1.3
threadpoolctl                    3.2.0
timezonefinder                   6.1.10
tinycss2                         1.2.1
tomli                            2.0.1
toolz                            0.12.0
tornado                          6.3.3
traitlets                        5.14.0
typer                            0.9.0
types-python-dateutil            2.8.19.14
typing_extensions                4.8.0
unicodecsv                       0.14.1
uri-template                     1.3.0
urllib3                          2.0.7
validators                       0.22.0
wcwidth                          0.2.12
webcolors                        1.13
webencodings                     0.5.1
websocket-client                 1.6.4
wheel                            0.41.3
xlrd                             2.0.1
XlsxWriter                       3.0.9
yarl                             1.9.3

Review estimate

15min

Future work

Checklist

rouille commented 10 months ago

Quick note on dependencies in the requirements.txt file w.r.t. those in environment.yml:

rouille commented 10 months ago

> Thanks for putting this together! A couple of notes/questions:
>
> * The OGE data pipeline does take several hours to run in full. In the future, if you want to test running the pipeline, you can use the `--small` command line argument to run it on a subset of the data so that it goes faster. However, you might not want to use this for testing all of the validation/outputs.

Thanks!

> * In the past, we've only had the data science team review OGE-related PRs, and not the engineering team. Not sure if Jeff, Ryan, or Brooke need to be asked to review these PRs, as they have not interacted much with this code in the past, but perhaps we should discuss whether it makes sense to change this going forward.

Sounds good. I thought they could help with packaging the project and reviewing workflows.

> * Although we generally use pipenv for our internal projects/repos, we had been using conda for this external/public repo. We may want to have a broader conversation about whether it makes sense to switch and/or for us to maintain multiple environment managers. I think if we move forward with maintaining multiple, we should probably update the environment.yml file as well so that it is consistent with the requirements/Pipfile.

I have never used Conda and I don't know (yet) how it can be used in GitHub workflows or for packaging on PyPI. Regarding the latter, we could package it for conda through conda-forge, but I am not familiar with this process.

> * As part of the 2022 OGE release, we will need to update some of our pip dependencies, particularly `pudl` and probably `gridemissions`. Per [Update PUDL Dependencies #310](https://github.com/singularity-energy/open-grid-emissions/issues/310), pudl may be disappearing as a software dependency in the future. We may also want to update our dependency on `gridemissions` to a fork from the `singularity` github, rather than Gailin's github. Not sure if it makes sense to update these dependencies first before merging this PR, or whether that matters.

That would make sense.

grgmiller commented 10 months ago

OK, I may be the only person on the team who uses conda, so it may make sense for me to just switch over at this point and drop support for conda environments. I'm not sure whether others are running the OGE repo in a conda environment, so I'm also not sure whether we want to keep supporting both temporarily and deprecate conda in a future release.