After downloading PUDL, we currently write a pudl_version.txt file containing a version string inside the download. Future downloads check that version to decide whether PUDL needs to be re-downloaded or can be skipped.
When I ran the data pipeline from scratch, my existing PUDL download wasn't detected. The cause was a newline character in the text file, which made the version comparison fail. This may have been caused by me editing the text file manually, but we should strip newlines to be safe.
Proposed fix:
# File: download_data.py
# Relevant function: download_pudl_data(zenodo_url):
# Current code:
existing_version = f.readlines()[0]
# Removing newline characters fixed the version comparison for me:
existing_version = f.readlines()[0].replace('\n', '')
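A slightly more defensive variant, as a minimal sketch: `str.strip()` also removes Windows-style `\r\n` endings and stray spaces, which a manual edit could introduce too. The helper names `read_existing_version` and `needs_download` are hypothetical and not part of download_data.py; only the pudl_version.txt file name comes from the code above.

```python
from pathlib import Path
from typing import Optional


def read_existing_version(version_file: Path) -> Optional[str]:
    """Return the stored version string, or None if the file is missing or empty.

    Hypothetical helper, not the project's actual function.
    """
    if not version_file.exists():
        return None
    # strip() removes "\n", "\r\n", and surrounding spaces in one call
    text = version_file.read_text().strip()
    return text or None


def needs_download(version_file: Path, wanted_version: str) -> bool:
    """True when the stored version is absent or differs from the wanted one."""
    return read_existing_version(version_file) != wanted_version.strip()
```

With this, a version file that was saved by hand with a trailing newline still compares equal to the expected version, so the download is skipped as intended.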