singularity-energy / open-grid-emissions

Tools for producing high-quality hourly generation and emissions data for U.S. electric grids
MIT License

Improve relative path handling so that other code can use Open Grid Emissions #166

Closed (miloknowles closed 2 years ago)

miloknowles commented 2 years ago

Currently, relative file paths are handled through PATH_TO_LOCAL_REPO. However, PATH_TO_LOCAL_REPO only works if (1) the code that's using it is located in src and (2) the user's clone of the repo has an up-to-date folder name. Issue (2) isn't a big deal, but it would come up if you later renamed the remote repo to something else (e.g., the recent renaming). I'm mostly concerned about (1): I'm trying to write a script that imports things from Open Grid Emissions, but it doesn't belong in your repo.

I would suggest resolving paths as follows in load_data.py:

import os

def top_folder(rel=''):
    """
    Returns a path relative to the top-level repo folder.

    This will work regardless of where the function is imported or called from.
    """
    # Going up two levels from this file's path (src/load_data.py) gives the repo root.
    return os.path.join(os.path.abspath(os.path.join(os.path.realpath(__file__), "../../")), rel)

# Could also make these for convenience:

def data_folder(rel=''):
    """Returns a path relative to the `data` folder."""
    return os.path.join(top_folder('data'), rel)

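def downloads_folder(rel=''):
    """Returns a path relative to the `data/downloads` folder."""
    return os.path.join(data_folder('downloads'), rel)

def manual_folder(rel=''):
    """Returns a path relative to the `data/manual` folder."""
    return os.path.join(data_folder('manual'), rel)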

# The quick fix is to just set PATH_TO_LOCAL_REPO here:
PATH_TO_LOCAL_REPO = top_folder()

The nice thing about this pattern is that you only have to update paths in one function if the repo is ever refactored (as opposed to the many functions that have hardcoded paths right now).
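For comparison, here is a rough pathlib version of the same idea. It is only a sketch: `repo_root` and its `levels_up` parameter are hypothetical names, and the layout `<repo>/src/load_data.py` is assumed from the description above. The module file is passed in explicitly so the helper can be tried outside the repo; inside load_data.py you would presumably call `repo_root(__file__)`.

```python
from pathlib import Path


def repo_root(module_file, levels_up=1):
    """Return the folder `levels_up` levels above the folder holding `module_file`.

    Assumed layout: <repo>/src/load_data.py, so levels_up=1 gives <repo>.
    """
    return Path(module_file).resolve().parents[levels_up]


def data_folder(root, rel=""):
    """Return a path under <root>/data; an empty `rel` gives the data folder itself."""
    return root / "data" / rel


# Hypothetical usage inside load_data.py:
#     ROOT = repo_root(__file__)
#     downloads = data_folder(ROOT, "downloads")
```

If the folder layout changes, only `levels_up` (or the `"data"` segment) would need updating, which is the single point of change described above.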

@gailin-p @grgmiller happy to make this change on a branch, but I figured it might be easiest for one of you to copy-paste it in with one of your upcoming merges (if it seems ok). Let me know what you think.

grgmiller commented 2 years ago

Thanks for this suggestion. It sounds like we need the relative path handling to work in several different situations:

  1. Running the data pipeline from a terminal window
  2. Importing/running code from any of the Jupyter notebooks
  3. Importing OGE functions into other scripts outside of the repo
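What these three situations share is that the working directory differs while the location of the module file does not. A minimal sketch of that property follows, assuming the module file path is absolute. `resolve_from_two_cwds` is a hypothetical helper for illustration, and `top_folder` here takes the module file as an argument instead of reading `__file__`.

```python
import os
import tempfile


def top_folder(module_file, rel=""):
    """Same expression as the suggested top_folder, with the module file passed in."""
    root = os.path.abspath(os.path.join(os.path.realpath(module_file), "../../"))
    return os.path.join(root, rel)


def resolve_from_two_cwds(module_file):
    """Resolve the data folder from two different working directories."""
    original = os.getcwd()
    results = []
    try:
        for cwd in (tempfile.gettempdir(), os.path.abspath(os.sep)):
            os.chdir(cwd)
            results.append(top_folder(module_file, "data"))
    finally:
        # Always restore the caller's working directory.
        os.chdir(original)
    return results
```

Both entries of the returned list should be identical, since nothing in the path depends on the current working directory.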

I think @gailin-p was actually working on some fixes to this anyway, so probably makes sense for her to implement.

gailin-p commented 2 years ago

Yep, makes sense, I'll use this pattern