msmasnadi / OPGEEv4

OPGEE v4

Add units into the name of every numerical column in output files #7

Closed lloyd-rmi closed 2 months ago

lloyd-rmi commented 2 months ago

It is difficult to keep track of which units the output data are in. Adding the units explicitly to the name of each column would make it simple for us to always know what numbers we are dealing with.

wennanlong commented 2 months ago

Hi lloyd-rmi, thank you for the feedback. I’ve added the units to all the output file names for clarity. The carbon intensity is now labeled in g/MJ, emissions in tonnes/day, energy in MMBtu/day, and gases in tonnes/day. The streams file includes units within the file itself.

rjplevin commented 2 months ago

Two issues with this approach:

  1. It's not guaranteed that all items in a file have the same units. Naming the file this way suggests otherwise. Adding units to columns, or perhaps generating an additional file with data -> unit mappings, would be more robust. (For example, I believe electricity, in the energy structure, is in kWh/day, not mmBtu/day.)
  2. Hard-coding the units creates a maintenance issue of keeping filename and units aligned. If we decide to name the file with units (not recommended) then we should compute the name from the actual units in the file.

For example:

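import re

# Derive the filename from the units actually attached to the data
units = str(Emissions.units())  # => 'metric_ton / day'
# Raw string avoids invalid-escape warnings for \s
filename = "emissions-" + re.sub(r'\s*/\s*', "_per_", units) + ".csv"  # => 'emissions-metric_ton_per_day.csv'

lloyd-rmi commented 2 months ago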

The intent of this issue was to have the units tied to each column within each output file, rather than to the file as a whole. I would like to request that change, so I am re-opening the issue.

Agreed that avoiding hard-coded units would be a less brittle approach as well.
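To make the request concrete, here is a minimal sketch of the two options discussed above: appending each column's unit to its header, and writing a separate column -> unit mapping file. It is not OPGEE code. The function names, the sample row, and the `units` dict are all hypothetical; in practice the unit strings would presumably be read from the actual quantities rather than typed in by hand, and the electricity unit simply follows rjplevin's recollection above.

```python
import csv
import io
import re


def unit_suffix(unit: str) -> str:
    """Turn a unit string such as 'metric_ton / day' into 'metric_ton_per_day'."""
    return re.sub(r"\s*/\s*", "_per_", unit.strip())


def write_with_units(rows, units, out):
    """Write rows (a list of dicts) as CSV, appending each column's unit to its header.

    `units` maps column name -> unit string. Columns without an entry
    (for example text labels) keep their original name.
    """
    columns = list(rows[0].keys())
    header = [f"{c}_{unit_suffix(units[c])}" if c in units else c for c in columns]
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow([row[c] for c in columns])
    return header


def write_unit_map(units, out):
    """Alternative: write a sidecar CSV that maps each column to its unit."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["column", "unit"])
    for column, unit in units.items():
        writer.writerow([column, unit])


# Hypothetical sample data; the values are made up for illustration only.
rows = [{"process": "Separation", "natural_gas": 12.5, "electricity": 340.0}]
units = {"natural_gas": "mmBtu / day", "electricity": "kWh / day"}

buf = io.StringIO()
header = write_with_units(rows, units, buf)
print(header)
```

Because the suffix is computed per column, a file that mixes units (such as mmBtu/day and kWh/day in the energy output) stays self-describing, and nothing has to be kept in sync by hand.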