KeyError Handling needed for IOManager

respec / HSPsquared

Hydrologic Simulation Program Python (HSPsquared)

GNU Affero General Public License v3.0

43 stars 17 forks source link

KeyError Handling needed for IOManager #77

Closed ptomasula closed 2 years ago

ptomasula commented 2 years ago

@sjordan29's testing in PR #73 revealed that requesting a timeseries from the IOManager class that does not exist results in a KeyError causing the application to stop executing. I think the behavior should be to return an empty pandas.DataFrame and possibly raising a warning, but not stop execution.

ptomasula commented 2 years ago

I resolved the unhandled KeyError that caused model execution to stop with commit fac5596. That addressed the immediate problem related to this issue.

However, this is not fully resolved. The issue was discovered when executing the test10 with the saveall argument set to false. In that run, the PWTGAS module requested a result timeseries that was not saved to the output file. The request timeseries was generated by a module further up in the operation sequence but, because the UCI indicates it does not need to be output, the timeseries never makes it to the IOManager for caching.

I propose we implement some type of caching even when timeseries are not to be saved to output. I think it's probably easiest to modify the save_timeseries function to pass all timeseries to the IOManager. We then need to modify the SupportsWriteTS to accept a boolean argument indicating if the timeseries should be cached and saved, or just cached. I'll work on implementing that next week.

ptomasula commented 2 years ago

I've addressed this issue by modifying the save_timeseries function. This new approach preserves all of the columns in the dataframe instance and passes the dataframe to the write_ts method of IOManager. However, save_timeseries also now passes a list of the columns that should be exported.

IOManager has also been updated to except this list of columns to export. The dataframe instance with all columns is first cached to memory, and only then are columns not in the export list selectively dropped from the dataframe and then exporting to the output file. The result is that later modules can still reference a parameter that was output from earlier modules (by accessing the cached dataframe), even if that parameter was not exported to the output file.

aufdenkampe commented 2 years ago

@ptomasula, that sounds like a the perfect approach, to have the IO Manager keep all the data frames in memory until the model is complete.

Thanks for issuing a PR to get your commit 6026fa8 from your feature branch into develop.

aufdenkampe commented 2 years ago

Tutorials updated with 5ca8619.

The notebooks now recommend using a context manager for running HSP2.