Improve readUCI & readWDM for a broader range of valid files

aufdenkampe commented 3 years ago

This spring @steveskrip noticed that many UCI files successfully used by LimnoTech with HSPF (and created by LimnoTech's WinModel package) would not import with readUCI.

@rheaphy also noted that there might be time issues in UCI files, because HSPF doesn't manage time correctly, whereas for HSP2 we're using ISO time standards that track leap seconds and time zones.

Let's use this issue thread to track @rheaphy's work to improve readUCI, and our results with testing it.

aufdenkampe commented 3 years ago

Copying from https://github.com/LimnoTech/HSPsquared/issues/9#issuecomment-697012154 for our records.

From May 19 "HSP2 test files from LimnoTech" email from @steveskrip to @rheaphy, and my response, for our records:

I’m still having a hard time getting LimnoTech’s HSPF .uci files parsed through the readUCI function. I did get my python path plumbing in order, so thanks for your help with that.

I think it might be best if I send the files over your way to have a look. I imagine there are just some minor differences in string formatting that aren’t being handled. If you do notice small changes that can be made on the .uci file side, we can do that to get things moving, but I know the goal here is to handle any HSPF .uci file.

The files are in LimnoTech’s tests branch of the GitHub repository (https://github.com/LimnoTech/HSPsquared/tree/develop/tests). The two tests are GRW_Plaster and ZRW_WestIndian. Let me know if you’d like me to send them over to you another way.

My emailed response:

I’ve issued a pull request to Bob for all of Steve’s test files. See https://github.com/respec/HSPsquared/pull/34. Bob, once you review and merge this PR, then you’ll be able to work with Steve’s files in your develop branch.

Bob merged PR https://github.com/respec/HSPsquared/pull/34 into their develop on May 22. See the PR conversation for some additional details.

In Bob's June 2 "HSP2 Status" email, he writes:

Last week I finished fixing the known issues with the UCI reader - but looking at the GRW_Plaster UCI file, I found tables that I had not previously found in my other test cases. On trying to make a quick fix, I found that the fix was too complicated for long-term maintenance. I rewrote a section of the code and testing has gone smoothly.

I plan to release the new version in a few hours.

From June 2-13, Bob made three commits that refactored readUCI. For the list, see https://github.com/respec/HSPsquared/pull/41.

@steveskrip, let's confirm that these fixes work for us. I merged all these updates into https://github.com/LimnoTech/HSPsquared.

aufdenkampe commented 3 years ago

@rheaphy, it looks like @steveskrip discovered some additional issues when trying to run the standard HSPF tests that @PaulDudaRESPEC suggested in his comment to respec #31: Expand & automate testing system!

@steveskrip provides detailed information in https://github.com/LimnoTech/HSPsquared/issues/16.

You'll see that the issue also includes problems with readHBN.

bcous commented 3 years ago

I'm getting an error with reading in this UCI file: https://github.com/LimnoTech/HSPsquared/blob/develop-WaterQuality-BC/tests/GLWACSO/GLWA_HSPF_June2019_Mon8MileDataFilled_WT_RW_v4.UCI

Here's a copy of the error message:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-7-e0f78d821958> in <module>
----> 1 HSP2tools.readUCI(uciname, HDFname)

~\Documents\GitHub\limno_HSPsquared\HSP2tools\readUCI.py in readUCI(uciname, hdfname)
    122             if line[0:3] == 'EXT':              ext(info, getlines(f))
    123             if line[0:6] == 'PERLND':     operation(info, getlines(f),'PERLND')
--> 124             if line[0:6] == 'IMPLND':     operation(info, getlines(f),'IMPLND')
    125             if line[0:6] == 'RCHRES':     operation(info, getlines(f),'RCHRES')
    126 

~\Documents\GitHub\limno_HSPsquared\HSP2tools\readUCI.py in operation(info, llines, op)
    374             history[dpath[op,table],dcat[op,table]].append((table,df))
    375 
--> 376     (_,df) = history['GENERAL','INFO'][0]
    377     valid = set(df.index)
    378     for path,cat in history:

IndexError: list index out of range
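
For anyone reading along: the last frame suggests that `history['GENERAL','INFO']` was an empty list for this IMPLND block, so indexing `[0]` failed. Below is a minimal sketch of a more descriptive guard. It assumes `history` behaves like a `defaultdict(list)` keyed by `(path, category)`, which is inferred from the traceback rather than confirmed. The helper name `first_table` and the sample data are hypothetical and are not part of readUCI.

```python
from collections import defaultdict

def first_table(history, path, cat):
    """Return the first (table, df) pair stored under (path, cat).

    Raises a descriptive ValueError instead of a bare IndexError when
    the operation block did not supply that table.
    """
    # .get() does not create the key, so a missing entry stays missing.
    entries = history.get((path, cat), [])
    if not entries:
        raise ValueError(f"UCI block has no {path}/{cat} table")
    return entries[0]

# Hypothetical stand-in for the structure built while parsing a block.
history = defaultdict(list)
history['PWATER', 'PARAMETERS'].append(('PWAT-PARM1', {'LZSN': 4.0}))

try:
    first_table(history, 'GENERAL', 'INFO')
except ValueError as err:
    message = str(err)
    print(message)
```

A guard like this would not make the file import, but it would point at the missing table instead of at a list index.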

bcous commented 3 years ago

I'm getting errors with reading in this WDM file: https://github.com/LimnoTech/HSPsquared/blob/develop-WaterQuality-BC/tests/GLWACSO/KDTWMet-06272019-KOS_w_Mon17Filled_CHLA_ComDO.wdm

PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 140
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 141
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 142
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 143
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 144
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 145
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 146
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 134
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 135
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 136
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 137
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 138
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 139
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 140
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 141
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 142
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 143
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 144
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 145
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 146
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 147
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 141
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 142
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 143
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 144
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 145
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 146
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 147
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 134
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 135
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 136
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 137
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 138
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 139
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 140
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 134
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 135
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 136
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 137
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 138
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 139
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 140
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 134
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 135
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 136
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 137
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 138
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 139
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 140
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 134
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 135
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 136
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 137
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 138
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 139
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 140
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 134
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 135
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 136
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 137
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 138
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 140
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 141
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 142
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 143
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 144
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 145
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 146
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 134
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 135
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 136
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 137
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 138
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 139
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 140
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 140
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 141
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 142
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 143
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 144
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 145
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 146
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 140
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 141
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 142
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 143
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 144
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 145
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 146
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 140
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 141
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 142
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 143
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 144
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 145
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 146
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 140
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 141
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 142
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 143
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 144
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 145
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 146
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 140
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 141
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 142
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 143
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 144
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 145
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 146
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 140
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 141
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 142
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 143
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 144
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 145
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 146
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 140
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 141
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 142
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 143
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 144
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 145
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 146
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 141
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 142
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 143
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 144
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 145
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 146
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 147
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 140
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 141
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 142
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 143
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 144
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 145
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 146
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 140
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 141
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 142
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 143
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 144
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 145
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 146
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 140
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 141
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 142
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 143
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 144
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 145
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 146
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 140
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 141
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 142
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 143
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 144
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 145
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 146
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 140
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 141
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 142
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 143
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 144
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 145
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 146
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 140
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 141
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 142
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 143
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 144
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 145
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 146
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 140
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 141
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 142
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 143
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 144
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 145
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 146
PROGRAM ERROR: ATTRIBUTE INDEX not found 286 Attribute pointer 140
PROGRAM ERROR: ATTRIBUTE INDEX not found 287 Attribute pointer 141
PROGRAM ERROR: ATTRIBUTE INDEX not found 13 Attribute pointer 142
PROGRAM ERROR: ATTRIBUTE INDEX not found 12 Attribute pointer 143
PROGRAM ERROR: ATTRIBUTE INDEX not found 14 Attribute pointer 144
PROGRAM ERROR: ATTRIBUTE INDEX not found 15 Attribute pointer 145
PROGRAM ERROR: ATTRIBUTE INDEX not found 16 Attribute pointer 146
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\construction.py in _list_to_arrays(data, columns, coerce_float, dtype)
    563     try:
--> 564         columns = _validate_or_indexify_columns(content, columns)
    565         result = _convert_object_array(content, dtype=dtype, coerce_float=coerce_float)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\construction.py in _validate_or_indexify_columns(content, columns)
    688             raise AssertionError(
--> 689                 f"{len(columns)} columns passed, passed data had "
    690                 f"{len(content)} columns"

AssertionError: 9 columns passed, passed data had 10 columns

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-11-a67c96be9d33> in <module>
----> 1 HSP2tools.readWDM('KDTWMet-06272019-KOS_w_Mon17Filled_CHLA_ComDO.wdm', HDFname)

~\Documents\GitHub\limno_HSPsquared\HSP2tools\readWDM.py in readWDM(wdmfile, hdffile)
    118 
    119 
--> 120         dfsummary = pd.DataFrame(summary, index=summaryindx, columns=columns)
    121         store.put('TIMESERIES/SUMMARY',dfsummary, format='t', data_columns=True)
    122     return dfsummary

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
    507                     if is_named_tuple(data[0]) and columns is None:
    508                         columns = data[0]._fields
--> 509                     arrays, columns = to_arrays(data, columns, dtype=dtype)
    510                     columns = ensure_index(columns)
    511 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\construction.py in to_arrays(data, columns, coerce_float, dtype)
    522         return [], []  # columns if columns is not None else []
    523     if isinstance(data[0], (list, tuple)):
--> 524         return _list_to_arrays(data, columns, coerce_float=coerce_float, dtype=dtype)
    525     elif isinstance(data[0], abc.Mapping):
    526         return _list_of_dict_to_arrays(

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\construction.py in _list_to_arrays(data, columns, coerce_float, dtype)
    565         result = _convert_object_array(content, dtype=dtype, coerce_float=coerce_float)
    566     except AssertionError as e:
--> 567         raise ValueError(e) from e
    568     return result, columns
    569 

ValueError: 9 columns passed, passed data had 10 columns
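
The final message means that at least one row in `summary` carries ten values while `columns` names only nine. Here is a small sketch of a pre-check that would locate the offending rows before the DataFrame is built. The function name `check_row_widths` and the sample rows are hypothetical; only the nine-versus-ten mismatch comes from the traceback above.

```python
def check_row_widths(rows, columns):
    """Return the indices of rows whose length differs from len(columns)."""
    expected = len(columns)
    return [i for i, row in enumerate(rows) if len(row) != expected]

# Hypothetical summary data: nine column names, one row with a tenth value.
columns = [f"col{i}" for i in range(9)]
rows = [tuple(range(9)), tuple(range(10))]

bad = check_row_widths(rows, columns)
print(bad)
```

Running a check like this just before the `pd.DataFrame(...)` call would show which data sets produce the extra attribute.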

PaulDudaRESPEC commented 3 years ago

@aufdenkampe @steveskrip @bcous I just checked in some refinements to the developWaterQuality branch that I believe resolve issues with reading UCI, WDM, and HBN files -- you might want to try them out!

aufdenkampe commented 3 years ago

@PaulDudaRESPEC, thank you!

@steveskrip & @bcous, I merged all this into LimnoTech's develop-WaterQuality and develop-WaterQuality-BC branches.

Unfortunately, I had a merge conflict when I tried to cherry-pick the individual commit into our develop branches.

@PaulDudaRESPEC, since we all just decided to focus on Water Quality modules, I'm wondering if it's time we merge all WaterQuality into develop and then delete the develop-WaterQuality branch. That would simplify the git tracking substantially for me (and all of us). What do you think?

PaulDudaRESPEC commented 3 years ago

@aufdenkampe , I'm on board with having only one development branch during this current effort.

bcous commented 3 years ago

I have tested this with the same files as yesterday, and it appears to be working better. readUCI completed with no problem on the file I linked yesterday. There may still be an issue with the readWDM. It reads in 3 of the files correctly, but appears to hang up on this WDM: https://github.com/LimnoTech/HSPsquared/blob/develop-WaterQuality-BC/tests/GLWACSO/RPO_SWMM48LINKS2017_wCBOD_June2019.wdm

When running in Jupyter notebooks, it never completes. It appears to add timeseries to the .h5 file (file is larger after it starts running), but it never updates the summary table. Let me know if you want to see the .h5 files and I can find a way to transfer them to you.

aufdenkampe commented 3 years ago

Thanks @bcous for testing again! That's great news that readUCI worked on your file! @steveskrip, could you test those other files? (but do it from the develop-WaterQuality branch)

It's good to know that the readWDM did better. @PaulDudaRESPEC, any ideas?

Tomorrow morning, I'll work with @PaulDudaRESPEC to merge water quality into develop.

bcous commented 3 years ago

Hi @PaulDudaRESPEC --

I was chatting with @aufdenkampe about this issue. He suggested that it might be related to the 15-minute data in the WDM file. I checked and in at least 2 of the other WDMs that were read in there were timeseries with 15-minute flow data included as well. Let me know if you want to chat about specifics further.

PaulDudaRESPEC commented 3 years ago

Thanks, @bcous , that's good to know. I've asked Jack, the WDM guru, to take a look.

PaulDudaRESPEC commented 3 years ago

Circling back to this one... Jack took a look and noted that at least one of the problematic data sets, DSN 772, appears to have been compiled at various time steps -- daily, 15min, and annual, all in the same timeseries. Looks like the old WDM Fortran code knows how to deal with that, but not the python code. Until we have a fix, I suggest a work-around might be to build the data set from scratch at a 15min time step throughout.
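
To check whether a series mixes time steps once its timestamps are available, one option is to collect the distinct intervals between consecutive values. This is a minimal sketch using only the standard library. The function name `distinct_steps` and the sample timestamps are hypothetical, and it does not read a WDM file.

```python
from datetime import datetime, timedelta

def distinct_steps(timestamps):
    """Return the set of intervals between consecutive timestamps."""
    return {later - earlier for earlier, later in zip(timestamps, timestamps[1:])}

start = datetime(2017, 1, 1)
# Hypothetical series: three daily values followed by 15-minute values.
stamps = [start, start + timedelta(days=1), start + timedelta(days=2)]
last_daily = stamps[-1]
stamps += [last_daily + timedelta(minutes=15 * k) for k in (1, 2, 3)]

steps = distinct_steps(stamps)
print(sorted(steps))
```

More than one interval in the result would flag a data set as a candidate for rebuilding at a single time step.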

aufdenkampe commented 3 years ago

@PaulDudaRESPEC, that is very helpful to know. Thank you!

aufdenkampe commented 3 years ago

@PaulDudaRESPEC, any updates on whether you or Jack might be able to fix readWDM.py to read files with many different time intervals? We're trying to pick up an old HSPF model created by others, so we can't rebuild the files from scratch.

PaulDudaRESPEC commented 3 years ago

Jack is looking at it. I think he's on the trail, but we haven't solved it yet.

My thought about rebuilding the files from scratch is that you could list the problematic timeseries in something like the SARA Timeseries Utility, save the list to a text file, and then re-import the data from the text file. But I'm not sure if you'd lose anything critical in the process.

PaulDudaRESPEC commented 3 years ago

@aufdenkampe and @bcous I know it has been a while since we've provided any news on this issue. Jack is continuing to work on it. This morning he committed a change to readWDM.py (in the develop branch) -- this new version definitely helps, but we're not sure it totally solves the issue, so more testing is in order. As a general explanation of what's going on, it looks like there's a compression functionality in the Fortran WDM code that wasn't implemented in the Python port -- WDM files that use that functionality are much, much larger when converted to HDF5 files -- perhaps underappreciated design elements of that old code!
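
As a rough illustration of why expanding compressed records inflates the output, here is a generic run-length sketch. It is not the actual WDM block layout, which this thread does not describe. The `(value, count)` representation, the function name `expand_runs`, and the sample values are all assumptions made for the example.

```python
def expand_runs(runs):
    """Expand (value, count) pairs into one value per time step."""
    values = []
    for value, count in runs:
        values.extend([value] * count)
    return values

# Hypothetical compressed record: long dry periods stored as single pairs.
runs = [(0.0, 96), (0.12, 1), (0.0, 95)]
expanded = expand_runs(runs)
print(len(runs), len(expanded))
```

Three stored pairs become 192 explicit values here, which is the kind of growth one would expect when repeated values are written out step by step.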

aufdenkampe commented 3 years ago

@PaulDudaRESPEC, thanks for the update, and thanks to you and @jlkittle for your first round of fixes with dddd759681bb28fce611e82eede470ec4945244c and f190fd8c3067a32245e3d363379b3844617a96bf!

That's really interesting to hear that it's connected to different compression routines in the Fortran WDM code. We noticed with @bcous's project that those WDM files created massively bigger HDF5 files. I've been thinking that we might be able to do better with the HDF5 compression. In fact, the last work by @rheaphy included exploring better HSP2 performance by using BLOSC compression with the HDF5 files, as he described here: https://github.com/respec/HSPsquared/issues/36#issuecomment-697107682. It might be useful to pick up where he left off.

bcous commented 3 years ago

I was trying to use readUCI on this file: https://github.com/LimnoTech/HSPsquared/blob/develop-WaterQuality-BC/tests/GLWACSO/model_files/GLWA_HSPF_June2019_Mon8MileDataFilled_WT_RW_v4.UCI

The following error messages came up when I tried to run it.

Thanks,

Brendan


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-b83acb602a13> in <module>
----> 1 get_ipython().run_line_magic('timeit', 'HSP2tools.readUCI(uciname, HDFname)')

~\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\interactiveshell.py in run_line_magic(self, magic_name, line, _stack_depth)
   2325                 kwargs['local_ns'] = self.get_local_scope(stack_depth)
   2326             with self.builtin_trap:
-> 2327                 result = fn(*args, **kwargs)
   2328             return result
   2329 

<decorator-gen-54> in timeit(self, line, cell, local_ns)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\magic.py in <lambda>(f, *a, **k)
    185     # but it's overkill for just that one bit of state.
    186     def magic_deco(arg):
--> 187         call = lambda f, *a, **k: f(*a, **k)
    188 
    189         if callable(arg):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\magics\execution.py in timeit(self, line, cell, local_ns)
   1167             for index in range(0, 10):
   1168                 number = 10 ** index
-> 1169                 time_number = timer.timeit(number)
   1170                 if time_number >= 0.2:
   1171                     break

~\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\magics\execution.py in timeit(self, number)
    167         gc.disable()
    168         try:
--> 169             timing = self.inner(it, self.timer)
    170         finally:
    171             if gcold:

<magic-timeit> in inner(_it, _timer)

~\Documents\GitHub\limno_HSPsquared\HSP2tools\readUCI.py in readUCI(uciname, hdfname)
    143             if line[0:6] == 'PERLND':     operation(info, getlines(f),'PERLND')
    144             if line[0:6] == 'IMPLND':     operation(info, getlines(f),'IMPLND')
--> 145             if line[0:6] == 'RCHRES':     operation(info, getlines(f),'RCHRES')
    146 
    147         colnames = ('AFACTR', 'MFACTOR', 'MLNO', 'SGRPN', 'SMEMN', 'SMEMSB',

~\Documents\GitHub\limno_HSPsquared\HSP2tools\readUCI.py in operation(info, llines, op)
    566                         df = concat([temp[1] for temp in history[path, cat]], axis='columns')
    567                         df = fix_df(df, op, path, ddfaults, valid)
--> 568                         df.to_hdf(store, f'{op}/{path}/{cat}{count}', data_columns=True)
    569             else:
    570                 print('UCI TABLE is not understood (yet) by readUCI', op, cat)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in to_hdf(self, path_or_buf, key, mode, complevel, complib, append, format, index, min_itemsize, nan_rep, dropna, data_columns, errors, encoding)
   2447             data_columns=data_columns,
   2448             errors=errors,
-> 2449             encoding=encoding,
   2450         )
   2451 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\pytables.py in to_hdf(path_or_buf, key, value, mode, complevel, complib, append, format, index, min_itemsize, nan_rep, dropna, data_columns, errors, encoding)
    268             path_or_buf, mode=mode, complevel=complevel, complib=complib
    269         ) as store:
--> 270             f(store)
    271     else:
    272         f(path_or_buf)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\pytables.py in <lambda>(store)
    260             data_columns=data_columns,
    261             errors=errors,
--> 262             encoding=encoding,
    263         )
    264 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\pytables.py in put(self, key, value, format, index, append, complib, complevel, min_itemsize, nan_rep, data_columns, encoding, errors, track_times)
   1127             encoding=encoding,
   1128             errors=errors,
-> 1129             track_times=track_times,
   1130         )
   1131 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\pytables.py in _write_to_group(self, key, value, format, axes, index, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, dropna, nan_rep, data_columns, encoding, errors, track_times)
   1799             nan_rep=nan_rep,
   1800             data_columns=data_columns,
-> 1801             track_times=track_times,
   1802         )
   1803 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\pytables.py in write(self, obj, axes, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, dropna, nan_rep, data_columns, track_times)
   4236             min_itemsize=min_itemsize,
   4237             nan_rep=nan_rep,
-> 4238             data_columns=data_columns,
   4239         )
   4240 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\pytables.py in _create_axes(self, axes, obj, validate, nan_rep, data_columns, min_itemsize)
   3863 
   3864         blocks, blk_items = self._get_blocks_and_items(
-> 3865             block_obj, table_exists, new_non_index_axes, self.values_axes, data_columns
   3866         )
   3867 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\pytables.py in _get_blocks_and_items(block_obj, table_exists, new_non_index_axes, values_axes, data_columns)
   3986             blk_items = get_blk_items(mgr, blocks)
   3987             for c in data_columns:
-> 3988                 mgr = block_obj.reindex([c], axis=axis)._mgr
   3989                 blocks.extend(mgr.blocks)
   3990                 blk_items.extend(get_blk_items(mgr, mgr.blocks))

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    307         @wraps(func)
    308         def wrapper(*args, **kwargs) -> Callable[..., Any]:
--> 309             return func(*args, **kwargs)
    310 
    311         kind = inspect.Parameter.POSITIONAL_OR_KEYWORD

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in reindex(self, *args, **kwargs)
   4030         kwargs.pop("axis", None)
   4031         kwargs.pop("labels", None)
-> 4032         return super().reindex(**kwargs)
   4033 
   4034     def drop(

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in reindex(self, *args, **kwargs)
   4460         # perform the reindex on the axes
   4461         return self._reindex_axes(
-> 4462             axes, level, limit, tolerance, method, fill_value, copy
   4463         ).__finalize__(self, method="reindex")
   4464 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   3871         if columns is not None:
   3872             frame = frame._reindex_columns(
-> 3873                 columns, method, copy, level, fill_value, limit, tolerance
   3874             )
   3875 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in _reindex_columns(self, new_columns, method, copy, level, fill_value, limit, tolerance)
   3919             copy=copy,
   3920             fill_value=fill_value,
-> 3921             allow_dups=False,
   3922         )
   3923 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
   4528                 fill_value=fill_value,
   4529                 allow_dups=allow_dups,
-> 4530                 copy=copy,
   4531             )
   4532             # If we've made a copy once, no need to make another one

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\managers.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy, consolidate)
   1274         # some axes don't allow reindexing with dups
   1275         if not allow_dups:
-> 1276             self.axes[axis]._can_reindex(indexer)
   1277 
   1278         if axis >= self.ndim:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in _can_reindex(self, indexer)
   3287         # trying to reindex on an axis with duplicates
   3288         if not self.is_unique and len(indexer):
-> 3289             raise ValueError("cannot reindex from a duplicate axis")
   3290 
   3291     def reindex(self, target, method=None, level=None, limit=None, tolerance=None):

ValueError: cannot reindex from a duplicate axis
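
The traceback above ends with pandas refusing to reindex an axis whose labels are not unique. A minimal standard-library sketch of checking a list of column labels for repeats before writing a table might look like this. The function name and the sample labels are hypothetical and not taken from HSP2.

```python
from collections import Counter


def find_duplicate_labels(labels):
    """Return the labels that occur more than once, in first-seen order."""
    counts = Counter(labels)  # Counter keeps insertion order, like dict
    return [label for label in counts if counts[label] > 1]


# Hypothetical column labels; the repeated "col_a" makes the axis non-unique.
columns = ["col_a", "col_b", "col_a", "col_c"]
duplicates = find_duplicate_labels(columns)
print(duplicates)
```

For a pandas DataFrame, `df.columns.duplicated()` gives the equivalent boolean mask directly.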

PaulDudaRESPEC commented 3 years ago

@bcous, I just posted a fix for UCI files with multiple GQUALs; it fixes the problem you reported yesterday.

bcous commented 3 years ago

While doing additional testing, I ran into an error when running HSP2.main. The log output and traceback are below:

2021-03-08 11:16:00.66   Processing started for file GLWA_HSPF_June2019_Mon8MileDataFilled_WT_RW_v4.h5; saveall=True
2021-03-08 11:16:02.67   Simulation Start: 2017-05-01 00:00:00, Stop: 2017-11-01 00:00:00
2021-03-08 11:16:02.67      PERLND P301 DELT(minutes): 15
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-13-63be5facfe88> in <module>
----> 1 HSP2.main(hdfname,saveall=True)

~\Documents\GitHub\limno_HSPsquared\HSP2\main.py in main(hdfname, saveall, jupyterlab)
     49 
     50             # now conditionally execute all activity modules for the op, segment
---> 51             ts = get_timeseries(store,ddext_sources[(operation,segment)],siminfo)
     52             flags = uci[(operation, 'GENERAL', segment)]['ACTIVITY']
     53             if operation == 'RCHRES':

~\Documents\GitHub\limno_HSPsquared\HSP2\main.py in get_timeseries(store, ext_sourcesdd, siminfo)
    204         if row.MFACTOR != 1.0:
    205             temp1 *= row.MFACTOR
--> 206         t = transform(temp1, row.TMEMN, row.TRAN, siminfo)
    207 
    208         tname = f'{row.TMEMN}{row.TMEMSB}'

~\Documents\GitHub\limno_HSPsquared\HSP2\utilities.py in transform(ts, name, how, siminfo)
     78         pass
     79     elif tsfreq == None:     # Sparse time base, frequency not defined
---> 80         ts = ts.reindex(siminfo['tbase']).ffill().bfill()
     81     elif how == 'SAME':
     82         ts = ts.resample(freq).ffill()  # tsfreq >= freq assumed, or bad user choice

KeyError: 'tbase'
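
The failing line in `transform` reindexes a sparse series onto the simulation time base, then fills forward and backward. A minimal standard-library sketch of that fill logic follows. The function name and sample data are hypothetical, and it assumes `tbase` is a regular sequence of simulation timestamps, which the thread does not spell out.

```python
from datetime import datetime, timedelta


def reindex_ffill_bfill(obs, tbase):
    """Align sparse observations (dict of timestamp -> value) to tbase.

    Mirrors reindex(...).ffill().bfill(): exact-match lookup first,
    then forward fill, then back fill for any leading gaps.
    """
    # Step 1: exact-match reindex; timestamps absent from obs become None.
    values = [obs.get(t) for t in tbase]

    # Step 2: forward fill, carrying the last seen value into later gaps.
    last = None
    for i, v in enumerate(values):
        if v is None:
            values[i] = last
        else:
            last = v

    # Step 3: back fill, so gaps before the first observation get its value.
    nxt = None
    for i in range(len(values) - 1, -1, -1):
        if values[i] is None:
            values[i] = nxt
        else:
            nxt = values[i]
    return values


# Hypothetical 15-minute time base with two sparse observations.
start = datetime(2017, 5, 1)
tbase = [start + timedelta(minutes=15 * k) for k in range(5)]
obs = {tbase[1]: 2.0, tbase[3]: 4.0}
filled = reindex_ffill_bfill(obs, tbase)
print(filled)  # [2.0, 2.0, 2.0, 4.0, 4.0]
```

Note that the exact-match step discards any observation whose timestamp does not fall on the time base, so only values that line up with a base timestamp are carried through the fills.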

aufdenkampe commented 3 years ago

@PaulDudaRESPEC and @bcous, the tbase error that @bcous shared in the previous comment was introduced by our work in our develop-readWDM branch as described in https://github.com/LimnoTech/HSPsquared/issues/21.

The issue has since been fixed, but we found yet another issue in that branch that we are presently working on fixing.

aufdenkampe commented 3 years ago

With the recent successful work on "Rewrite readWDM.py to read by data group & block" (#21), we can now properly read all WDM files that we've tested, including those with irregular time series.

All other readUCI issues have been addressed, to our knowledge.

"Getting HSP2 to Handle irregular time series input" (#51) is a separate issue.

Closing this issue, as we will merge PR #35 ("Merge develop_readWDM into develop to read time series by block & group") as soon as we resolve a merge conflict.

rheaphy commented 2 years ago

Hi,

The reason I didn't implement compression originally was that HDFView and other third-party tools required "registration" of compression algorithms, which was so poorly documented that I thought it would be hard for most hydrologists. I expected that improvements to HDFView would make this either easy or automatic. I didn't want people frustrated that they couldn't view their HDF5 files with standard tools. I have been tracking the HDF tools created for JupyterLab, but their progress has been slow. Compression is easy using Pandas/pytables.

Bob

On Fri, Jan 22, 2021 at 2:11 PM Anthony Aufdenkampe < @.***> wrote:

@PaulDudaRESPEC, thanks for the update, and thanks to you and @jlkittle for your first round of fixes with dddd759 (https://github.com/respec/HSPsquared/commit/dddd759681bb28fce611e82eede470ec4945244c) and f190fd8 (https://github.com/respec/HSPsquared/commit/f190fd8c3067a32245e3d363379b3844617a96bf)!

It's really interesting to hear that it's connected to different compression routines in the Fortran WDM code. We noticed with @bcous's project that those WDM files created massively bigger HDF5 files. I've been thinking that we might be able to do better with the HDF5 compression. In fact, the last work by @rheaphy included exploring better HSP2 performance by using BLOSC compression with the HDF5 files, as he described here: #36 (comment) https://github.com/respec/HSPsquared/issues/36#issuecomment-697107682. It might be useful to pick up where he left off.

