ulmo-dev / ulmo

clean, simple and fast access to public hydrology and climatology data.
http://ulmo.readthedocs.org
Other
169 stars 63 forks source link

Skip bad lines in CDEC get_stations; and ghcn_daily test failure fix #214

Closed rebecca-palmer closed 1 year ago

rebecca-palmer commented 1 year ago

(Warning: this has not been tested - the CI says "This workflow is awaiting approval from a maintainer".)

test/cdec_historical_test.py::get_stations fails in pandas 1.4+ because the file contains lines with too many fields.

(I'm not sure why this is a new bug: the closest I can find is pandas-dev/pandas#22144 which isn't an exact match.)

This fix drops the invalid lines, because trying to make sense of them would produce nonsense if the extra commas aren't at the line end.

emiliom commented 1 year ago

Thanks for this contribution @rebecca-palmer ! I approved the CI, and some tests are failing. I haven't looked closely, but failure doesn't necessarily mean that it's your changes that are causing the problems. I wouldn't be surprised if other tests are failing.

I'll try to take a closer look soon.

rebecca-palmer commented 1 year ago

The first failure is mine: dropping the bad lines took the station count below the minimum the test expects. Fixed. (The file doesn't open in my browser, so I don't know what the bad lines actually say.)

The others look like they're unrelated problems due to data sources disappearing/changing, though I don't know why Debian isn't also seeing those failures. The obvious way to check would be to re-run the CI on main (or a do-nothing 'pull request' if that's easier).

rebecca-palmer commented 1 year ago

That worked.

I don't know why Debian isn't also seeing those failures

Debian is now seeing the 2 ghcn_daily_test.py failures - they only started failing last month and I hadn't looked since then. These 2 look like the data being checked against has changed; updated our reference to match.

The last two (which are in tests Debian doesn't run) look like the data source is either no longer available (at that address) or no longer has that item. I haven't investigated those further.

rebecca-palmer commented 1 year ago

Does prolonged silence mean there's a problem with this change?

The remaining test failures (unrelated to this) look like another change to the source data, and a source that only keeps data available for a limited time (and also has a "do not redistribute" notice, which might mean that test isn't allowed to exist at all).

emiliom commented 1 year ago

3 months later ... Sigh. My long silence just meant that I had not gotten to this. I'm making a push in these few days to issue a new ulmo release.

I've tested your PR locally, and all the CDEC tests run fine. They issue 3 warnings, but none are related to your PR (BTW, I'm fixing one of those warnings).

So, I'll merge this PR. Thanks for your contribution and your patience!

emiliom commented 1 year ago

I just realized that this PR also ended up addressing ghcn_daily test failure! Adding this comment here to make this more explicit, for future reference.