rywhale / kiwisR

Provides a simplified method for bringing tidy data into R from KISTERS WISKI databases via KiWIS API.
Other
12 stars 2 forks source link

issue with return_fields argument for ki_time_series_values() #19

Open beatonan opened 2 years ago

beatonan commented 2 years ago

I am getting an error when using return_fields argument for ki_time_series_values(). here are some examples:

This works:

wiski_vals_tst <- ki_timeseries_values(hub = "swmc", ts_id = 1071852042, start_date = "2021-09-16", end_date = "2021-10-04", return_fields = "station_name")

This does not:

wiski_vals_tst <- ki_timeseries_values(hub = "swmc", ts_id = 1071852042, start_date = "2021-09-16", end_date = "2021-10-04", return_fields = "station_name")

return_fields argument does work with ki_timeseries_list():

This Works:

ts_id_alarm <- ki_timeseries_list(hub = "swmc", station_id = 141370, ts_name = "WWP_AlarmLevel_25 %-tile", return_fields = "station_name, ts_id, ts_name, parametertype_name")

a direct kiwis link for getting metadata fields using get_TimeseriesValues works:

https://www.swmc.mnr.gov.on.ca/KiWIS/KiWIS?service=kisters&type=queryServices&request=getTimeseriesValues&datasource=0&format=html&ts_id=964086042&period=P1d&md_returnfields=station_name,station_no&metadata=true

here is a query for metadata using get_TimeseriesList()...the syntax for getting metadata is a bit different, maybe that is the issue??

https://www.swmc.mnr.gov.on.ca/KiWIS/KiWIS?service=kisters&type=queryServices&request=getTimeseriesList&datasource=0&format=html&station_id=141370&returnfields=station_name,station_no,ts_id,ts_name,parametertype_name,coverage

thanks,

rywhale commented 2 years ago

Sorry for the delay, just saw this- not sure if we discussed it elsewhere already. Like you flagged the time series values query uses a different parameter to specify metadata return fields.

I think your best bet for now is to just do the station list/time series list query with the required return fields and then join it to your ki_timeseries_values query.

I'll leave this issue open flagged as enhancement and look at it when I get some time. For backwards compatibility we'll need to maintain the existing return_fields argument since it controls returning units, etc. and just add md_return_fields as a new optional argument to ki_timeseries_values.

Pull requests welcome 😄

dimfalk commented 2 years ago

Hey guys,

thought I'd sneak a peek here since I'm working with this package right now anyway.

I am not quite sure if I get the issue described - the first example is not working for me and seems to be identical to the second one (?) - but from my point of view ki_time_series_values() works as expected since "station_name" is not a valid parameter for return_fields, c.f. from documentation of the request getTimeseriesValues.

Besides Timestamp and Value by default, you can choose the following optional return_fields:

c("Quality Code", "Quality Code Name", "Quality Code Description",
  "Quality Code Color", "Aggregation Accuracy", "Absolute Value",
  "AV Interpolation Type", "AV Quality Code", "AV Quality Code Name",
  "AV Quality Code Description ", "AV Quality Code Color", "Runoff Value",
  "RV Interpolation Type", "RV Quality Code", "RV Quality Code Name",
  "RV Quality Code Description", "RV Quality Code Color", "Stagesource Value",
  "Occurrance Timestamp", "Occurrance Count", "Timeseries Comment",
  "Agent Comment ", "Station Comment", "Parameter Comment", "Data Comment",
  "Release State")

"station_name" is not an option here when using ki_time_series_values(), but works great for ki_timeseries_list() according to specifications. However, what you can do, is specifying station_name as an optional meta data field to be returned. Within the original getTimeseriesValues this is accomplished using md_returnfields as an optional field. This repository uses the hardcoded character vector ts_meta (I have not figured out yet how to link to specific lines of your code) internally, consisting of "ts_unitname,ts_unitsymbol,ts_name,ts_id,stationparameter_name,station_name,station_id" by default. So you do get station_name as a return, but not as part of the previous return_fields but rather based on an additionally specified meta data query, which is kind of invisible to the end user.

This would totally explain why your request fails with return_fields = "station_name" specification, but succeeds (and also leads to the desired result) when omitting.

So basically there is no fix needed imho, but md_return_fields could be added as another optional parameter for ki_timeseries_values, sure. On the other hand, this station-related meta data most likely has already been retrieved using ki_timeseries_list in the previous step and could simply be joined. But I do not see a win here since all this information is static and is simply replicated over the length of the timeseries in the tibble returned, contradicting the concept of data economy in a way.

rywhale commented 2 years ago

@falk-env yep you're understanding the issue perfectly. Basically just need to add the md_returnfields paramter as an optional argument and add it to the ki_timeseries_values query. Personally low priority for me since you can just join with the time series list or station list metadata as you described.