google / weather-tools

Tools to make weather data accessible and useful.
https://weather-tools.readthedocs.io/
Apache License 2.0
209 stars 40 forks source link

Config values need to be typed on parsing. #5

Closed alxmrs closed 2 years ago

alxmrs commented 2 years ago

The current config-value parser treats all values as strings. Instead, we need to try to infer types in a best-effort way. This is a pre-requisite step in order to fix #4.

fredzyda commented 2 years ago

Perhaps consider using protobuf for this? It is easy enough to write textproto files and then you get all the useful parsing that protobuf contains by default.

fredzyda commented 2 years ago

To be more specific: consider using a textproto for this. Protobuf can be binary or text, but for configuration, text is much better and can be easily edited with a text editor.

alxmrs commented 2 years ago

I have a few hesitations with using textprotos. Though, I'm open to using them if I can address my concerns.

  1. I'd like to support multiple file types for serialization. Right now we support JSON and ConfigParser. I could see textprotos fitting in as another format we support, but I'd prefer not to replace all formats with textprotos. Thus, I think there will be a need for type checking. I don't want to deprecate the old formats for configuring downloads if I don't have to. https://github.com/google/weather-tools/blob/ecb194c7bc62d9d0ad3982cce24eb745a470be84/weather_dl/download_pipeline/parsers.py#L72

  2. To my knowledge, protobufs don't support datetime values natively, where this is an important datatype for our parser to support. They do support Timestamps and int64s, but I would argue that entering ranges of dates in this format would provide a worse user experience than writing ISO-like date strings.

  3. There is some sugar I would like to provide no matter the format the data is in. For example: https://github.com/google/weather-tools/blob/ecb194c7bc62d9d0ad3982cce24eb745a470be84/weather_dl/download_pipeline/parsers.py#L133 We could just use protos for this, but it would require a lot of repetition on the textproto writers side. This could be automated with a script / code, but then we might as well put this logic as syntax in the format. Indeed, ranges of values come up a lot on our config file, so this sugar / syntax is really important to keep for the user experience.

Maybe there's a way to support both protos and this range syntax? I have an outstanding bug here to do this for the JSON implementation: https://github.com/google/weather-tools/blob/ecb194c7bc62d9d0ad3982cce24eb745a470be84/weather_dl/download_pipeline/parsers.py#L75 In any case, I do think that if we adopted protos, then we'd still want to add some rudimentary type paring capabilities.

As an implementation note: I think that type checking ought to be in a separate method and be called right after parsing the underlying format type, about here: https://github.com/google/weather-tools/blob/ecb194c7bc62d9d0ad3982cce24eb745a470be84/weather_dl/download_pipeline/parsers.py#L261 https://github.com/google/weather-tools/blob/ecb194c7bc62d9d0ad3982cce24eb745a470be84/weather_dl/download_pipeline/parsers.py#L263

alxmrs commented 2 years ago

A simple way of replacing the string representation of values with literals is to make use of this method from ast:

https://docs.python.org/3/library/ast.html#ast.literal_eval

The motivation for typing data is mentioned in #4. It's really only necessary for creating the name of the output file; in fact, each downloader client is totally fine accepting metadata as strings and downloading them (they'll accept typed values, too, but don't need to). In reviewing this issue and it's siblings, it occurs to me that another approach for achieving #4 would be to type values "just-in-time" for string formatting. By this, I mean that we could have a method that handles MARS types (as defined in https://github.com/google/weather-tools/issues/118#issuecomment-1088151645) from strings, and we invoke that method here: https://github.com/google/weather-tools/blob/cdd313239633efee83c8fa4e35d72c7364b984ff/weather_dl/download_pipeline/parsers.py#L350

alxmrs commented 2 years ago

Fixed in #144.