hyriver / pygeohydro

A part of HyRiver software stack for accessing hydrology data through web services
https://docs.hyriver.io
Other
68 stars 23 forks source link

Theoretical meta source class #29

Closed aaraney closed 4 years ago

aaraney commented 4 years ago

To preface this, I don't expect this to be pulled in, it was more of a way to showcase an idea and get us thinking. Admittedly, this could be unnecessary for a project of this size where there are 5-10 different datasources, so just no that is too much boilerplate code is also a fine response lol.

So, with that being said, I went through datasets.py and it seems like there is a little bit of code redundancy or maybe just thought redundancy? The platform you have built seems to have a ton of potential to have additional datasets added, so with that in mind I wrote a sort of meta class for sources.

Purpose

Set expectations for future collaborators and provide a framework for adding in other datasets. Basically, let's build a framework for what a source should have and the methods it must have.

This class would be inherited by all other source classes and can either use or override the existing methods in the meta source class to keep a consistent namespace.

/sources/__init__.py can be used to control the public API that a user will interface with.

Known problems/limitations

All datasources are not created equal, or at least they may have differing kinds of data (gridded, point, tabular, etc.) so creating a meta class might need to be divided into a structured hierarchy, where there is a root meta class that is inherited by two to three provider type classes which each inherit the root meta class.

For example, the root meta class would have things that all providers have, their name, url, citation, base_url a __str__ method, plus any other general attributes all sources WILL have. Then a meta gridded class could hold class methods like bybox and bygeom and this class would be inherited by gridded data sources. It should be noted that sources that support multiple data types could utilize multi inheritance and inherit two or more meta classes (i.e. meta gridded and meta tabular).

What I did

The DataSource class is the meta class if you will. There are two examples of using DataSource, the ssebopeta will probably be a more interesting example. I didn't test the bygeom class method, but I think it should work or be very close. Just mainly wanted to showcase the idea more than anything else.

cheginit commented 4 years ago

Thanks for the PR. I think it's a good idea and I appreciate your effort in implementing it. Also, type hinting is something that I've been wanting to add as well, I even added mypy to pre-commit config file for this specific reason. So it's definitely on the to-do list.

Regarding the metaclass, It was something that I had in mind as well but didn't spend time on it since I noticed other than nuances in configuring the server, each service might need different post-processing for creating the output data array. So from the development point of view, in the long run, adding more and more databases directly might not be sustainable as it might add more and more libraries and requirements. So instead, I decided to add classes like ArcGISServer and USGSGeoserver here. This allows the users to add databases they need using these base classes and do the requests.Response post-processing either using the available functions like create_dataset in hydrodata or any other specific libraries that are needed. So essentially I am more inclined to provide the user an interface for adding a database using hydrodata as a library rather than offering the database itself as in teaching fishing rather than giving them a fish!

That being said I think your idea is along the same line but perhaps could be implemented in services with what I mentioned above in mind. What do you think?

A fun fact! I didn't intend to add SSEBop to hydrodata since its implementation is very different than the other services and it's not a very widely used database as well. But I add it anyway since implementing the function efficiently was kind of challenging so after I finished it I decided to include it just in case if someone needs it, he/she doesn't have to reinvent the wheel.

aaraney commented 4 years ago

For sure! Thanks for taking the time to look through it, it was the perfect weekend project for me. Yeah, I am just getting into the type hinting myself, I will add it as an issue for us for the future.

Ahh I understand, so abstract at the service provider level instead. That makes a ton of sense and is a more reproducible and understandable implementation. I also love the fishing analogy.

Yeah that might make sense, it might make sense if its just for built-in methods so that they all share the same implementation. But I don't think the level of abstraction I used in the PR will be necessary, it may be overly specific.

Hahha, yeah I loved going through the logic of how you handled getting daily data. That must have been a rewarding implementation. I'd never run across the dataset myself, glad that you are giving it some love though.

cheginit commented 4 years ago

I will add it as an issue for us for the future. That would be great, thanks!

Yeah that might make sense, it might make sense if its just for built-in methods so that they all share the same implementation. But I don't think the level of abstraction I used in the PR will be necessary, it may be overly specific.

One of the challenges for making such abstractions that you mentioned is dealing with the nuances that I mentioned. For example, although standards like WMS and WFS are supposed to be standards, still ArcGIS-based services and Geoserever-based ones have some differences which makes generalizing it somewhat difficult. For example, validating user inputs relies on parsing the service properties file so if the properties are not consistent then it will be difficult to have a single abstraction that can handle these intuitively without making it overwhelming for the user. I always try to keep the high-level interface as simple as possible and if modifications are required provide useful warnings and messages to guide the user.

Right now, there are two functions in services.py for WFS and WMS. I was thinking that WCS can be added as well then all of them become one class for OGC (Open Geospatial Consortium) class. Geoserver class has similar functions for WFS and WMS which I think could use this as well.

cheginit commented 4 years ago

I am going to close this for now since many changes have been made to the code base since this PR. Please feel free to open a new PR considering our discussion. Thanks.