Research and define structure for Transportation Systems Project and APIs

bhgrant8 commented 6 years ago

Summary:

The Transportation Systems project will contain multiple data sets, which will be exposed through our API. From both a user perspective and a technical perspective, what type of organization makes sense for the project?

Considerations

Django projects have certain organization standards to adhere to, as does Django REST Framework.
Modular code that separates concerns is a best practice, as it is code that is "easier to reason about".
Creating an architecture which is scalable is a concern, as other cities and other, as yet unknown, data sets may be involved. We may continue this project into future seasons.
There may be platform technical considerations beyond our control, such as load balancing based on the URL path.
How we version the API may be involved in this decision (see #9).
The Project Name and Civic Root URL are going to be defined for us (https://github.com/hackoregon/civic-devops/issues/1); any further naming should also follow these conventions.

Some Possible options:

Single Project Endpoint, versioned by url, all data sets available directly

ex:

https://<CIVIC ROOT URL>/transportation-systems/v1/model

Benefits:

Default setup for Django/DRF, so easier setup
Versioning at the project level will create fewer individual API versions in the wild

Detractors:

Monolithic
Versioning at the project level will create a desire to make overarching releases

Single Project Endpoint, with second-level apis, split on path, versioned by project

ex:

https://<CIVIC ROOT URL>/transportation-systems/v1/api-name/model

Benefits:

Modularized code, easier to reason about
Versioning at the project level will create fewer individual API versions in the wild

Detractors:

Versioning at the project level will create a desire to make overarching releases
Moving away from the default project setup means more time developing/debugging custom code

Single Project Endpoint, with second-level apis, split on path, versioned by API

ex:

https://<CIVIC ROOT URL>/transportation-systems/api-name/v1/model

Benefits:

Modularized code, easier to reason about
Allows for releases of the individual apis vs the whole project

Detractors:

More individual API versions in the wild
Increased complexity for the user, who has to know which version of each API to use
Increased code complexity
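
To make the three layouts above easier to compare, here is a minimal plain-Python sketch (not Django code) that parses a path under each scheme. The layout labels, the `Route` type, and the `crash` / `crashes` names are assumptions made up for illustration; only the path shapes come from the options listed above.

```python
from dataclasses import dataclass
from typing import Optional

PROJECT = "transportation-systems"


@dataclass(frozen=True)
class Route:
    version: str
    api_name: Optional[str]  # None when data sets hang directly off the project
    model: str


def parse_path(path: str, layout: str) -> Route:
    """Split a URL path (without the civic root) according to one layout."""
    parts = path.strip("/").split("/")
    if parts[0] != PROJECT:
        raise ValueError(f"unexpected project segment: {parts[0]!r}")
    rest = parts[1:]
    if layout == "flat":  # /project/v1/model
        version, model = rest
        return Route(version, None, model)
    if layout == "project-versioned":  # /project/v1/api-name/model
        version, api_name, model = rest
        return Route(version, api_name, model)
    if layout == "api-versioned":  # /project/api-name/v1/model
        api_name, version, model = rest
        return Route(version, api_name, model)
    raise ValueError(f"unknown layout: {layout!r}")


print(parse_path("/transportation-systems/v1/crashes", "flat"))
print(parse_path("/transportation-systems/v1/crash/crashes", "project-versioned"))
print(parse_path("/transportation-systems/crash/v1/crashes", "api-versioned"))
```

The second and third layouts carry the same information and differ only in segment order. That order decides whether a version bump applies to the whole project or to a single API.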

znmeb commented 6 years ago

I've been thinking one Docker network with one GeoDjango container and one PostGIS container per dataset. Our datasets are so small that I'd be surprised we'd need anything like a sharded PostgreSQL server, for example. If we get to the point where we need to worry about scaling we'll probably end up having to port to a Platform-as-a-Service like OpenShift anyhow.

bhgrant8 commented 6 years ago

When I talk about scaling, I am not so much talking about data size. We are quite far off from needing to worry about sharding or any managed/platform-as-a-service model.

The question is more about organization and usability: if we add endpoints to the API in the future, we do not want to have to work around a bad or quick decision made now.

I see we could either do:

1) A single GeoDjango container hosting a single-app project per data set > a single PostGIS container per data set
2) A single GeoDjango container for the project > individual apps per data set > individual PostGIS containers for each data set (https://docs.djangoproject.com/en/2.0/topics/db/multi-db/)

For option 2:

we would have to manage more environment variables for the databases and other Django config, so there is somewhat greater complexity,
on the other hand, only a single container is being deployed on the API side,
and Django will handle the URL pathing past the root
benefits might be an easier deploy and easier-to-manage versioning through a single project
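
As a rough illustration of what option 2 could involve, here is a minimal sketch following the router interface described in the Django multi-db docs linked above. The app labels (`crash`, `ridership`), the database aliases, and the environment variable naming scheme are all assumptions for illustration, not settings from this project.

```python
import os

# Hypothetical mapping of Django app label -> database alias.
APP_TO_DB = {"crash": "crash_db", "ridership": "ridership_db"}


def build_databases(env=os.environ):
    """Build a Django-style DATABASES dict, one PostGIS entry per data set."""
    databases = {"default": {}}  # Django requires this key; it may be left empty
    for alias in APP_TO_DB.values():
        prefix = alias.upper()  # e.g. CRASH_DB_HOST (assumed naming scheme)
        databases[alias] = {
            "ENGINE": "django.contrib.gis.db.backends.postgis",
            "NAME": env.get(f"{prefix}_NAME", alias),
            "USER": env.get(f"{prefix}_USER", "postgres"),
            "PASSWORD": env.get(f"{prefix}_PASSWORD", ""),
            "HOST": env.get(f"{prefix}_HOST", "localhost"),
            "PORT": env.get(f"{prefix}_PORT", "5432"),
        }
    return databases


class DatasetRouter:
    """Send each app's models to its own database; no opinion on other apps."""

    def db_for_read(self, model, **hints):
        return APP_TO_DB.get(model._meta.app_label)

    def db_for_write(self, model, **hints):
        return APP_TO_DB.get(model._meta.app_label)

    def allow_relation(self, obj1, obj2, **hints):
        # When both objects belong to mapped apps, allow the relation only
        # if they live in the same database; otherwise return no opinion.
        db1 = APP_TO_DB.get(obj1._meta.app_label)
        db2 = APP_TO_DB.get(obj2._meta.app_label)
        if db1 and db2:
            return db1 == db2
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Migrate a mapped app only into its own database.
        if app_label in APP_TO_DB:
            return db == APP_TO_DB[app_label]
        return None
```

In a settings module this would be wired up with `DATABASES = build_databases()` and a `DATABASE_ROUTERS` entry pointing at the router's dotted path. Each new data set adds one mapping entry and one set of environment variables, which is the extra configuration cost mentioned above.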

If we do individual containers for each dataset (option 1),

would we expect greater or less complexity in the environment?
what benefits are we gaining?
what challenges might we anticipate?

znmeb commented 6 years ago

You can serve an arbitrary number of unrelated databases from a single PostGIS container. Can you serve an arbitrary number of REST APIs from a single GeoDjango container?

It seems like we're thinking of a separate API for each main dataset (Ridership, Crash and Congestion), with a database consisting of the core dataset and any auxiliary data we might want to JOIN with it (Census, for example). Having two containers for each helps with capacity planning; we can measure their resource usage individually.

I guess I should start a DevOps discussion on capacity planning covering the whole platform.

znmeb commented 6 years ago

Given their small size, I don't see any problem with combining the odot_crash_data and passenger_census databases into a single PostGIS container if that makes things simpler for anyone / everyone. The congestion dataset is another story.

hackoregon / transportation-system-backend

Research and define structure for Transportation Systems Project and APIs #12