hackoregon / transportation-system-backend

2018 repo for the transportation api backend
MIT License

Research and define structure for Transportation Systems Project and APIs #12

Open bhgrant8 opened 6 years ago

bhgrant8 commented 6 years ago

Summary:

The Transportation Systems project will contain multiple data sets, which will be exposed through our API. From both a user perspective and a technical perspective, what type of organization makes sense for the project?

Considerations

  1. Django projects, and Django REST Framework, have certain organization standards to adhere to.
  2. Modular code that separates concerns is a best practice, as it is "easier to reason about".
  3. A scalable architecture is a concern: other cities and other, as yet unknown, data sets may be involved, and we may continue this project into future seasons.
  4. There may be platform technical considerations beyond our control, such as load balancing based on the URL path.
  5. How we version the API may be involved in this decision (see #9).
  6. The project name and civic root URL are going to be defined for us (https://github.com/hackoregon/civic-devops/issues/1); any further naming should also follow these conventions.

Some Possible options:

  1. Single Project Endpoint, versioned by URL, all data sets available directly

ex:

https://<CIVIC ROOT URL>/transportation-systems/v1/model

Benefits:

Detractors:

  2. Single Project Endpoint, with second-level APIs, split on path, versioned by project

ex:

https://<CIVIC ROOT URL>/transportation-systems/v1/api-name/model

Benefits:

Detractors:

  3. Single Project Endpoint, with second-level APIs, split on path, versioned by API

ex:

https://<CIVIC ROOT URL>/transportation-systems/api-name/v1/model

Benefits:

Detractors:
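To make the difference between the three layouts concrete, here is a minimal sketch in plain Python that builds the URL for each option. The root `https://civic.example`, the API names (`crash`, `ridership`) and the model name (`crashes`) are placeholders invented for illustration; the real civic root URL and naming are still to be defined.

```python
# Sketch only: ROOT, the API names and the model names are hypothetical.
PROJECT = "transportation-systems"
ROOT = "https://civic.example"  # stands in for <CIVIC ROOT URL>


def option_1(version, model):
    """Single project endpoint, versioned by URL, all data sets exposed directly."""
    return f"{ROOT}/{PROJECT}/{version}/{model}"


def option_2(version, api_name, model):
    """Second-level APIs split on path; one version shared by the whole project."""
    return f"{ROOT}/{PROJECT}/{version}/{api_name}/{model}"


def option_3(api_name, version, model):
    """Second-level APIs split on path; each API carries its own version."""
    return f"{ROOT}/{PROJECT}/{api_name}/{version}/{model}"


if __name__ == "__main__":
    print(option_1("v1", "crashes"))
    print(option_2("v1", "crash", "crashes"))
    # Under option 3 one API can move to v2 while another stays on v1.
    print(option_3("crash", "v2", "crashes"))
    print(option_3("ridership", "v1", "stops"))
```

The practical difference shows up when a version changes: under option 2 a bump to `v2` moves every API under the project at once, while under option 3 the path prefix `/transportation-systems/crash/` stays stable and only that API's version segment changes. This may matter if load balancing is done on the URL path.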

znmeb commented 6 years ago

I've been thinking one Docker network with one GeoDjango container and one PostGIS container per dataset. Our datasets are so small that I'd be surprised we'd need anything like a sharded PostgreSQL server, for example. If we get to the point where we need to worry about scaling we'll probably end up having to port to a Platform-as-a-Service like OpenShift anyhow.

bhgrant8 commented 6 years ago

When I talk about scaling, I am not so much talking about data size. We are quite far from needing to worry about sharding or any managed/platform-as-a-service model.

The question is more about organization and usability: if we add endpoints to the API in the future, we should not have to work around a bad or hasty decision made now.

I see we could either do:

  1. Single GeoDjango container hosting a single-app project per data set > single PostGIS container per data set

  2. Single GeoDjango container for the project > individual apps per data set > individual PostGIS containers for each data set (https://docs.djangoproject.com/en/2.0/topics/db/multi-db/)
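For the second layout, Django's multi-database support relies on a database router that maps each app to a database alias. The following is a rough sketch of what that could look like, not a tested configuration for this project: the app labels (`crash`, `ridership`), the host names, and the pairing of apps to databases are assumptions; only the router method names and signatures come from the Django multi-db documentation linked above.

```python
# Sketch only: app labels, host names and the app-to-database pairing are
# hypothetical. In a real project DATABASES and DATABASE_ROUTERS live in
# settings.py.
DATABASES = {
    "default": {},  # Django requires the alias to exist; it may be left empty
    "odot_crash_data": {
        "ENGINE": "django.contrib.gis.db.backends.postgis",
        "NAME": "odot_crash_data",
        "HOST": "crash-db",  # hypothetical PostGIS container name
    },
    "passenger_census": {
        "ENGINE": "django.contrib.gis.db.backends.postgis",
        "NAME": "passenger_census",
        "HOST": "ridership-db",  # hypothetical PostGIS container name
    },
}

# settings.py would then reference the router by dotted path, for example:
# DATABASE_ROUTERS = ["path.to.DatasetRouter"]


class DatasetRouter:
    """Send each data set's Django app to its own database alias."""

    # Assumed mapping: Django app label -> alias in DATABASES
    app_to_db = {
        "crash": "odot_crash_data",
        "ridership": "passenger_census",
    }

    def db_for_read(self, model, **hints):
        # Returning None means "no opinion"; Django falls back to "default".
        return self.app_to_db.get(model._meta.app_label)

    def db_for_write(self, model, **hints):
        return self.app_to_db.get(model._meta.app_label)

    def allow_relation(self, obj1, obj2, **hints):
        db1 = self.app_to_db.get(obj1._meta.app_label)
        db2 = self.app_to_db.get(obj2._meta.app_label)
        if db1 and db2:
            # Relations are only allowed inside the same data set database.
            return db1 == db2
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        target = self.app_to_db.get(app_label)
        if target is not None:
            return db == target
        if db in self.app_to_db.values():
            # Keep unmapped apps out of the data set databases.
            return False
        return None
```

With a router like this, each app keeps its own models, serializers and URL prefix inside one GeoDjango project, while the data stays in separate PostGIS databases. Note that Django does not support foreign keys across databases, so any auxiliary data to be joined with a data set would need to live in that data set's database.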

for option 2):

If we do the single individual containers for each dataset (option 1),

znmeb commented 6 years ago

You can serve an arbitrary number of unrelated databases from a single PostGIS container. Can you serve an arbitrary number of REST APIs from a single GeoDjango container?

It seems like we're thinking of a separate API for each main dataset (Ridership, Crash and Congestion), with a database consisting of the core dataset and any auxiliary data we might want to JOIN with it (Census, for example). Giving each API its own pair of containers helps with capacity planning; we can measure their resource usage individually.

I guess I should start a DevOps discussion on capacity planning covering the whole platform.

znmeb commented 6 years ago

Given their small size, I don't see any problem with combining the odot_crash_data and passenger_census databases into a single PostGIS container if that makes things simpler for anyone / everyone. The congestion dataset is another story.