zooniverse / theia

Building the next-generation Floating Forests pipeline

Technology selection #2

Closed amy-langley closed 5 years ago

amy-langley commented 5 years ago

Since we're tentatively going to build this pipeline in Python, we need to pick Python equivalents of our familiar tools:

  1. ORM (Rails)
  2. Job queue (Sidekiq)
  3. REST (Faraday)
  4. OAuth
  5. Unit tests (RSpec)

As well as tools for handling some novel challenges:

  1. Image processing
  2. GIS processing

adammcmaster commented 5 years ago

The ones I've used/would suggest are:

camallen commented 5 years ago

Look into Faktory for a queue system as well: https://www.mikeperham.com/2019/01/08/using-faktory-with-python/

amy-langley commented 5 years ago

Pillow was my solution of choice as well, but it looks like it can't handle 16-bit-per-channel TIFF files, which is the format of the Landsat data channels, so I may need to either find something else or augment Pillow with our existing ImageMagick-based method.
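If we do end up mixing Pillow with ImageMagick, one option is to check each file's bit depth up front and route it accordingly. Below is a rough sketch, standard library only, that reads the BitsPerSample tag (tag 258) from the first IFD of a classic TIFF. The names `tiff_bits_per_sample` and `needs_imagemagick` are hypothetical, and the "anything above 8 bits goes to ImageMagick" rule is an assumption rather than a tested fact about Pillow. BigTIFF is not handled.

```python
import struct

BITS_PER_SAMPLE_TAG = 258  # TIFF 6.0 tag number for BitsPerSample
SHORT_TYPE = 3             # TIFF field type: 16-bit unsigned integer


def tiff_bits_per_sample(data: bytes) -> tuple:
    """Return the BitsPerSample values from the first IFD of a classic TIFF."""
    # The first two bytes declare the byte order used by the rest of the file.
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled here)")

    # An IFD is a 2-byte entry count followed by 12-byte entries.
    (entry_count,) = struct.unpack_from(order + "H", data, ifd_offset)
    for i in range(entry_count):
        entry_offset = ifd_offset + 2 + 12 * i
        tag, field_type, count = struct.unpack_from(order + "HHI", data, entry_offset)
        if tag != BITS_PER_SAMPLE_TAG:
            continue
        if field_type != SHORT_TYPE:
            raise ValueError("unexpected field type for BitsPerSample")
        # Values that fit in 4 bytes are stored inline in the entry;
        # otherwise the last 4 bytes of the entry hold an offset to them.
        value_offset = entry_offset + 8
        if count * 2 > 4:
            (value_offset,) = struct.unpack_from(order + "I", data, value_offset)
        return struct.unpack_from(f"{order}{count}H", data, value_offset)
    return (1,)  # TIFF default when the tag is absent


def needs_imagemagick(data: bytes) -> bool:
    # Assumption: anything deeper than 8 bits per channel takes the fallback path.
    return max(tiff_bits_per_sample(data)) > 8
```

In practice `data` would be the leading bytes of a band file, e.g. `open(path, "rb").read()`. If the tag values sit beyond the bytes that were read, the lookup raises `struct.error`.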

amy-langley commented 5 years ago

That said: https://stackoverflow.com/questions/50761021/how-to-open-a-tif-cmyk-16-bit-image-file

amy-langley commented 5 years ago

I'm no longer delighted about Faktory, knowing that it bakes its own persistence mechanism into the server. I'd much rather stick with Celery/Redis for future flexibility. Thoughts?

amy-langley commented 5 years ago

Faktory is moving from RocksDB to an embedded Redis instance:

https://github.com/contribsys/faktory/wiki/Redis

but note that they are explicitly not enabling shared tenancy or replication

amy-langley commented 5 years ago

Last note: we require several features that are in Faktory's paid tier but not its free tier, so I think Celery is probably going to be our best bet.

amy-langley commented 5 years ago

Current plan is:

  1. ORM: Django
  2. Queue: Celery
  3. Images: Pillow, with ImageMagick where necessary
  4. GIS: libgdal + Python bindings
  5. REST/OAuth: requests / requests-oauthlib
  6. Testing: pytest + pspec