**nelsonic** opened this issue 8 years ago
@Kumjami suggested writing records directly into ELK. This would be possible (see http://stackoverflow.com/questions/27388521/elasticsearch-high-indexing-throughput) and potentially much cheaper. Requires further investigation.
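For reference, a minimal sketch of what "writing records directly into ELK" could look like, using the Elasticsearch `_bulk` REST endpoint (newline-delimited JSON). The cluster URL, index name and record shape are assumptions for illustration, not part of our current setup:

```ts
interface PackageRecord {
  id: string;
  provider: string;
  price: number;
}

const ES_URL = 'http://localhost:9200'; // hypothetical cluster address

async function bulkIndex(records: PackageRecord[]): Promise<void> {
  // Each record becomes two NDJSON lines: an action line and the document itself.
  const body =
    records
      .flatMap((r) => [
        JSON.stringify({ index: { _index: 'packages', _id: r.id } }),
        JSON.stringify(r),
      ])
      .join('\n') + '\n'; // the bulk API requires a trailing newline

  const res = await fetch(`${ES_URL}/_bulk`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-ndjson' },
    body,
  });
  if (!res.ok) throw new Error(`Bulk index failed: ${res.status}`);
}
```

Bulk requests like this are the main lever for high indexing throughput, since they amortise HTTP overhead across many documents.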
I do not think we need S3 yet. One of the key points of Federated Search is that we will call the AtCore and MC services directly, so that we will have near real-time price & availability.
A real DB where we can perform real database operations (like sorting, etc.) might be more beneficial. (Suggested by @lennym.)
I know that the plan for storing the data in S3 is to do some analytics later on. I personally think that we do not need to care about this, since we will not do it for the Prototype, the Pilot, or the first steps of Production. So instead of dragging along this extra complexity, I would leave it aside until we have real requirements for it.
@lennym (reluctantly) suggested MongoDB as the place to store docs. I think we need to gather the requirements and understand what our use case is, then draw up a short-list of the available options. I'd like to put _RethinkDB_ on the list to be considered, simply because streaming is built-in, so we could simplify the backend SDK... see: https://www.rethinkdb.com/faq/ & https://www.rethinkdb.com/docs/comparison-tables/
> I'd like to put RethinkDB on the list to be considered simply because streaming is built-in
This is a really good point. If the persistent data store also supports subscribing to writes from the websocket service, that would be a massive advantage, as we'd only be using one data store/event bus.
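To make this concrete, here is a minimal sketch of RethinkDB's built-in streaming (changefeeds) using the official `rethinkdb` Node driver. The connection parameters, database and table name are assumptions for illustration:

```ts
import * as r from 'rethinkdb';

async function streamPackageWrites(): Promise<void> {
  const conn = await r.connect({ host: 'localhost', port: 28015, db: 'search' });

  // changes() returns a changefeed cursor: every insert/update/delete on the
  // table is pushed to us, so the websocket service could forward new results
  // to clients without a separate event bus.
  const cursor = await r.table('packages').changes().run(conn);
  cursor.each((err, change) => {
    if (err) throw err;
    // change.new_val is the written document (null for deletes)
    console.log('package written:', change.new_val);
  });
}
```

The appeal is that the same table serves both roles: the store we query later and the feed the websocket service subscribes to.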
> I do not think we need s3 yet.
We're going to need some kind of data store soon to support the subqueries described at https://github.com/numo-labs/sdk/blob/master/notes/api.md#subquery-methods, and while this doesn't have to be S3 (and probably shouldn't be, given the requirements we have for it: filtering, sorting, etc.), it needs to exist in some form.
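As an illustration of the filtering/sorting requirement, here is a small sketch using MongoDB (one of the candidates above) via the official Node driver. The collection name, field names and the query itself are assumptions, not the actual subquery API:

```ts
import { MongoClient } from 'mongodb';

// Find the 20 cheapest packages under a given price -- the kind of
// server-side filter + sort that S3 simply cannot do for us.
async function cheapestPackagesUnder(maxPrice: number) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const packages = client.db('search').collection('packages');

  const results = await packages
    .find({ price: { $lte: maxPrice } })
    .sort({ price: 1 })
    .limit(20)
    .toArray();

  await client.close();
  return results;
}
```

With S3 we would have to download the objects and do this work in application code; any real database (MongoDB, RethinkDB, Postgres, ...) does it in one query.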
Yesterday we did a quick calculation of how much writing directly to S3 would cost at "Peak" load. Our calculations used the following factors:
- `POST` requests (to S3): $0.005 (see https://aws.amazon.com/s3/pricing/)

To get around this ridiculous cost, we suggest writing the records into Redis (ElastiCache) until the slowest provider has returned its results, then batch-writing all the packages in a single file to S3. This will bring the cost down to:
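A rough sketch of the buffer-then-batch idea, using `ioredis` and the AWS SDK. The key names, bucket name and the mechanism that decides "all providers have responded" are assumptions for illustration:

```ts
import Redis from 'ioredis';
import { S3 } from 'aws-sdk';

const redis = new Redis(); // ElastiCache endpoint in production
const s3 = new S3();

// Called as each provider returns: append its packages to a Redis list.
export async function bufferPackages(searchId: string, packages: object[]) {
  if (packages.length === 0) return;
  await redis.rpush(`search:${searchId}`, ...packages.map((p) => JSON.stringify(p)));
}

// Called once the slowest provider has responded: flush everything to S3
// as a single object, i.e. one PUT per search instead of one POST per record.
export async function flushToS3(searchId: string) {
  const raw = await redis.lrange(`search:${searchId}`, 0, -1);
  await s3
    .putObject({
      Bucket: 'numo-search-results', // hypothetical bucket
      Key: `searches/${searchId}.json`,
      Body: JSON.stringify(raw.map((r) => JSON.parse(r))),
      ContentType: 'application/json',
    })
    .promise();
  await redis.del(`search:${searchId}`);
}
```

This collapses the per-record request charges into a single request per completed search, which is where the cost saving comes from.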