In this issue I note in a ad-hoc way things of relevance to selecting our database. It can be summarised as follows: I think our most suitable databases are RethinkDB or PostgreSQL (or some combination of those).
Requirements
We require a database for two main purposes:
store results from observations from iOS client (interaction events)
storing results from user evaluations
storing a cache of tweets to work around the Twitter API rate limits
What ever our selection will be there must be good support for writing our DB interfacing code in Python.
Originally I thought this was only a search engine but apparently it's a document database as well. The official website contains only ambiguous and general marketing material.
A realtime document database that pushes changes to clients. Native driver languages: Ruby, Python, Java, and JavaScript. From what I've read I've only good impressions. Documentation is good, second most starred database on GitHub, after Redis.
A RethinkDB employee says here in a StackOverflow answer form 2013 that RethinkDB is more developer oriented. Suits well when you have to work with big schemas from other companies (like in our case from Twitter). Instead of building and enforcing the schema yourself, you just dump it in. S/he goes on to say that RethinDB is not suitable when you need atomic transactions on potentially multiple tables (like transferring money), then a SQL DB is more suitable.
Probably not what we are looking for. Redis is a fast in-memory key-value store, not a document database.
Relational Databases
On first glance we think that a relational database might be the best option to store the results from user evaluations as well as interaction observations.
Originally we aimed for a document DB to store the tweet cache (JSON). However on further investigation it seems relational databases have been adding support for JSON over the years. If the support is good enough it would probably be best to use one database instead of two.
PostgreSQL
PostgreSQL has had JSON support since version 9.2. In version 9.5 support was added to modify JSON in place.
Summary: With recent addition of JSON support to PostgreSQL it has now acquired one of the main benefit of NoSQL databases, get started fast. He takes an example that the appeal of NoSQL is that you don't have to worry about the schema'a right away. To get going fast, just add a column or a table with JSONB (JSON Binary representation) and store the information you need. As time goes on and you need to access the data you slowly normalise the data into tables over time.
Summary: PostgreSQL is nice if we are not changing the JSON a lot. If we are, a document database would be more suitable. According to this blog you can't update JSON fields in place, you have to have some external (Python) code to take it out, add it, then dump it all in again.
In this issue I note in a ad-hoc way things of relevance to selecting our database. It can be summarised as follows: I think our most suitable databases are RethinkDB or PostgreSQL (or some combination of those).
Requirements
We require a database for two main purposes:
What ever our selection will be there must be good support for writing our DB interfacing code in Python.
Potential document DB candidates:
Potential relational DB candidates:
Notes
This issue tracks notes on what data stores might suit us best.
http://www.stackoverkill.com/ranking/sql-nosql
By mentions on Stack Overflow MongoDB dwarfs the other NoSQL DBs. Next up comes Redis and Cassandra.
List of document databases on Wikipedia: https://en.wikipedia.org/wiki/Document-oriented_database#Implementations
Popular NoSQL databases by GitHub stars
and more
Compose https://www.compose.io/
Public cloud service to host various databases:
Document Databases
CrateIO
Didn't dive deep into it. A share-nothing document database that seems to be designed for containerised environments (Docker).
https://en.wikipedia.org/wiki/CrateIO https://crate.io/
ElasticSearch
Originally I thought this was only a search engine but apparently it's a document database as well. The official website contains only ambiguous and general marketing material.
This video gave me a pretty good quick start picture of what ElasticSearch is and how you do basic queries: https://www.youtube.com/watch?v=60UsHHsKyN4
RethinkDB
A realtime document database that pushes changes to clients. Native driver languages: Ruby, Python, Java, and JavaScript. From what I've read I've only good impressions. Documentation is good, second most starred database on GitHub, after Redis.
Our take on RethinkDB vs today's NoSQL. Hacker News link. Blog from Rethink. It's from 2013 so it may not hold up today (2016). Summarises that first generation NoSQL databases fall in two categories:
The propose that Rethink draws from experience and lessons learned form the first forerunners in NoSQL databases.
MongoDB
Redis
Probably not what we are looking for. Redis is a fast in-memory key-value store, not a document database.
Relational Databases
On first glance we think that a relational database might be the best option to store the results from user evaluations as well as interaction observations.
Originally we aimed for a document DB to store the tweet cache (JSON). However on further investigation it seems relational databases have been adding support for JSON over the years. If the support is good enough it would probably be best to use one database instead of two.
PostgreSQL
PostgreSQL has had JSON support since version 9.2. In version 9.5 support was added to modify JSON in place.
JSON data type: https://www.postgresql.org/docs/9.5/static/datatype-json.html JSON functions: https://www.postgresql.org/docs/9.5/static/functions-json.html
Resources
Rob Conery - Document Storage Techniques with PostgreSQL and JSONB
Link: https://www.youtube.com/watch?v=rg_GiOZ5Owk
Summary: With recent addition of JSON support to PostgreSQL it has now acquired one of the main benefit of NoSQL databases, get started fast. He takes an example that the appeal of NoSQL is that you don't have to worry about the schema'a right away. To get going fast, just add a column or a table with JSONB (JSON Binary representation) and store the information you need. As time goes on and you need to access the data you slowly normalise the data into tables over time.
Is PostgreSQL Your Next JSON Database?
Link: https://www.compose.io/articles/is-postgresql-your-next-json-database/
Summary: PostgreSQL is nice if we are not changing the JSON a lot. If we are, a document database would be more suitable. According to this blog you can't update JSON fields in place, you have to have some external (Python) code to take it out, add it, then dump it all in again.
MySQL
Todo.
References
Document databases, Wikipedia: https://en.wikipedia.org/wiki/Document-oriented_database