jamesbutler01 / Mymou

Mymou: A standalone touchscreen in-cage testing device
16 stars 11 forks source link

Database for data storage #13

Open jamesbutler01 opened 5 years ago

jamesbutler01 commented 5 years ago

Need to develop a database for storing all information created by Mymou, this includes:

Ideally wants to be in the cloud, so that multiple Mymou's can be aggregated. Not sure if there is free service for that? Super ambitious: Web interface to access the database and bring up task performance remotely

Implementing SQLite and storing the database locally seems straight forwards on Android, and that's where I'll start, as should hopefully be straightforward to then transfer the database to the cloud etc in due course

iandol commented 5 years ago

Here are my saved notes on a more general purpose database backend:

General Design

Java/Kotlin supports the following microservices frameworks:

  1. Micronaut
  2. KTor

This means for each service we can use a different database engine, or the same engine but different stores. This is the main idea behind microservices. KTor is more lightweight and I think will be easier to program for basic requirements...

Each microservice should provide the interface between the database and the client. It should use a REST interface with HTTP for each service.

Here is an example of an existing Neuroscience database backend from UCL's Cortex-Lab: https://github.com/cortex-lab/alyx — postgresql / apache2 and python's django providing a REST API.

Database backends

The database themselves could be either SQL or NO-SQL. Because the data stored will be minimal it doesn't much matter which I think, but if we want to extend for storing eye tracking or wireless recording or accelerometer data, this becomes an issue perhaps.

SQL is more traditional, but requires designing a schema at the beginning and it is more work when modifying a database (database migration). NOSQL is very flexible much easier to set up, using JSON and can quickly revise the data stored, but for very complex retrieval queries they sometimes don't scale as well as SQL (but do we really need very complex retrieval).

iandol commented 5 years ago

Realm seems ideal for cloud sync and ease of use and quick implementation. I've emailed them asking if there are any academic exemptions for self-hosting their platform...

iandol commented 5 years ago

MongoDB's mobile version has a Java interface: https://www.mongodb.com/products/mobile — they make an easy-to-use cloud sync system (currently in beta), and their cloud is free up to 512MB of data.

That seems like another nice option, start off with a local MongoDB on device, then you can migrate to a MongoDB server you host yourself in the lab, then you could move it over to a full hosted cloud server.

One issue with cloud is potential worry of security for sensitive data, but I think full authentication and full encryption are the default for all these systems. The databases are hosted on the big providers like Amazon / Google / Microsoft.

MongoDB just bought Realm btw.

iandol commented 5 years ago

Hi James, this option developed by Harris / Carandini lab at UCL is looking more and more appealing to me...

https://github.com/cortex-lab/alyx

PostgreSQL and Python, with a Rest API which means you can connect using any language, and can be local hosted or cloud hosted easily. May need to have the schema tweaked as it is currently used for mice. It would be good to use an existing system that is focussed on neuroscience data... I am going to try and set this up and see how the API works...

jamesbutler01 commented 5 years ago

Hey Max! As is probably obvious I haven't made any progress yet on Mymou's db, this Alyx looks promising. As they're local I'll try meet with them to chat about the feasibility of using it with Mymou (if you have any questions let me know).

I definitely want to create an SQL service as I've been meaning to learn that skill for myself and it seems to be by far the most common in industry. I'm struggling to imagine what features Alyx may have that would be desirable and not available in the commercial SQL providers?

iandol commented 5 years ago

The problems with databases are twofold:

  1. Creating the schema, i.e. structuring your data and building a mapping between your data and the database tables (this is where things like ORMs come in). It is pretty tedious stuff, and if you are intent on doing everything yourself requires a lot of repetitive code.
  2. Accessing the database --- you can either force everyone to also use SQL and unroll the database back to data structures in their preferred language, or build a REST API. The REST API is much easier and more flexible (why they are universal in almost all public database systems and data services) for other people to use (any tool that can make a HTTP request can access a database)...

Alyx has done both of these parts, and that simply means getting a well designed and flexible database will be a lot less work. Alyx uses a PostgreSQL backend, and you could think about forking the project and tweaking that to learn the ropes (perhaps contributing schema changes to PostgreSQL database structures and changes to the REST interface for things specific to monkeys), rather than doing everything from scratch?

For me the major downside of Alyx (and SQL in general) is that it requires a schema (a rigid structure) that is harder to adapt as new data fields become necessary. That is why databases like MongoDB are popular. Alyx cheats by using an unstructured field:

https://alyx.readthedocs.io/en/latest/considerations.html#metadata-vs-bulk-data

This seems a fair solution, as long as the searchable metadata enables enough discovery (it says the json field is searchable at least).

Currently there isn't much data that Mymou needs to store. Your first suggesting is to just use SQLite, which is OK for small on-device storage, but doesn't scale so well to the local network or the cloud. It also isn't so easy if you want to use the data on a desktop, you'll need to make a custom database search and retrieval tool (i.e. move the sqlite files to a desktop, then make something like a rest API for them). Alyx would need to start on the local network (can't run on-device), but would easily scale to the cloud, and has webforms and accessibility built-in.

jamesbutler01 commented 5 years ago

Regarding 2, I don't see much issue with requiring users to access data through SQL. I don't think that's much different than having to write import scripts for CSV files or whatever other format data is normally stored in.

I definitely don't want to force users onto the network, so that probably means Alyx isn't suitable unfortunately. For instance here at UCL we are not allowed to connect to the internet with the tablets due to the photos that are produced by Mymou. I'm currently two types of data storage, a general one that stores properties common across all tasks/users, such as trials worked, performance, reward given etc, and then a more specific one for each task that users can write any parameters they want to. For the former I think SQL should suit quite well, which Mymou can then use to deliver performance data in the app without the need for any analysis/processing of data by the user. Then the (optional) latter data can be written to disk for users to extract and analyse as they wish.

Today I'm finally starting as I've run out of other features to implement, so lets see how I get on anyways..

iandol commented 5 years ago

Hi James,

The REST API is much nicer and more flexible to work with, but yes, SQL itself is not too difficult to learn.

But if the database is local-only, I don't really see the point of SQLite. Why not just use JSON/CSV text files, they will be even easier than binary database files. Another advantage is they can be then inserted into a later database much more easily (i.e. something like Alyx has a JSON field). If you encode everything into SQlite files, then if you do want to later develop a shared database, you will need to write code to export from SQlite into the new format. Why not just use JSON in the first place?

Does UCL not even allow a closed network (i.e. a network switch not connected to the outside internet)?

Anyway, just my 0.2💰(●'◡'●)