kshitij10496 / hercules

The mighty hero helping you build projects on top of IIT Kharagpur's academic data
https://hercules-10496.herokuapp.com/api/v1/static/index.html
MIT License

Sharing DB for development #8

Closed kshitij10496 closed 5 years ago

kshitij10496 commented 5 years ago

@icyflame @amrav @DefCon-007 @athityakumar @thealphadollar

What is the best practice for providing read access to the DB to anyone interested in contributing to the project? In other words: how do I share a replica of the production DB for local development?

Off the top of my head:

icyflame commented 5 years ago

Storing a database dump in the repository.

Instead of storing the DB dump in the repository, just put it somewhere else (e.g. Drive / AWS / MediaFire) that is easily downloadable and importable.

A better idea would be to provide a minimum template for the database (if you are using SQL, you can do this by creating the basic schema required to test all the features of your application, and then exporting it as a text file). These text files are really small and can even be included in the repo.
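A minimal sketch of that "schema template" idea, using Python's built-in sqlite3 for portability (the `courses` table here is hypothetical and not hercules' actual schema; for PostgreSQL the equivalent export would be something like `pg_dump --schema-only`). A small schema text file is enough for any contributor to rebuild a throwaway test database from scratch:

```python
import sqlite3

# Hypothetical minimal schema; in practice this string would be loaded
# from a small .sql text file committed to the repo.
SCHEMA = """
CREATE TABLE courses (
    code TEXT PRIMARY KEY,
    name TEXT NOT NULL
);
"""

def fresh_test_db(schema=SCHEMA):
    """Build a throwaway in-memory DB from the schema template."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema)
    return conn

conn = fresh_test_db()
conn.execute("INSERT INTO courses VALUES ('CS10001', 'Intro to Programming')")
rows = conn.execute("SELECT name FROM courses").fetchall()
# rows == [('Intro to Programming',)]
```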

replica of the production DB for local development?

If the production DB doesn't contain anything sensitive (user emails, user phone numbers), you can share the replica. But sharing credentials for a read-only user will simply increase the load on the production DB. That might not be a big issue for a small app, but it's not a good choice for something that's going to blow up. 🙂

kshitij10496 commented 5 years ago

A better idea would be to provide a minimum template for the database (if you are using SQL, you can do this by creating the basic schema required to test all the features of your application, and then exporting it as a text file). These text files are really small and can even be included in the repo.

Yes, I'm using a SQL-based DBMS, namely PostgreSQL. I've already added the table creation queries in common/db.go. Here is the schematic diagram of the tables:

[schematic diagram attached as an image: img_20181018_180723744]

icyflame commented 5 years ago

@kshitij10496 Oh, great! PG is really easy to share. You can create a new database, create the basic tables, and insert a few records into each table to make the database easy to test.

Then, you can export/dump that database as a ".sql" file (e.g. with pg_dump). That .sql file will contain the table creation queries and the row insertion queries, and will allow any contributor to reproduce the DB as-is for testing. 🙂
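A sketch of that dump step (the database name `hercules_dev` and the helper names are illustrative, not part of the project). Wrapping pg_dump in a small Python helper keeps the exact flags in one place:

```python
import shutil
import subprocess

def dump_command(dbname, outfile="seed.sql"):
    # --inserts emits plain INSERT statements (instead of COPY blocks),
    # so the dump doubles as a readable, hand-editable seed file.
    # --no-owner keeps the dump importable under any local role.
    return ["pg_dump", "--no-owner", "--inserts",
            "--file", outfile, dbname]

def dump(dbname, outfile="seed.sql"):
    if shutil.which("pg_dump") is None:
        raise RuntimeError("pg_dump not found; install the PostgreSQL client tools")
    subprocess.run(dump_command(dbname, outfile), check=True)

# A contributor would then restore the seed file with something like:
#   psql -d hercules_dev -f seed.sql
```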

thealphadollar commented 5 years ago

@kshitij10496 What I mostly do is create a separate API-style class that exposes all the basic [required] actions a user will need to perform on the database. Here are two examples where I've done the same with sqlite3 in Python.

https://github.com/thealphadollar/Nephos/blob/master/nephos/manage_db.py
https://github.com/thealphadollar/OnThisDay/blob/master/src/db_handler.py

This ensures that the user can perform all the public actions and verify the data without accessing the database directly.
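A minimal sketch of that pattern, using sqlite3 in Python as in the linked examples (the `courses` table and the method names are made up for illustration, not taken from those repos):

```python
import sqlite3

class DBHandler:
    """Thin wrapper: contributors call these methods instead of
    writing raw SQL against the database."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS courses (code TEXT PRIMARY KEY, name TEXT)"
        )

    def add_course(self, code, name):
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute("INSERT INTO courses VALUES (?, ?)", (code, name))

    def get_course(self, code):
        row = self.conn.execute(
            "SELECT name FROM courses WHERE code = ?", (code,)
        ).fetchone()
        return row[0] if row else None

db = DBHandler()
db.add_course("MA10001", "Mathematics I")
print(db.get_course("MA10001"))  # Mathematics I
```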

Instead of storing the DB dump in the repository, just put it somewhere else (eg: Drive / AWS ? MediaFire) that is easily downloadable and importable.

@kshitij10496 For this, I think the best idea would be to store the database in Drive and update it automatically with every change to the database using the Google Drive API. Roughly, I could call a function every time I update the database (which would remove the last database using its GDrive ID, upload the new one, and store its GDrive ID), and then update the README with a new shareable link (read access only, again via the GDrive API).

Another option would be to create an endpoint in the application where users can send a POST request with their email ID, and you can use the permissions part of the GDrive API to share the database with them. The benefit would be a proper listing of who has read access to the database. And since the process can be automated entirely, it's highly scalable.

For further automation, and ease of access (without authentication), you can check whether the database file is already present locally and, if it's not, download it using the GDrive API. This completes the whole automation cycle, and the user never has to care about GDrive. If the database is already there, it's tested; otherwise it's downloaded, tested, uploaded, and the link updated.
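The cycle described above could be sketched like this in Python, with the actual GDrive calls hidden behind a hypothetical `client` adapter (the `upload`/`delete`/`share_read_only`/`download` names are invented for this sketch, not the real Drive API; a fake client makes the orchestration testable without credentials):

```python
import os

class DriveSync:
    """Sketch of the publish/fetch cycle. `client` is a hypothetical
    adapter wrapping the real GDrive API."""

    def __init__(self, client, file_id=None):
        self.client = client
        self.file_id = file_id  # GDrive ID of the current dump, if any

    def publish(self, path):
        # Remove the previous dump, upload the new one, share read-only.
        if self.file_id is not None:
            self.client.delete(self.file_id)
        self.file_id = self.client.upload(path)
        self.client.share_read_only(self.file_id)
        return self.file_id  # link this ID from the README

    def ensure_local(self, path):
        # Contributor side: download only if the dump isn't already present.
        if not os.path.exists(path):
            self.client.download(self.file_id, path)
        return path
```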

P.S. I can help on this if you permit :grinning: