hackforla / peopledepot

A project to setup a datastore for people and projects at HackforLA. The link below takes you to the code documentation
https://hackforla.github.io/peopledepot/
GNU General Public License v2.0
7 stars 26 forks source link

Document how to create initial data for tables #36

Closed fyliu closed 10 months ago

fyliu commented 2 years ago

Overview

Since some of our database tables need to be pre-populated with data, we need to decide on a way to do it and document it.

Action Items

Resources/Instructions

In any solution, we need to be able to connect it to django's migration system or in a script so it can be auto-inserted on clean build.

  1. Data-creation code directly inside the migration file. OR just use as standalone script.
    1. From spreadsheet
      1. Use App Script to generate json file
      2. Use a script to generate the python code for adding the data
    2. From existing table data
      1. Use dumpscript to generate the python code for adding the data
  2. Load SQL script from django migration file Place the sql script in <app>/sql/<app>.sql
    1. From spreadsheet
      1. Use App Script to generate json file
      2. Use a script to generate the SQL commands from json
    2. From existing table data
      1. Use DBeaver to generate insert statements
  3. Load fixture from django migration file Place the migration file in <app>/fixtures/<app>.json
    1. From spreadsheet
      1. Use App Script to generate json file
      2. Use a script to rewrite json in django migration format
    2. From existing table data
      1. Use dumpdata to generate json from data already in the db
fyliu commented 2 years ago

I've looked into options 1 and 3ii. They are both viable options and I think the fixture option is the better one since we can also use it for testing, although with the django_db decorator will automatically apply all the migrations, so all the options will work in that case. We don't have any non-django_db tests but we might eventually. So the fixture option is still preferable if other factors are equal.

In a call with @ExperimentsInHonesty, I learned about the possibility of generating json from G sheets using an app script. I later tried option 3i. It also works well. The advantage of it is that it may be a more direct path from the initial spreadsheet to the final migration file. The options that dumps data from existing tables have the extra step of first inserting the data into the system through the admin site. That's a potentially step to introduce user errors.

To compare the work involved between from spreadsheet and from table data

Now that it's laid out, starting from sheets data has more opportunities for automation, so it feels more future-proof.

The next step will be to evaluate if the option 3i route I've done will be adequate for our needs.

fyliu commented 2 years ago

Another possibility

This one imports the data from a CSV file. The code is custom to each model but it can likely be made more generic.

fyliu commented 2 years ago

To answer the null issue from 2 comments ago, loading from fixture is meant to not activate the auto timestamp behavior defined in the model. The load from fixture functionality is meant to be for dumping and loading data that was already in the database at one point and not for loading initial data. See this django ticket for details about why they won't change this behavior.

So we now have to choose:

  1. Generate the fixtures from spreadsheet and allow the timestamps fields to be null. It may not be a disadvantage to not have it for the pre-populated data.
  2. Manually insert the data into django and use dumpdata like it's intended to be used. This will write the timestamps from the db.
  3. Generate python code from spreadsheet so that it'll go through the normal route to adding data to the db and have the timestamps.
fyliu commented 2 years ago

This is documented here and linked from the wiki.

The strategy is to export the JSON from the spreadsheet, convert it into a python script that can insert the data, then create a migration file that will call the script when it's processed.

The advantages of using a python script as opposed to importing the data in JSON form using loaddata:

fyliu commented 2 years ago

I finished this and then realized there's a better way to do this. The initial data shouldn't be made into migrations if this project is supposed to be generic. They should be left as individual scripts to be run on initial database setup by us, and customized by other organizations for their needs.

fyliu commented 1 year ago

I have a draft PR #141 for this. It's a draft because it's dependent on #140 to be merged.

fyliu commented 1 year ago

I modified the PR to not depend on sphinx so we can move this forward.

fyliu commented 1 year ago

There's no model created that has a need for initial data. This makes it more difficult to review the database insertion part of #141. I will go implement #24.

fyliu commented 1 year ago

Looks like #24 has the same structure as #35 and we don't have a direction for working on #35 yet. So I can't work on #24.

The other thing I can do is pull the SOC_Major table issue out of the ice box, but I would rather not do that since it's going to be prioritized in the far future.

Blocker: I need to implement a table with initial data, and there's no good one to do. This is blocking on #35.

fyliu commented 10 months ago

This issue was implemented by PR #226