openreferral / specification

The Human Services Data Specification - a data exchange format developed by the Open Referral Initiative
https://openreferral.org
Other
117 stars 49 forks source link

Schema dump for v.2.0.1? #235

Closed pbonnell closed 7 months ago

pbonnell commented 3 years ago

The current slq core schema dump here is v1.2. Does anyone have a clean v2.0.1 they could dump drop in there?

We're working on a direct implementation of the HSDS Transformer in an existing Rails app and looking at saving directly to a Postgres store for our adoption workflow. A canonical schema from this spec would speed things along nicely!

greggish commented 3 years ago

I believe this is queued up over in /api-specification in #109. @kinlane has it on his roadmap; Kin, any updates?

@pbonnell there are also others working on upgrading the transformer, we might want y'all to be in touch - cc @devinbalkind @adityasrini

devinbalkind commented 3 years ago

We updated the Transformer to 2.0.1 last month - https://github.com/sarapis/hsds-transformer

It should output an HSDS zip file that follows the reference schema.

If you have any issues with it please post them directly to that GitHub repo. No one has used it in the wild yet so would love feedback and issues etc.

I also think, and have argued for a while, that a canonical sql schema is essential for moving this project forward. I’ll be happy to do whatever’s necessary to get it done.

On Sun, May 2, 2021 at 15:30 Greg Bloom @.***> wrote:

I believe this is queued up over in /api-specification https://github.com/openreferral/api-specification in #109 https://github.com/openreferral/api-specification/issues/109. @kinlane https://github.com/kinlane has it on his roadmap; Kin, any updates?

@pbonnell https://github.com/pbonnell there are also others working on upgrading the transformer, we might want y'all to be in touch - cc @devinbalkind https://github.com/devinbalkind @adityasrini https://github.com/adityasrini

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/openreferral/specification/issues/235#issuecomment-830859436, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADMPF6FN4MBEU6BZUP4O5LTLWR4RANCNFSM434QKX2Q .

-- Devin Balkind @devinbalkind devinbalkind.com

robredpath commented 3 years ago

This should be fairly straightforward to generate at build time from the schema, because HSDS is fundamentally tabular.

The build system is Sphinx, which means we can run arbitrary Python during the build. https://pypi.org/project/jsontableschema/ has an example of using sqlite to create a database from table schema, which we might then be able to export.

kinlane commented 3 years ago

Here is a SQL dump for v2.0.0, I will make a pass for 2.0.1 changes - https://gist.github.com/kinlane/99dbd718a7f50a85c00eb4dd44ca39f1

odscjames commented 2 years ago

I tried https://framework.frictionlessdata.io/docs/tutorials/formats/sql-tutorial/ and this looks good (We already use frictionless lib for testing the example data.)

First hurdle - it expects a full data package and we just have the datapackage.json file - but a quick bit of Python makes a full data package with no data in it:


import os
import json
import csv
import shutil

os.makedirs("blank_data_package", exist_ok=True)

shutil.copyfile("datapackage.json", os.path.join("blank_data_package", "datapackage.json"))

with open("datapackage.json") as fp:
    datapackage = json.load(fp)

    for table in datapackage.get('resources'):

        with open(os.path.join("blank_data_package", table['path']), 'w', newline='') as fp:
            writer = csv.writer(fp)
            writer.writerows([[i['name'] for i in table['schema']['fields']]])

Lets try Psql first (just cos I know it more):

pip install psycopg2 frictionless[sql]

Then a quick Python build file:

from frictionless import Package
package = Package('blank_data_package/datapackage.json')
package.to_sql('postgresql://postgres:PASSWORD@localhost:PORT/postgres')

Personally I ran Postgres via docker - https://hub.docker.com/_/postgres - but whatever you have handy.

Seems good! Get a set of tables with foreign keys and different column types.

It seems to use sqlalchemy so I have high hopes this will work with Mysql and others too.

odscjames commented 2 years ago

I ran official mysql via https://hub.docker.com/_/mysql (we should double check variants maybe?)

pip install mysqlclient==2.0.3

See https://pypi.org/project/mysqlclient/ as you may need some other libs first

Log in and create a database.

Change build line to:

package.to_sql('mysql://root:PASSWORD@127.0.0.1:PORT/DATABASE')

Get crash:

sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (1170, "BLOB/TEXT column 'id' used in key specification without a key length")

Ok, fair point.

If I edit datapackage.json, and on the ID columns AND foreign key columns add a max length:


{
                            "name": "id",
                            "type": "string",
                            "description":"Each organization must have a unique identifier.",
                            "format":"uuid",
                            "constraints": {
                                "required":true,
                                "unique":true,
                                "maxLength": 255
                            }
                        },

It starts to work. I get tables created.

I notice the existing database/hsda_core.sql uses varchar(X) for id columns, with various lengths - 38 for organization for instance.

So I think it's acceptable to limit this - we just have to pick a number.

(This is maybe worth doing for postgres too? I just went back and looked and the postgres tables have TEXT as a primary ID which maybe isn't great.)

odscjames commented 2 years ago

Ok - here's a test repo for Postgres.

It has shell scripts that - with only docker installed - will build a database, dump it to a file and clean up your docker resources afterwards.

It has a github action that will build the schema and error if there is a diff - so we catch if someone changes the schema and does not follow the instructions to rebuild.

https://github.com/odscjames/test-gh-actions-database-dump/tree/postgres

Working action: https://github.com/odscjames/test-gh-actions-database-dump/actions/runs/1312258007

Broken action: https://github.com/odscjames/test-gh-actions-database-dump/actions/runs/1312266278

[ We just have to replace the CREATE TABLE stuff with the python stuff from previous comments to actually make the schema we want ]

I think we now have all the pieces to put this together for postgres ...

odscjames commented 2 years ago

Done for mysql!

https://github.com/odscjames/test-gh-actions-database-dump/tree/mysql

Working: https://github.com/odscjames/test-gh-actions-database-dump/actions/runs/1312442859

Broken: https://github.com/odscjames/test-gh-actions-database-dump/actions/runs/1312451636

odscjames commented 2 years ago

And I'm giving up, for now at least, on MS SQL Server. https://github.com/odscjames/test-gh-actions-database-dump/tree/mssqlserver

The problem is the official MS tool to actually dump schema (as opposed to just data) is a poorly maintained Python script. I got so far then hit a problem where all the workarounds were basically "run this on a system from 17 years ago" (maybe a touch exaggerated) but this is clearly going to make a very brittle and temperamental solution. Lets do the others then consider if this is worth it later.

(And other minor things, like that in any case to get this far I had it so every time you run it locally it has to download 100MB of code again and the official CLI tool complains with an error every time even tho the command actually works!)

odscjames commented 2 years ago

PR for MySQL: https://github.com/openreferral/specification/pull/261

odscjames commented 2 years ago

On Master branch there is now a Mysql schema for version 2, updated automatically.

I've made a new issue for adding content to the docs about this.

Dokotela commented 2 years ago

Hey everyone, I wanted to reach out because I've recently been in touch with Greg Bloom and he pointed me in this direction. I'm a Dart/Flutter developer but I'm also a Co-Chair for the HL7 Health and Human Services working group. We're going to be working on trying to map FHIR to HSDS so I've been playing around with the spec, and my first step was just to try and create a postgresql script to create the tables. I wanted to post it here to see if I'm on the right track and if anyone sees any issues with what I've got so far. https://github.com/MayJuun/hsds/blob/main/sql/hsds.sql

mrshll1001 commented 7 months ago

I am closing this as there are now some shell scripts run as part of the build process for generating mysql and postgresql database schemas.

@Dokotela hopefully these help with your use-case! Apologies on behalf of the community for not having a response to your comment earlier.