Overview
Tables are created and data is copied over from n3c. However, no additional constraints, such as primary keys (PKs), are set.
Sub-tasks
(3) and (4) are currently lower priority because I'm not sure that any tests rely on them; (4) especially, because it would be time-consuming to do. I think no tests rely on this setup because I'm mostly using test_n3c only for inserts, while the other tests execute against the actual n3c schema.
[ ] 1. Set up test_n3c schema for PKs
This will probably require some kind of config. It may be best to tie that config to initialize(), or to put it elsewhere if needed.
[ ] Tests: Reactivate IntegrityError tests which check for inserting duplicate records
What else in test_n3c is worth setting up? Constraints (e.g. NOT NULL)? Probably not indexes, since these tables will be quite small (~50 rows).
[ ] 4. Correct relational data between tables
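As a rough illustration of sub-task 1, the sketch below recreates a table in a test schema with a primary key and checks that a duplicate insert raises IntegrityError. It is only a sketch under stated assumptions: SQLite attached databases stand in for the real n3c and test_n3c schemas, and the `PRIMARY_KEYS` config, the column names, and all function names are hypothetical, not taken from the project.

```python
import sqlite3

# Hypothetical config mapping each table to its primary key column.
PRIMARY_KEYS = {"code_sets": "codeset_id"}


def connect_with_schemas():
    """Open an in-memory SQLite connection with toy n3c and test_n3c schemas."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS n3c")
    conn.execute("ATTACH DATABASE ':memory:' AS test_n3c")
    # Toy source data; the column names are assumptions for illustration.
    conn.execute(
        "CREATE TABLE n3c.code_sets (codeset_id INTEGER, concept_set_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO n3c.code_sets VALUES (?, ?)",
        [(1, "diabetes"), (2, "asthma"), (3, "covid")],
    )
    return conn


def copy_table_with_pk(conn, table, n_rows=50):
    """Recreate `table` in test_n3c with a primary key, then copy the first rows."""
    pk = PRIMARY_KEYS[table]
    columns = conn.execute(f"PRAGMA n3c.table_info({table})").fetchall()
    # Each row of table_info is (cid, name, type, notnull, dflt_value, pk).
    col_defs = ", ".join(
        f"{name} {col_type}" + (" PRIMARY KEY" if name == pk else "")
        for _cid, name, col_type, *_rest in columns
    )
    conn.execute(f"CREATE TABLE test_n3c.{table} ({col_defs})")
    conn.execute(
        f"INSERT INTO test_n3c.{table} SELECT * FROM n3c.{table} LIMIT ?",
        (n_rows,),
    )


def duplicate_insert_is_rejected(conn):
    """Return True if re-inserting an existing codeset_id raises IntegrityError."""
    try:
        conn.execute("INSERT INTO test_n3c.code_sets VALUES (1, 'duplicate')")
    except sqlite3.IntegrityError:
        return True
    return False
```

Without the `PRIMARY KEY` clause the duplicate insert would succeed silently, which is presumably why the IntegrityError tests had to be deactivated.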
4. Correct relational data between tables
Currently, we set up these tables by copying the first 50 rows from each table. However, this is not correct from a "relational data" perspective: the rows copied into one table will not necessarily refer to the rows copied into another.
What's meant by "relational data" is this: a table like code_sets is primary; perhaps it is the main or only primary table. For every code_set in that table, we only need the entries in concept_set_container, concept_set_version_item, and concept_set_members that apply to those code sets. We can then filter the concept table to include only the concepts listed in concept_set_members, and filter concept_relationship and concept_ancestor by what's left in concept. Once these core tables are set up, any derived tables can be updated by running refresh_derived_tables().
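The filtering order described above could be sketched roughly as follows, using the already existing SQL tables. This is a minimal sketch, not the project's implementation: SQLite attached databases stand in for the n3c and test_n3c schemas, only four of the tables are shown, and the column names, the `ORDER BY`, and the function names are assumptions. Requiring both ends of a concept_ancestor row to be in the subset is also just one possible reading of "filter by what's there".

```python
import sqlite3


def make_toy_n3c():
    """Create an in-memory connection with a tiny n3c schema (toy data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS n3c")
    conn.execute("ATTACH DATABASE ':memory:' AS test_n3c")
    conn.executescript(
        """
        CREATE TABLE n3c.code_sets (codeset_id INTEGER);
        CREATE TABLE n3c.concept_set_members (codeset_id INTEGER, concept_id INTEGER);
        CREATE TABLE n3c.concept (concept_id INTEGER);
        CREATE TABLE n3c.concept_ancestor (
            ancestor_concept_id INTEGER, descendant_concept_id INTEGER);
        INSERT INTO n3c.code_sets VALUES (1), (2), (3);
        INSERT INTO n3c.concept_set_members VALUES (1, 10), (1, 11), (2, 11), (3, 12);
        INSERT INTO n3c.concept VALUES (10), (11), (12), (13);
        INSERT INTO n3c.concept_ancestor VALUES (10, 11), (11, 12), (12, 13);
        """
    )
    return conn


def build_relational_subset(conn, n_code_sets=50):
    """Subset code_sets first, then filter each dependent table by its parent."""
    n = int(n_code_sets)
    conn.executescript(
        f"""
        -- 1. Primary table: take the first n code sets.
        CREATE TABLE test_n3c.code_sets AS
            SELECT * FROM n3c.code_sets ORDER BY codeset_id LIMIT {n};
        -- 2. Keep only members that belong to those code sets.
        CREATE TABLE test_n3c.concept_set_members AS
            SELECT * FROM n3c.concept_set_members
            WHERE codeset_id IN (SELECT codeset_id FROM test_n3c.code_sets);
        -- 3. Keep only concepts listed in the member table.
        CREATE TABLE test_n3c.concept AS
            SELECT * FROM n3c.concept
            WHERE concept_id IN (SELECT concept_id FROM test_n3c.concept_set_members);
        -- 4. Keep only ancestor rows whose two concepts both survived.
        CREATE TABLE test_n3c.concept_ancestor AS
            SELECT * FROM n3c.concept_ancestor
            WHERE ancestor_concept_id IN (SELECT concept_id FROM test_n3c.concept)
              AND descendant_concept_id IN (SELECT concept_id FROM test_n3c.concept);
        """
    )
```

concept_set_container, concept_set_version_item, and concept_relationship would follow the same pattern, each filtered by the table it depends on.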
Perhaps the best way to achieve this is to update initialize() so that its "setup test schema" part does its own initialization: subset the code_set dataset first, filter the other datasets accordingly, and then upload. However, this will likely be slower than doing something similar with the already existing SQL tables.
We also have to consider how slow this is. Right now I'm remaking the test schema at the start of every test suite. If that is too slow, we could consider adding some sort of caching, but we'd have to commit the cached files too; otherwise the GitHub Actions tests would also run quite slowly.
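One possible shape for such caching is sketched below: the test schema is built once, dumped to a SQL file, and restored from that file on later runs. This is only an illustration under assumptions: SQLite stands in for the real database, `build_test_schema` is a hypothetical placeholder for the slow rebuild step, and the cache file name is invented.

```python
import sqlite3
from pathlib import Path


def build_test_schema(conn):
    """Stand-in for the slow rebuild step (hypothetical; toy data only)."""
    conn.execute("CREATE TABLE code_sets (codeset_id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO code_sets VALUES (?)", [(1,), (2,)])
    conn.commit()


def load_test_schema(cache_path):
    """Return (connection, used_cache); rebuild only when no cached dump exists."""
    cache_path = Path(cache_path)
    conn = sqlite3.connect(":memory:")
    if cache_path.exists():
        # Fast path: replay the SQL dump written by an earlier run.
        conn.executescript(cache_path.read_text())
        return conn, True
    # Slow path: rebuild, then write a dump that could be committed to the repo.
    build_test_schema(conn)
    cache_path.write_text("\n".join(conn.iterdump()))
    return conn, False
```

A committed dump would let the GitHub Actions run take the fast path as well, at the cost of having to regenerate the file whenever the test schema changes.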