humlab-sead / sead_query_api

.NET Core port of Query SEAD API
Other
2 stars 0 forks source link

Quality Check: Examine facet graph relations #74

Closed roger-mahler closed 4 years ago

roger-mahler commented 4 years ago

Find all discrepancies between the facet graph and the public SEAD database foreign key relations. Purpose is to identify:

The check assume that all relations are bi-directional.

roger-mahler commented 4 years ago

SQL query:


with public_foreign_keys as (
    select  table_name          as source_name,
            foreign_table_name  as target_name,
            foreign_column_name as column_name
    from sead_utility.foreign_key_columns
    where table_schema = 'public'
), public_relations as (
    select source_name, target_name, column_name
    from public_foreign_keys
    union all
    select target_name, source_name, column_name
    from public_foreign_keys
), facet_undirected_relations as (
    select  t1.table_or_udf_name    as source_name,
            t2.table_or_udf_name    as target_name, 
            r.target_column_name    as column_name
    from facet.table_relation r
    join facet.table t1 on t1.table_id = r.source_table_id
    join facet.table t2 on t2.table_id = r.target_table_id
    where r.source_column_name = r.target_column_name
), facet_relations as (
    select source_name, target_name, column_name
    from facet_undirected_relations
    union all
    select target_name, source_name, column_name
    from facet_undirected_relations
), consolidated_relations as (
    select coalesce(p.source_name, f.source_name) as source_name,
           coalesce(p.target_name, f.target_name) as target_name,
           coalesce(p.column_name, f.column_name) as column_name,
           case when p.source_name is null then '' else 'FK' end as is_public,
           case when f.source_name is null then '' else 'EDGE' end as is_facet
    from public_relations p
    full outer join facet_relations f
      on f.source_name = p.source_name
     and f.target_name = p.target_name
) select distinct
         '(''' || case when source_name <= target_name then source_name else target_name end || ''', ''' ||
         '''' || case when source_name  > target_name then source_name else target_name end || ''', ''' ||
         '''' || column_name || '''),  -- ' || is_public || ' ' || is_facet
  from consolidated_relations
  where is_public || is_facet <> 'FKEDGE'
    and source_name not like 'facet.%'
    and target_name not like 'facet.%'
  order by 1
roger-mahler commented 4 years ago

Missing relations (for sure):

relation is_public is_facet
analysis_entities <=> isotopes (analysis_entity_id) YES
ceramics <=> ceramics_lookup (ceramics_lookup_id) YES
ceramics_lookup <=> methods (method_id) YES
isotope_measurements <=> isotopes (isotope_measurement_id) YES
isotope_measurements <=> methods (method_id) YES
dendro <=> dendro_lookup (dendro_lookup_id) YES
dendro_lookup <=> methods (method_id) YES
roger-mahler commented 4 years ago

These can most likely be removed (see humlab-sead/sead_change_control#70)

relation is_public is_facet
ceramics <=> ceramics_measurements (ceramics_measurement_id) YES
dendro <=> dendro_measurements (dendro_measurement_id) YES
dendro_dates <=> years_types (years_type_id) YES
roger-mahler commented 4 years ago

These probably need to be added to afford new facets for dendrochronology and isotope data. New facet specifications are currently missing, though.

relation is_public is_facet
dendro_dates <=> dendro_lookup (dendro_lookup_id) YES
dendro_dates <=> error_uncertainties (error_uncertainty_id) YES
isotope_measurements <=> isotope_standards (isotope_standard_id) YES
isotope_measurements <=> isotope_types (isotope_type_id) YES
isotope_standards <=> isotopes (isotope_standard_id) YES
isotope_value_specifiers <=> isotopes (isotope_value_specifier_id) YES
isotopes <=> units (unit_id) YES
roger-mahler commented 4 years ago

These can be ignored:

relation is_public is_facet
dataset_methods <=> methods (method_id) YES EMPTY
dataset_methods <=> datasets (dataset_id) YES EMPTY
datasets <=> datasets (dataset_id) YES
age_types <=> dendro_dates (age_type_id) YES SINGLE ITEM