snowplow / igluctl

A command-line tool for working with Iglu schema registries

igluctl can create duplicated entries inside jsonpaths and SQL DDLs #93

Open rkorszun opened 2 years ago

rkorszun commented 2 years ago

Problem description

For schemas containing an object property with additionalProperties, like this:

{
    (...)
    "properties": {
        (...)
        "filters": {
            "type": "object",
            "additionalProperties": {
                "type": ["string", "boolean"]
            }
        },
        (...)
}

the generated jsonpaths file contains duplicated entries:

{
    "jsonpaths": [
        (...)
        "$.data.filters",
        "$.data.filters",
        (...)
    ]
}
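To check a generated jsonpaths file for this problem without reading it by eye, a small script can help. This is a minimal Python sketch, not part of igluctl: the function name duplicate_jsonpaths is hypothetical, and the only assumption is that the file is a JSON object with a top-level "jsonpaths" array, as in the output above.

```python
import json
from collections import Counter
from typing import List


def duplicate_jsonpaths(jsonpaths_doc: str) -> List[str]:
    """Return every JSONPath listed more than once, in first-seen order."""
    paths = json.loads(jsonpaths_doc)["jsonpaths"]
    # Counter keeps insertion order (Python 3.7+), so results follow the file order
    counts = Counter(paths)
    return [path for path, n in counts.items() if n > 1]


if __name__ == "__main__":
    # Trimmed copy of the site_search_1.json output quoted in this issue
    sample = """
    {
        "jsonpaths": [
            "$.data.terms",
            "$.data.filters",
            "$.data.filters",
            "$.data.pageResults",
            "$.data.totalResults"
        ]
    }
    """
    print(duplicate_jsonpaths(sample))  # ['$.data.filters']
```

Run against the full site_search_1.json output from the reproduction steps, it should report only $.data.filters.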

The same happens in the generated SQL:

-- AUTO-GENERATED BY igluctl DO NOT EDIT
-- Generator: igluctl 0.8.0
-- Generated: 2021-12-21 12:14 UTC

CREATE SCHEMA IF NOT EXISTS atomic;

CREATE TABLE IF NOT EXISTS atomic.com_snowplowanalytics_snowplow_site_search_1 (
    (...)
    "filters"        VARCHAR(4096)  ENCODE ZSTD,
    "filters"        VARCHAR(4096)  ENCODE ZSTD,
    (...)
)
(...)
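A duplicated column in the generated DDL can be spotted the same way. The sketch below is a hypothetical helper, not part of igluctl, and it is a regex heuristic rather than a SQL parser: it assumes the igluctl output layout, where each column definition starts with a double-quoted name followed by an upper-case type keyword.

```python
import re
from collections import Counter
from typing import List

# Heuristic (assumption): a column definition is a double-quoted name followed by
# whitespace and an upper-case type keyword, e.g. "filters"  VARCHAR(4096)
COLUMN_DEF = re.compile(r'"([^"]+)"\s+[A-Z]')


def duplicate_columns(ddl: str) -> List[str]:
    """Return column names declared more than once, in first-seen order."""
    counts = Counter(COLUMN_DEF.findall(ddl))
    return [name for name, n in counts.items() if n > 1]


if __name__ == "__main__":
    # Trimmed copy of the CREATE TABLE statement quoted in this issue
    ddl = """
    CREATE TABLE IF NOT EXISTS atomic.com_snowplowanalytics_snowplow_site_search_1 (
        "terms"          VARCHAR(65535) ENCODE ZSTD NOT NULL,
        "filters"        VARCHAR(4096)  ENCODE ZSTD,
        "filters"        VARCHAR(4096)  ENCODE ZSTD,
        "page_results"   INT            ENCODE ZSTD
    );
    """
    print(duplicate_columns(ddl))  # ['filters']
```

On the full CREATE TABLE statement from step 4 it should likewise report only filters, since the table name and the FOREIGN KEY clause are not double-quoted.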

I expect the entries not to be duplicated, as is the case, for example, in the iglu-central repository.

Steps to reproduce:

  1. clone the https://github.com/snowplow/iglu-central repository
  2. run igluctl static generate for the https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow/site_search/jsonschema/1-0-0 schema
  3. check the generated file jsonpaths/com.snowplowanalytics.snowplow/site_search_1.json:

    root@4a755f2db42b:/out/jsonpaths# cat com.snowplowanalytics.snowplow/site_search_1.json
    {
        "jsonpaths": [
            "$.schema.vendor",
            "$.schema.name",
            "$.schema.format",
            "$.schema.version",
            "$.hierarchy.rootId",
            "$.hierarchy.rootTstamp",
            "$.hierarchy.refRoot",
            "$.hierarchy.refTree",
            "$.hierarchy.refParent",
            "$.data.terms",
            "$.data.filters",
            "$.data.filters",
            "$.data.pageResults",
            "$.data.totalResults"
        ]
    }

  4. check the generated SQL file:

    root@4a755f2db42b:/out/sql# cat com.snowplowanalytics.snowplow/site_search_1.sql
    -- AUTO-GENERATED BY igluctl DO NOT EDIT
    -- Generator: igluctl 0.8.0
    -- Generated: 2021-12-21 12:14 UTC

    CREATE SCHEMA IF NOT EXISTS atomic;

    CREATE TABLE IF NOT EXISTS atomic.com_snowplowanalytics_snowplow_site_search_1 (
        "schema_vendor"  VARCHAR(128)   ENCODE ZSTD NOT NULL,
        "schema_name"    VARCHAR(128)   ENCODE ZSTD NOT NULL,
        "schema_format"  VARCHAR(128)   ENCODE ZSTD NOT NULL,
        "schema_version" VARCHAR(128)   ENCODE ZSTD NOT NULL,
        "root_id"        CHAR(36)       ENCODE RAW NOT NULL,
        "root_tstamp"    TIMESTAMP      ENCODE ZSTD NOT NULL,
        "ref_root"       VARCHAR(255)   ENCODE ZSTD NOT NULL,
        "ref_tree"       VARCHAR(1500)  ENCODE ZSTD NOT NULL,
        "ref_parent"     VARCHAR(255)   ENCODE ZSTD NOT NULL,
        "terms"          VARCHAR(65535) ENCODE ZSTD NOT NULL,
        "filters"        VARCHAR(4096)  ENCODE ZSTD,
        "filters"        VARCHAR(4096)  ENCODE ZSTD,
        "page_results"   INT            ENCODE ZSTD,
        "total_results"  INT            ENCODE ZSTD,
        FOREIGN KEY (root_id) REFERENCES atomic.events(event_id)
    )
    DISTSTYLE KEY
    DISTKEY (root_id)
    SORTKEY (root_tstamp);

    COMMENT ON TABLE atomic.com_snowplowanalytics_snowplow_site_search_1 IS 'iglu:com.snowplowanalytics.snowplow/site_search/jsonschema/1-0-0';



I verified this on version 0.8.0 and on the current master branch. It worked as expected on version 0.6.0.

chuwy commented 2 years ago

Thanks for the very detailed report, @rkorszun! This is an issue in the underlying Schema DDL library: https://github.com/snowplow/schema-ddl/issues/55. We didn't even know it had worked on 0.6.0.

rkorszun commented 2 years ago

Thanks for the info. I will try to look into schema-ddl.