mozilla / jsonschema-transpiler

Compile JSON Schema into Avro and BigQuery schemas
Mozilla Public License 2.0

Add support for tuples in the form of anonymous structs #89

Closed acmiyaguchi closed 4 years ago

acmiyaguchi commented 4 years ago

This should fix #38 by adding support for anonymous structs via tuple validation in the form of a --tuple-struct flag. This requires a custom JSON decoder that maps JSON lists into structs given the anonymous struct naming convention (f0_, f1_, ...f{n}_).
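A minimal sketch of the list-to-struct mapping described above, written in Python for illustration only. The function name `tuples_to_structs` and the sample schema are hypothetical and are not taken from this PR; the only detail borrowed from it is the `f0_, f1_, ...` naming convention. The sketch assumes tuple validation is expressed as a list-valued `items` keyword and simply drops elements beyond the declared positions.

```python
import json


def tuples_to_structs(value, schema):
    """Recursively turn tuple-validated JSON arrays into dicts keyed f0_, f1_, ..."""
    items = schema.get("items")
    if isinstance(value, list):
        if isinstance(items, list):
            # Tuple validation: each position has its own sub-schema and
            # becomes a named field of an anonymous struct.
            return {
                f"f{i}_": tuples_to_structs(v, items[i])
                for i, v in enumerate(value)
                if i < len(items)  # assumption: extra elements are dropped
            }
        if isinstance(items, dict):
            # Ordinary list validation: stays a list, but recurse into elements.
            return [tuples_to_structs(v, items) for v in value]
        return value
    if isinstance(value, dict):
        # Objects keep their keys; nested tuples inside them are still converted.
        props = schema.get("properties", {})
        return {k: tuples_to_structs(v, props.get(k, {})) for k, v in value.items()}
    return value


# Hypothetical schema: a tuple whose third element is an object holding a tuple.
event_schema = {
    "type": "array",
    "items": [
        {"type": "integer"},
        {"type": "string"},
        {
            "type": "object",
            "properties": {
                "pair": {
                    "type": "array",
                    "items": [{"type": "string"}, {"type": "string"}],
                }
            },
        },
    ],
}

raw = json.loads('[11153, "security.ui.certerror", {"pair": ["has_sts", "false"]}]')
decoded = tuples_to_structs(raw, event_schema)
print(json.dumps(decoded))
```

The conversion is driven by the schema rather than by the data, so a plain list (dict-valued `items`) is left as a list while a tuple (list-valued `items`) becomes a struct.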

I've tested this against the latest version of mozilla-pipeline-schemas. There are no regressions between this PR and the latest tag (v1.4.1) when leaving off --tuple-struct. Also, a cursory glance at the diff seems to show the behavior described in #36.

See https://gist.github.com/acmiyaguchi/b1ee9a93f17f605995453251b7e34316

acmiyaguchi commented 4 years ago

After reading through the schemas again, the fields look incorrect since they are not repeated fields.

acmiyaguchi commented 4 years ago

I was a bit hasty with my evaluation, but better safe than sorry. I was misreading the event structure, which is actually composed of repeated elements for each process. It looks like this was doing the right thing after all. I've modified the testing Avro encoder to map lists into objects for processing the newly added fields, and I've fed the result into BigQuery.

The following query works as expected:

select content.* except(f5_), f5_.*
from test_avro.telemetry__event_v4,
unnest(payload.events.content) content,
unnest(content.f5_) f5_

with subschema: image


| Row | f0_ | f1_ | f2_ | f3_ | f4_ | key | value |
| -- | -- | -- | -- | -- | -- | -- | -- |
| 1 | 36110789 | security.ui.certerror | load | aboutcerterror | MOZILLA_PKIX_ERROR_ADDITIONAL_POLICY_CON | has_sts | false |
| 2 | 36110789 | security.ui.certerror | load | aboutcerterror | MOZILLA_PKIX_ERROR_ADDITIONAL_POLICY_CON | is_frame | true |
| 3 | 11153 | security.ui.certerror | load | aboutcerterror | SEC_ERROR_UNKNOWN_ISSUER | has_sts | false |
| 4 | 11153 | security.ui.certerror | load | aboutcerterror | SEC_ERROR_UNKNOWN_ISSUER | is_frame | false |
| 5 | 40531 | security.ui.certerror | load | aboutcerterror | MOZILLA_PKIX_ERROR_NOT_YET_VALID_ISSUER_ | has_sts | true |

...
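
For readers less familiar with `unnest`, here is a rough Python sketch of what the query does to one converted event: each element of `f5_` is joined against its parent struct, so one event with two key/value entries yields two rows. The helper name `unnest_events` is hypothetical, the sample values are copied from rows 1 and 2 of the result (including the truncated `f4_` string), and storing `value` as a string is an assumption.

```python
def unnest_events(events):
    """Flatten event structs the way the comma-join with unnest(content.f5_) does."""
    rows = []
    for content in events:
        # Equivalent of `content.* except(f5_)`.
        base = {k: v for k, v in content.items() if k != "f5_"}
        # Equivalent of `unnest(content.f5_) f5_` followed by `f5_.*`.
        # An event with an empty f5_ produces no rows, as in a cross join.
        for extra in content.get("f5_", []):
            rows.append({**base, **extra})
    return rows


sample_events = [
    {
        "f0_": 36110789,
        "f1_": "security.ui.certerror",
        "f2_": "load",
        "f3_": "aboutcerterror",
        "f4_": "MOZILLA_PKIX_ERROR_ADDITIONAL_POLICY_CON",
        "f5_": [
            {"key": "has_sts", "value": "false"},
            {"key": "is_frame", "value": "true"},
        ],
    }
]

flat_rows = unnest_events(sample_events)
for row in flat_rows:
    print(row)
```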
acmiyaguchi commented 4 years ago

I've added some new tests and fixed an issue with an object nested within the tuple. This should be ready to deploy.