Closed: acmiyaguchi closed this 5 years ago
I've updated the mozilla-pipeline-schema scripts to easily check the diff between different transpiler options.
I've created a diff of the `--normalize-case` option: https://gist.github.com/acmiyaguchi/3f526c440b67ebe469bcb6ab2da5123f
$ scripts/mps-generate-schemas.sh bq1 --type bigquery --resolve drop
...
80/132 succeeded
$ scripts/mps-generate-schemas.sh bq2 --type bigquery --resolve drop --normalize-case
...
80/132 succeeded
$ diff -q bq1/ bq2/
Files bq1/coverage.coverage.1.schema.json and bq2/coverage.coverage.1.schema.json differ
Files bq1/eng-workflow.hgpush.1.schema.json and bq2/eng-workflow.hgpush.1.schema.json differ
Files bq1/firefox-launcher-process.launcher-process-failure.1.schema.json and bq2/firefox-launcher-process.launcher-process-failure.1.schema.json differ
Files bq1/mozdata.event.1.schema.json and bq2/mozdata.event.1.schema.json differ
...
$ diff -q bq1/ bq2/ | wc -l
45
$ diff bq1/ bq2/ > normalize_case.diff
There are a few interesting cases from the diff that I want to highlight:
l2cacheKB -> l2cache_kb
speedMHz -> speed_m_hz
D2DEnabled -> d2d_enabled
DWriteEnabled -> d_write_enabled
activeGMPlugins -> active_gm_plugins
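The word boundaries behind these mappings can be sketched with two regex passes. This is a hypothetical re-implementation in Python for illustration (the function name `to_snake_case` mirrors the one in the PR, but this is not the transpiler's actual code): one pass splits a lowercase-to-uppercase transition, and a second splits an uppercase run from a following capitalized word.

```python
import re

def to_snake_case(name: str) -> str:
    """Sketch of regex-based snake-casing (illustrative, not the
    transpiler's actual implementation)."""
    # Boundary between a lowercase letter and an uppercase letter:
    # "l2cacheKB" -> "l2cache_KB"
    s = re.sub(r"([a-z])([A-Z])", r"\1_\2", name)
    # Boundary before an uppercase letter that starts a new word, i.e.
    # preceded by uppercase and followed by lowercase: "MHz" -> "M_Hz"
    s = re.sub(r"([A-Z])([A-Z][a-z])", r"\1_\2", s)
    return s.lower()

for column in ["l2cacheKB", "speedMHz", "D2DEnabled",
               "DWriteEnabled", "activeGMPlugins"]:
    print(column, "->", to_snake_case(column))
# l2cacheKB -> l2cache_kb
# speedMHz -> speed_m_hz
# D2DEnabled -> d2d_enabled
# DWriteEnabled -> d_write_enabled
# activeGMPlugins -> active_gm_plugins
```

Note that digits do not trigger a boundary in this sketch, which is why `D2DEnabled` stays `d2d_enabled` rather than `d2_d_enabled`, while acronyms followed by a word (`GMPlugins`, `MHz`) are split before the final capital.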
@badboy This PR has changed a bit since the last review, so I'm retagging you for review. We're looking to have a consistent implementation of snake-casing across the transpiler and ingestion, so I reimplemented the logic using regular expressions and string manipulation instead. This still produces the same output as `heck`, but is portable to Java and Python 3.
I've added three separate test cases to ensure that the behavior stays the same.
I also did the following:

- Used the `regex` crate in favor of `onig`, a wrapper around oniguruma for lookaround support.
- Made `to_snake_case` a function accessible via a public interface for testing.
This PR fixes #77 by adding a new option to `snake_case` all column names in a schema. It is enabled by adding a `--normalize-case` flag to the command; by default, this option is turned off.

I've chosen `heck` as the casing library, since it seems to have the largest number of active users. It uses the `unicode_segmentation` crate to find word boundaries and performs snake-casing consistently across mixed casing.

I've refactored the code to remove extra clones and to make the order of the functions flow better when reading top-down. I also added a few comments here and there.