nodestream-proj / nodestream

A Declarative framework for Building, Maintaining, and Analyzing Graph Data
https://nodestream-proj.github.io/docs/
Apache License 2.0

[BUG] `do_lowercase_strings` is being applied to all property values by default #277

Closed bechbd closed 2 months ago

bechbd commented 7 months ago

Describe the bug

All string properties are being lowercased because the `do_lowercase_strings` normalization is applied by default.

To Reproduce

Given the configuration below:

- implementation: nodestream.interpreting:Interpreter
  arguments:
    interpretations:
      - type: source_node
        node_type: Artist
        key:
          id: !jmespath nconst
        properties:
          name: !jmespath primaryName

With this configuration, every `name` property is lowercased because the `do_lowercase_strings` normalization is applied. This is unexpected behavior from a CX perspective.

Expected behavior

String property values are left unchanged.

Additional context

This also happens to key properties, but I see the logic in lowercasing all key values to ensure consistency in key lookups, so that does not need to change.
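
To make the reported and expected behavior concrete, here is a minimal Python sketch. It is not nodestream's actual code: `lowercase_strings`, `build_node`, and the sample record are hypothetical, made up purely for illustration. It shows keys being lowercased in both cases, while the `name` property is only lowercased when property normalization is switched on.

```python
def lowercase_strings(value):
    """Lowercase a value if it is a string; leave other types untouched."""
    return value.lower() if isinstance(value, str) else value


def build_node(record, normalize_properties=False):
    """Hypothetical stand-in for interpreting a record into a source node."""
    # Keys are always lowercased so that key lookups stay consistent.
    key = {"id": lowercase_strings(record["nconst"])}
    name = record["primaryName"]
    if normalize_properties:
        # This mirrors the reported behavior: properties get lowercased too.
        name = lowercase_strings(name)
    return {"key": key, "properties": {"name": name}}


# Made-up sample record, not real data.
record = {"nconst": "NM0000001", "primaryName": "Jane Doe"}

reported = build_node(record, normalize_properties=True)
expected = build_node(record)
```

In this sketch both nodes end up with the same lowercased key, but only `expected` keeps the original casing of `name`.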

zprobst commented 7 months ago

I think the solution here could be something like this:

  1. Introduce key_normalization and property_normalization fields and deprecate normalization.
  2. Keep the default of key_normalization to be what normalization currently is.
  3. Have the default of property_normalization be blank.
  4. If normalization is set, apply it to both keys and properties, taking precedence over the defaults.
  5. Raise an error if normalization is set together with key_normalization or property_normalization.
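
The five steps above could be sketched roughly as follows. This is only an illustration of the proposed precedence rules, not the implementation that shipped: the function name `resolve_normalization`, the dict-of-flags representation, and the assumed current default of `{"do_lowercase_strings": True}` are all assumptions.

```python
# Assumed current default for `normalization`, based on the behavior in this issue.
DEFAULT_KEY_NORMALIZATION = {"do_lowercase_strings": True}


def resolve_normalization(
    normalization=None,
    key_normalization=None,
    property_normalization=None,
):
    """Return a (key_flags, property_flags) pair following the proposal."""
    specific_given = (
        key_normalization is not None or property_normalization is not None
    )
    # Step 5: the deprecated field cannot be combined with the new ones.
    if normalization is not None and specific_given:
        raise ValueError(
            "normalization cannot be combined with key_normalization "
            "or property_normalization"
        )
    # Step 4: the deprecated field, when set, applies to both.
    if normalization is not None:
        return dict(normalization), dict(normalization)
    # Step 2: keys keep the existing default unless overridden.
    if key_normalization is None:
        key_flags = dict(DEFAULT_KEY_NORMALIZATION)
    else:
        key_flags = dict(key_normalization)
    # Step 3: properties default to no normalization at all.
    if property_normalization is None:
        property_flags = {}
    else:
        property_flags = dict(property_normalization)
    return key_flags, property_flags
```

With no arguments, this sketch lowercases keys and leaves properties untouched, which matches the expected behavior described in the report.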

zprobst commented 2 months ago

This issue has been resolved, and the fix will be released with 0.13.