ysv-rs / ysv

ysv: clean and transform CSV data along your rules encoded in YAML format, lightning fast

Implement a transformation with if-else semantics #24

Open anatoly-scherbakov opened 4 years ago

anatoly-scherbakov commented 4 years ago

Example use cases:

...and whatever. We need to come up with:

anatoly-scherbakov commented 4 years ago

Syntax 1

version: 1
columns:
  phone:
    - input: phone
    - match?: '\d{10}'
    - prepend: "1"
    - else:
      - value: ""

If the phone number matches the specified pattern, we prepend 1. Otherwise, we replace it with an empty value.

The difficulty here is as follows. The match? step should return a boolean value, but the prepend operation obviously requires the string value obtained from input. It is not clear how to model such a thing monadically.

(Via typing, probably.)
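One possible reading of the "via typing" idea, as a rough sketch only: let the match step return the value itself wrapped in a `Result` instead of a bare boolean, so the string is still available to `prepend`. All names below (`match_ten_digits`, `prepend`, `clean_phone`) are hypothetical and not taken from ysv. The check for exactly ten ASCII digits is a stand-in for an anchored `^\d{10}$`, so that the example needs no regex crate.

```rust
// Hypothetical sketch; none of these names come from ysv itself.

/// Stand-in for the `match?` step. Instead of returning `bool`, it returns
/// the value wrapped in `Ok` when it matches and in `Err` when it does not.
/// Assumption: "matches" means "is exactly ten ASCII digits".
fn match_ten_digits(value: String) -> Result<String, String> {
    if value.len() == 10 && value.chars().all(|c| c.is_ascii_digit()) {
        Ok(value) // the matched value travels on to the next step
    } else {
        Err(value) // the rejected value is kept for the `else` branch
    }
}

/// Stand-in for the `prepend` step.
fn prepend(prefix: &str, value: String) -> String {
    format!("{}{}", prefix, value)
}

/// The whole `phone` column from Syntax 1, written out by hand.
fn clean_phone(input: &str) -> String {
    match match_ten_digits(input.to_string()) {
        Ok(value) => prepend("1", value), // the steps before `else`
        Err(_) => String::new(),          // `else: value: ""`
    }
}
```

With this typing, `clean_phone("5551234567")` yields `15551234567`, and any input that is not ten digits yields an empty string.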

anatoly-scherbakov commented 4 years ago

Syntax 2

version: 1
columns:
  phone:
    - input: phone
    - if:
      condition:
        - match: '\d{10}'
      then:
        - prepend: "1"
      else:
        - value: ""

Kind of verbose, but more understandable.

anatoly-scherbakov commented 4 years ago

Syntax 3

I am trying to express things monadically here.

version: 1
columns:
  phone:
    - input: phone
    - match: '\d{10}'
    - map:
      - prepend: "1"
    - fix:
      - value: ""

I am afraid this is immensely verbose and will mean that every step in the algorithm must be tagged with a map, fix or something else.

I believe that, generally, every step in the transformation chain can be assumed to be a .map(). Can it not? How, then, do we mark the steps that are executed only when the value is an error?
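A rough sketch of how that could look, under the assumption that a cell is modelled as a Rust `Result<String, String>`: ordinary steps are applied with `Result::map` implicitly, and only the recovery step (`fix`) needs a tag, because it is the only one applied with `Result::or_else`. The `Step` enum, `Cell` alias, and `run` function are hypothetical and not part of ysv. The ten-digit check again stands in for the regex.

```rust
// Hypothetical step types; not taken from ysv.
enum Step {
    MatchTenDigits,  // may turn an Ok cell into an Err cell
    Prepend(String), // ordinary step: implicitly a `.map()`
    Fix(String),     // recovery step: runs only when the cell is an Err
}

// Ok carries a value that is still valid, Err carries a rejected value.
type Cell = Result<String, String>;

// Assumption: stands in for matching an anchored `^\d{10}$`.
fn is_ten_digits(value: &str) -> bool {
    value.len() == 10 && value.chars().all(|c| c.is_ascii_digit())
}

fn apply(step: &Step, cell: Cell) -> Cell {
    match step {
        Step::MatchTenDigits => {
            cell.and_then(|v| if is_ten_digits(&v) { Ok(v) } else { Err(v) })
        }
        // Skipped automatically when the cell is already an Err.
        Step::Prepend(prefix) => cell.map(|v| format!("{}{}", prefix, v)),
        // Skipped automatically when the cell is Ok.
        Step::Fix(replacement) => cell.or_else(|_| Ok(replacement.clone())),
    }
}

// Threads the cell through every step in order.
fn run(steps: &[Step], input: &str) -> Cell {
    steps
        .iter()
        .fold(Ok(input.to_string()), |cell, step| apply(step, cell))
}
```

Under this reading, the `map:` wrapper from Syntax 3 could be dropped and only `fix:` would remain as an explicit marker. Whether that is expressive enough for nested conditions is left open here.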

anatoly-scherbakov commented 4 years ago

Syntax 4 (impossible)

version: 1
columns:
  phone:
    - input: phone
? - match: "\\d{10}"
:
 - then:
   - prepend: "1"
 - else:
   - value: ""

I tried to use the composite key syntax, but it only works at https://onlineyamltools.com/convert-yaml-to-json if the ? has zero indentation level. This will not work.

anatoly-scherbakov commented 4 years ago

Syntax 5 (actually 2 but improved)

version: 1
columns:
  phone:
    - input: phone
    - match?:
      pattern: '\d{10}'
      then:
        - prepend: "1"
      else:
        - value: ""

This is less verbose, but more specialized than the if - then - else scenario. We are going to have multiple conditionals like this:

etc.

But this will also make the language more readable.

P. S. Will it? Say you need to compare the length of the string. Will you create separate functions for this, like

I can see such a feature being very useful in multiple contexts of cleaning data.

anatoly-scherbakov commented 4 years ago

Syntax 6

version: 1
columns:
  phone:
    - input: phone
    - if: {match: '\d{10}'}
      then: {prepend: "1"}
      else: {value: ""}

Shorthand syntax for a familiar if .. then .. else construct.
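For comparison, a minimal sketch of how an if .. then .. else step might be evaluated, assuming the condition is a plain predicate that only inspects the value and the branches are nested step lists. The `Condition` and `Step` types and the `run` function are hypothetical, and `otherwise` is used as the field name only because `else` is a Rust keyword. As before, the ten-digit check stands in for the regex.

```rust
// Hypothetical types; not taken from ysv.
enum Condition {
    TenDigits, // assumption: stands in for `match` with an anchored `^\d{10}$`
}

enum Step {
    Prepend(String),
    Value(String),
    If {
        condition: Condition,
        then: Vec<Step>,
        otherwise: Vec<Step>,
    },
}

// The condition returns a bool and leaves the value untouched.
fn holds(condition: &Condition, value: &str) -> bool {
    match condition {
        Condition::TenDigits => {
            value.len() == 10 && value.chars().all(|c| c.is_ascii_digit())
        }
    }
}

// Runs the steps in order; an `If` step recurses into one of its branches.
fn run(steps: &[Step], value: String) -> String {
    steps.iter().fold(value, |v, step| match step {
        Step::Prepend(prefix) => format!("{}{}", prefix, v),
        Step::Value(replacement) => replacement.clone(),
        Step::If { condition, then, otherwise } => {
            if holds(condition, &v) {
                run(then, v)
            } else {
                run(otherwise, v)
            }
        }
    })
}
```

In this shape the boolean never enters the value chain, which sidesteps the typing question raised under Syntax 1 at the cost of a nested structure.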

anatoly-scherbakov commented 4 years ago

After some thought, I'd like to postpone this implementation for the following reasons: