We currently have only two types of operation in smooshr:
Combine columns together
Create a taxonomy for a given column
In the future we would like to have more steps, for example:
Extract part of a column as a new column. For example, an address like "23 Some Street, Some City, US, 11221" -> "Some City"
Standardize a time column
Merge the contents of two columns together to form a new column
Do entity matching on a given column
etc
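As a rough illustration of the extraction step, here is a minimal sketch in plain Python. The helper name `extract_part` and its parameters are hypothetical and not part of smooshr. It assumes the column values are strings whose parts are separated by commas.

```python
def extract_part(value, index, sep=","):
    """Return the index-th separator-delimited part of value, or None.

    Hypothetical helper for illustration only; not a smooshr API.
    """
    parts = [p.strip() for p in value.split(sep)]
    # Guard against rows that have fewer parts than expected.
    if -len(parts) <= index < len(parts):
        return parts[index]
    return None


# Applying the helper to every value in a column yields the new column.
addresses = ["23 Some Street, Some City, US, 11221", "no commas here"]
cities = [extract_part(a, 1) for a in addresses]
```

Rows that do not match the expected shape produce `None` rather than raising, so a later step can decide how to handle them.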
Some of these steps will depend on previous steps in ways that are hard to predict at run time. It would be great to have each individual transform defined as a node in a graph, with dependencies linked by edges. Essentially a DAG.
This would inform both the UI and the Python code that is ultimately spit out by the tool.
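One possible way to represent this, sketched with the standard-library `graphlib` module (Python 3.9+). The step names below are made up for illustration and are not existing smooshr operations. Each transform is a key, and its value is the set of transforms it depends on.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical transforms; each maps to the set of steps it depends on.
steps = {
    "extract_city": set(),
    "standardize_time": set(),
    "merge_city_and_time": {"extract_city", "standardize_time"},
    "city_taxonomy": {"extract_city"},
}


def execution_order(graph):
    """Return the steps in an order that respects every dependency edge.

    Raises graphlib.CycleError if the graph is not a DAG.
    """
    return list(TopologicalSorter(graph).static_order())


order = execution_order(steps)
```

The resulting order could drive both the UI (which steps are available next) and the sequence of statements in the generated Python code. A cyclic dependency raises `CycleError`, which gives an early signal that a set of steps cannot be run.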
Some links to projects that might be worth looking at