monarch-initiative / koza

Data transformation framework for LinkML data models
https://koza.monarchinitiative.org/
BSD 3-Clause "New" or "Revised" License

Allow external setter alternatives to `transform.yaml` #137

Open hrshdhgd opened 2 months ago

hrshdhgd commented 2 months ago

As of now, Koza assumes that an ingest project has transform.yaml located in the same directory as the code that generates the koza_app object. It seems that attributes of this object cannot be set via code.

Example: Setting the output_dir or delimiter attribute like this:

koza_app = get_koza_app("some-ingest")
koza_app.source.config.delimiter = ","
koza_app.source.config.output_dir = "new_output" # (or koza_app.output_dir) 

OR

koza_app = get_koza_app("some-ingest", **kwargs)

cc: @caufieldjh , @justaddcoffee

kevinschaper commented 2 months ago

Koza's architecture has a baked-in separation of concerns: transform.py only transforms individual rows and passes them to the writer, and all of the configuration has already happened by the time transform.py runs. The YAML is where the process starts, not the Python code.

As the issue is written, all I can say is, "correct, you can't set those things in transform.py, it's too late." But I bet we can figure out a solution to the actual problem that fits Koza's architecture, or handle it before or after Koza using other tooling.
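
To make the "too late" point concrete, here is a minimal sketch in plain Python. It does not use Koza's API: `SourceConfig`, `read_rows`, `run`, and `transform_row` are hypothetical stand-ins, assumed only for illustration. The idea is that the reader is built from a fixed config before any row reaches the transform, so an override has to be applied to the config up front rather than from inside the row-level code.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Iterator, List


@dataclass(frozen=True)
class SourceConfig:
    """Hypothetical stand-in for settings that would live in transform.yaml."""
    delimiter: str = "\t"
    output_dir: str = "output"


def read_rows(lines: List[str], config: SourceConfig) -> Iterator[Dict[str, str]]:
    # The delimiter is consumed here, before the transform ever sees a row.
    header = lines[0].split(config.delimiter)
    for line in lines[1:]:
        yield dict(zip(header, line.split(config.delimiter)))


def run(
    lines: List[str],
    config: SourceConfig,
    transform: Callable[[Dict[str, str]], Dict[str, str]],
) -> List[Dict[str, str]]:
    # Configuration is fixed at this point; the transform only handles rows.
    return [transform(row) for row in read_rows(lines, config)]


def transform_row(row: Dict[str, str]) -> Dict[str, str]:
    # Row-level logic only: it has no handle on the config that produced the row.
    return {"id": row["id"].upper()}


# Overrides are applied before the run starts, not from inside transform_row.
base = SourceConfig()
overridden = replace(base, delimiter=",")
result = run(["id,name", "a1,foo"], overridden, transform_row)
```

In this sketch the override happens in the code that launches the run, which corresponds to handling it "before Koza" rather than inside transform.py.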

justaddcoffee commented 2 months ago

Okay, thanks @kevinschaper, that makes sense.

I guess separating out config parsing also has the advantage that all the config lives in one place (the config file), which makes it easier to see how the transform was done.