mrufsvold / Waluigi.jl

MIT License
22 stars 4 forks source link

provide non-macro interface #8

Closed Moelf closed 1 year ago

Moelf commented 1 year ago

macro is all cool and nice, but maybe a non-macro interface would benefit many user interfaces, it also avoid having a single global collection of jobs?

I can imagine something like:

JobDefinitions.build_pipeline(Dict("GetGreeting" => job1, "GetAddressee"=> job2...)) |> Waluigi.run_pipeline
Moelf commented 1 year ago

I'm just thinking it would be nice to have a non-macro interface, I mean, imagine user needs to ingest a JSON or something that specify a workflow, maybe from a different language or framework, relying 100% on macro would make this kind of usage impossible

mrufsvold commented 1 year ago

So currently, the macro defines a job type add dispatches get_dependencies, get_target and run_result. Those types can get wrapped in a module of JobDefinitions (or whatever). Then, those types are instantiated with various parameters.

It sounds like you're suggesting bumping this up a level of abstraction and having a concrete Job type that has fields for these functions. The user would define an instance of Job and then the pipeline could call 'userjob.get_dependencies()`. Then you could pass in a collection of jobs to be done.

The reason I went this direction is a couple fold

1) We need a way for a job to hold parameters. In the current system, the new Job type has fields and those fields can be typed (so things stay type stable.) Without defining these new types, we'd need our Job constructor to hold an untyped field for input parameters. That would hurt performance.

2) We need to store the instructions for the various functions I listed above. If we store them as functions in a field on the job, then it's type instable because Function is an abstract type.

3) I am able to use the the instances of Job types as functors, so you get the name of the job in the stack trace when you hit an error.

Please let me know if I'm misunderstanding you or if you think the tradeoffs of the design aren't worth it. I'm open to new ideas!

Edit: to be clear, once you've defined a type of job, the instances of it are not in global scope. The macro is really just a convenience wrapper around

struct MyJob<: Waluigi.AbstractJob
    param1
end

function get_dependencies(job::MyJob)
    deps = let param1 = job.param1
        #steps
    end
    # checking for correct return types
    return deps
end

# other functions 
mrufsvold commented 1 year ago

Sorry that my comment got deleted and reposted. I'm on mobile, and it's acting wonky.

Were you able to see the edit I added to my comment above?

The macro is for defining types and functions efficient, which the user can absolutely do manually. The user couldn't really define types and functions based on input from a JSON file (not safely at least) with or without the macro API.

The user absolutely COULD define some job types and then instantiate them with parameters from JSON.

I can imagine something like

yaml:

user: xxx
pass: xxx
files_to_load:
  - file.csv
  - file2.csv

Julia file:

@Job begin
    name = load_to_db
    params = (file_path, db_user, db_pass)
    process = ...
end

@Job begin
    name = load_all_files
    params = (file_paths, db_user, db_pass)
    dependencies= [
        load_to_db(file, db_user, db_pass)
        for file in file_paths
    ]
end

config = load("my/config/file.yaml")

load_tables = load_all_files(
    config["files_to_load"],
    config["user"],
    config["pass"]
)

load_tables |> run_pipeline
mrufsvold commented 1 year ago

Does that satisfy what you're looking for? If you have a more concrete idea for changes or additions to the API, I'm open to it, but I'd also don't want to leave this dangling. Let me know if you want to close!

Moelf commented 1 year ago

I'm just trying to throw in some ideas at early stage, in the end it's your call, but I think in general having a non-macro 1st class API is nice just because then people can programmatically define jobs

mrufsvold commented 1 year ago

Hey, @Moelf! Would you take a peek at pr #17 and see how you feel about it?