Moelf closed this issue 1 year ago
I'm just thinking it would be nice to have a non-macro interface. Imagine a user needs to ingest a JSON file or something else that specifies a workflow, maybe coming from a different language or framework; relying 100% on macros would make this kind of usage impossible.
So currently, the macro defines a job type and dispatches `get_dependencies`, `get_target`, and `run_process` on it. Those types can get wrapped in a module of `JobDefinitions` (or whatever). Then, those types are instantiated with various parameters.
It sounds like you're suggesting bumping this up a level of abstraction and having a concrete `Job` type that has fields for these functions. The user would define an instance of `Job`, and then the pipeline could call `userjob.get_dependencies()`. Then you could pass in a collection of jobs to be done.
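If I understand the suggestion, it would look roughly like this. To be clear, this is a minimal sketch of the idea, not existing Waluigi code; `Job` with function-valued fields and the field names are hypothetical:

```julia
# Hypothetical struct-of-functions design (illustrative only; not Waluigi's API).
struct Job
    get_dependencies::Function  # note: `Function` is an abstract type -- see below
    get_target::Function
    run_process::Function
end

# A user could build jobs programmatically, e.g. from parsed config:
userjob = Job(
    () -> Job[],                    # no dependencies
    () -> nothing,                  # no target
    () -> println("doing the work"),
)

# The pipeline would then call the fields directly:
userjob.get_dependencies()
```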
The reason I went this direction is a few-fold:
1) We need a way for a job to hold parameters. In the current system, the new job type has fields, and those fields can be typed (so things stay type stable). Without defining these new types, we'd need our `Job` constructor to hold an untyped field for input parameters. That would hurt performance.
2) We need to store the instructions for the various functions I listed above. If we store them as functions in a field on the job, then it's type unstable because `Function` is an abstract type.
3) I am able to use the instances of job types as functors, so you get the name of the job in the stack trace when you hit an error.
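To illustrate point 2 with a self-contained sketch (the `LooseJob`/`TightJob` names are made up for this example): every concrete Julia function has its own singleton type, so a field annotated `::Function` is abstract, while parameterizing the struct keeps the field concrete and lets the compiler specialize.

```julia
# A ::Function field erases the concrete function type:
struct LooseJob
    process::Function   # abstract field type -> type-unstable access
end

# A type parameter preserves it:
struct TightJob{F}
    process::F          # concrete for each stored function
end

loose = LooseJob(sin)
tight = TightJob(sin)

fieldtype(typeof(loose), :process)  # Function (abstract)
fieldtype(typeof(tight), :process)  # typeof(sin) (concrete)
```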
Please let me know if I'm misunderstanding you or if you think the tradeoffs of the design aren't worth it. I'm open to new ideas!
Edit: to be clear, once you've defined a type of job, the instances of it are not in global scope. The macro is really just a convenience wrapper around
```julia
struct MyJob <: Waluigi.AbstractJob
    param1
end

function get_dependencies(job::MyJob)
    deps = let param1 = job.param1
        #steps
    end
    # checking for correct return types
    return deps
end

# other functions
```
Sorry that my comment got deleted and reposted. I'm on mobile, and it's acting wonky.
Were you able to see the edit I added to my comment above?
The macro is for defining types and functions efficiently, which the user can absolutely do manually. The user couldn't really define types and functions based on input from a JSON file (not safely, at least) with or without the macro API.
The user absolutely COULD define some job types and then instantiate them with parameters from JSON.
I can imagine something like
yaml:

```yaml
user: xxx
pass: xxx
files_to_load:
  - file.csv
  - file2.csv
```
Julia file:

```julia
@Job begin
    name = load_to_db
    params = (file_path, db_user, db_pass)
    process = ...
end

@Job begin
    name = load_all_files
    params = (file_paths, db_user, db_pass)
    dependencies = [
        load_to_db(file, db_user, db_pass)
        for file in file_paths
    ]
end

config = load("my/config/file.yaml")

load_tables = load_all_files(
    config["files_to_load"],
    config["user"],
    config["pass"]
)

load_tables |> run_pipeline
```
Does that satisfy what you're looking for? If you have a more concrete idea for changes or additions to the API, I'm open to it, but I also don't want to leave this dangling. Let me know if you want to close!
I'm just trying to throw in some ideas at an early stage. In the end it's your call, but I think that, in general, having a first-class non-macro API is nice just because people can then programmatically define jobs.
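For instance, a purely hypothetical shape for a programmatic, non-macro API (none of these names exist in Waluigi; this just illustrates the idea of building jobs from runtime data without macros or a global registry):

```julia
# Hypothetical plain-data job description (illustrative only).
struct JobSpec
    name::Symbol
    params::NamedTuple
    dependencies::Vector{JobSpec}
end

# Jobs constructed in a loop from runtime data -- no macros involved:
specs = [
    JobSpec(Symbol("load_", i), (file = "file$i.csv",), JobSpec[])
    for i in 1:2
]
```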
Hey, @Moelf! Would you take a peek at PR #17 and see how you feel about it?
Macros are all cool and nice, but maybe a non-macro interface would benefit many use cases; it also avoids having a single global collection of jobs?
I can imagine something like: