straight-shoota / taskmaster

common API for using background jobs in Crystal
MIT License

Proposal for a standardized API for background job queues #1

Open straight-shoota opened 5 years ago

straight-shoota commented 5 years ago

There are a number of different shards providing queued background jobs, but it can be difficult to compose jobs from different shards when they depend on different job queue implementations.

I've put together this shard as a proposal for a standardized API, similar to Active Job for RoR. Details and an overview in the README.

What do you think? Any feedback is welcome.

(pinging maintainers of backend shards) /cc @bmulvihill @calebuharrison @robacarp @vladfaust @microspino @mperham

vladfaust commented 5 years ago

Hello, @straight-shoota,

Thanks for your effort. I like the idea. I like the fact that you want to implement Inline, Test and Async adapters; it makes sense.

However, I don't like the particular Taskmaster and Taskmaster::Job APIs you're proposing. In my opinion, a Job should be a plain module with no class properties and only an abstract def call method (call makes it possible to enqueue Procs instead of Jobs). It should not know anything about its environment, therefore placing def perform_later(**options) and class_property? queue_adapter within it is a mistake.

That said, the Taskmaster module is badly named IMO as well, because it's more like a Manager which works with plain Job objects or Procs. I'd rather have implementing shards include this Manager module in their own manager implementations instead of relying on adapters.

I think this idea needs different naming. How about GenericJob::Job, GenericJob::Manager and GenericJob::Managers::Inline, as in this example:

# "Shard" here stands in for a concrete implementing shard.
class MyJob
  include Shard::Job

  def call
  end
end

# Any manager implementation can enqueue a plain Job object:
manager = Shard::Manager.new
manager.enqueue(MyJob.new)

# ...or use the generic inline manager directly:
manager = GenericJob::Managers::Inline.new
manager.enqueue(MyJob.new)

straight-shoota commented 5 years ago

Thanks for your feedback.

Well, naming is hard 😃 Let's not worry about that for now. I'm definitely open to changing it, but I don't think it should be a priority. Let's keep taskmaster as a working title for now.

I don't follow the suggestion to enqueue Procs. How would that even work? How would you identify a proc job? And how could it be serialized and recreated in a worker?

#perform_later allows the individual job instance to control how it is enqueued. Generally, job.perform_later should be equivalent to Taskmaster.enqueue(job). But the job instance might want to specify options such as which queue to use.

The reason for calling that method on the job instance is that it decouples the callsite completely from the underlying implementation. As long as the job class implements #perform_later, you don't need to care whether it even uses the taskmaster manager API to enqueue.
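
To make this concrete, here is a minimal sketch of the intended usage. The NotifyUser class is made up, and I'm assuming the job's work lives in a #perform method (as in Active Job); perform_later(**options) and Taskmaster.enqueue are the methods discussed above, and the queue option is only illustrative:

class NotifyUser
  include Taskmaster::Job

  def perform
    # do the actual work here
  end
end

job = NotifyUser.new

# In the common case these two are equivalent:
Taskmaster.enqueue(job)
job.perform_later

# But the job instance can also pass options along, e.g. which queue to use:
job.perform_later(queue: "mailers")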

bmulvihill commented 5 years ago

@straight-shoota I like the idea of a standardized API, I have had fairly good success with ActiveJob in the past. It allowed me to write my jobs without worrying about a specific queue implementation (and change the queue implementation later), which is a big benefit.

vladfaust commented 5 years ago

For a moment I thought that Procs would be useful for apps which don't rely on third-party back-ends (e.g. Redis) and execute all the code inside a single process. But then I realized that's hardly testable and Job objects are better.
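
For illustration, what I had in mind was roughly this (purely hypothetical, none of it is an existing API):

# A hypothetical in-process manager that just runs procs in fibers.
class InProcessManager
  def enqueue(job : Proc(Nil))
    spawn { job.call }
  end
end

manager = InProcessManager.new
manager.enqueue ->{ puts "doing some background work" }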

perform_later allows the individual job instance to control how it is enqueued.

No, a job instance should not know anything about how it is enqueued, i.e. what's happening outside itself.

robacarp commented 5 years ago

There's merit to this idea. Providing some way of making library-agnostic background jobs makes sense.

But what about when I decide that, as a library developer, this background job needs to be run using a feature that TaskMaster doesn't implement? Is the benefit to various libraries lost when you must conform to a standardized API?

For example, mosquito implements a cron-like task scheduler, but maybe taskmaster doesn't. Currently, if I as a library writer want to use this feature, I have to buck the trend of using TaskMaster and specifically recommend that someone use mosquito. It will be TaskMaster's responsibility to either enforce homogeneity across all schedulers or attempt to genericize them. Both make it difficult for new ideas to be contributed.

I question the premise:

Without such a common API it would be difficult to combine shard A depending on job queue X and shard B depending on job queue Y.

Why is shard A depending on any specific queue in the first place? That seems shortsighted. Shouldn't it just provide a method which does the work and let the shard user decide how to call it? Perhaps it even strongly suggests that the method be called in some async manner, be it spawn {} or some other way.
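
For instance (all names hypothetical), the shard could just expose the work and let the user decide how to schedule it:

# Inside the hypothetical shard:
module SomeShard
  def self.cleanup_expired_sessions
    # ... the actual work ...
  end
end

# In the application, the user picks the execution strategy:
spawn { SomeShard.cleanup_expired_sessions }
# ...or wraps the call in a job for whatever queue they already use.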

straight-shoota commented 5 years ago

I follow your argument about specialized features vs. a generalized interface. It's difficult, and maybe not worth the effort after all. I guess we'd need to look at some actual use cases. I personally don't have much experience in this matter; I simply wanted an API that works without settling on a specific executor implementation and is easy to use in specs.

Shouldn't it just provide a method which does the work and let the shard-user decide how to call it?

Yes, in this regard the solution is rather simple. The tricky part is the other end of the line: when some kind of "job" is supposed to be scheduled from within a shard's code. What should the shard do? It can either call a specific job queue API directly, or communicate the job to the shard user's code in some way and let it decide what to do. This could be implemented as custom callbacks or channels, or as a generalized interface like the one proposed here.
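
As a sketch of the callback variant (all names are hypothetical): the shard only announces that a job needs to run, and the application decides how to execute it.

# Inside the hypothetical shard:
module Mailer
  # Hook the application can replace; by default the job runs right away.
  class_property dispatch : Proc(String, Nil) = ->(address : String) { Mailer.send_welcome(address) }

  def self.send_welcome(address : String)
    # ... actually deliver the mail ...
  end

  def self.user_registered(address : String)
    dispatch.call(address)
  end
end

# In the application: route the shard's jobs into whatever queue is in use.
Mailer.dispatch = ->(address : String) do
  spawn { Mailer.send_welcome(address) } # or enqueue on any specific job queue shard
  nil
end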

straight-shoota commented 5 years ago

Oh, and besides declaring an actual standardized interface, it would already help interoperability to discuss good practices for basic design principles. For example, job data being stored in the instance vs. provided as arguments to a perform method. Or job serialization in a standard format (cf. robacarp/mosquito#29).
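
For instance, the two data styles look roughly like this (class names made up):

# Style A: job data lives in the instance; #perform takes no arguments.
# Enqueuing means serializing the whole job instance.
class ImportJobA
  def initialize(@url : String)
  end

  def perform
    # fetch and import @url
  end
end

# Style B: the job type is stateless; data is passed as arguments,
# so only the arguments need to be serialized.
class ImportJobB
  def perform(url : String)
    # fetch and import url
  end
end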