riverqueue / river

Fast and reliable background jobs in Go
https://riverqueue.com
Mozilla Public License 2.0
3.35k stars 89 forks source link

More advanced querying on JobListParams.Metadata() #570

Open arp242 opened 2 weeks ago

arp242 commented 2 weeks ago

Right now JobListParams.Metadata() only offers querying via metadata >@ …, but I'd like to do slightly more advanced stuff. For example if the metadata contains:

{"object_id": 42}

And I'd like to list all jobs that match any of a set:

objectsUserHasAccessTo := []int{1, 5, 42}

params := NewJobListParams().Where(
    `(metadata->'object_id')::int = any(@objects)`,
    map[string]any{"objects": objectsUserHasAccessTo},
)

client.JobList(ctx, params)

I don't mind writing a patch for this; the simplest would be to just allow appending to the condition list and argument map, via the patch below. Although arguably that would give too much control (and the ability to do very stupid things).

Then again, restricting this would probably more trouble than its worth(?) The alternative is writing my own query from scratch, which allows for even more stupid stuff.

Well, let me know what you want and I'll submit a working patch with docs, tests, etc.

diff --git i/job_list_params.go w/job_list_params.go
index ffdc39a..7edd6f9 100644
--- i/job_list_params.go
+++ w/job_list_params.go
@@ -174,6 +174,8 @@ type JobListParams struct {
        sortField        JobListOrderByField
        sortOrder        SortOrder
        states           []rivertype.JobState
+       whereSQL         string
+       whereArgs        map[string]any
 }

 // NewJobListParams creates a new JobListParams to return available jobs sorted
@@ -263,6 +265,12 @@ func (p *JobListParams) toDBParams() (*dblist.JobListParams, error) {
                conditions = append(conditions, `metadata @> @metadata_fragment::jsonb`)
                namedArgs["metadata_fragment"] = p.metadataFragment
        }
+       if p.whereSQL != "" {
+               conditions = append(conditions, p.whereSQL)
+               for k, v := range p.whereArgs {
+                       namedArgs[k] = v
+               }
+       }

        if p.after != nil {
                if p.after.time.IsZero() { // order by ID only
@@ -346,6 +354,13 @@ func (p *JobListParams) Metadata(json string) *JobListParams {
        return paramsCopy
 }

+func (p *JobListParams) Where(sql string, args map[string]any) *JobListParams {
+       paramsCopy := p.copy()
+       paramsCopy.whereSQL = sql
+       paramsCopy.whereArgs = args
+       return paramsCopy
+}
+
 // Queues returns an updated filter set that will only return jobs from the
 // given queues.
 func (p *JobListParams) Queues(queues ...string) *JobListParams {
brandur commented 2 weeks ago

I don't mind writing a patch for this; the simplest would be to just allow appending to the condition list and argument map, via the patch below. Although arguably that would give too much control (and the ability to do very stupid things).

Then again, restricting this would probably more trouble than its worth(?) The alternative is writing my own query from scratch, which allows for even more stupid stuff.

This is roughly where I'm at now too. It's kind of a nice idea to not allow a run-any-SQL-you-want because it's safer from a footgun perspective, but at the end of the day if want to make the list API feature rich enough to do anything, we probably have to. Halfway measures will make the corner cases ever smaller, but they'll always still exist to an extent.

@bgentry Any thoughts on that?

@arp242 Depending on what stack you're using, it also might not be the worst-ever-idea to skip River's built-in job list API and just use your data access layer (e.g. sqlc) directly. Then you have as much power as you want and will never got boxed in.

arp242 commented 2 weeks ago

@arp242 Depending on what stack you're using, it also might not be the worst-ever-idea to skip River's built-in job list API and just use your data access layer (e.g. sqlc) directly. Then you have as much power as you want and will never got boxed in.

Yeah, that's pretty much what I did for the time being. It's not hard and works well enough. In some ways this is better because it's consistent with the rest of my code.

My main worry is stability: new versions of River can change the table layout and break my code. This can be easy to miss, because it won't be a compile error. River is still pre-1.0, so all bets are off.

I also hit another problem with JobList() last evening: selecting all columns is too slow, because args contains relatively a lot of data (~85K per row) resulting in >500ms response times instead of 1ms if we select all columns except args.

Implementing a full-on query builder is probably out of scope. Or at least, I'd consider that out of scope. "A slightly more advanced way to query metadata" is still okay, but "select which columns to return" is getting too much.

But like I said, compatibility is my main concern here.

brandur commented 1 week ago

My main worry is stability: new versions of River can change the table layout and break my code. This can be easy to miss, because it won't be a compile error. River is still pre-1.0, so all bets are off.

I wouldn't worry too much about this. I can't guarantee that the schema never changes, but it's not super likely to see significant changes going forward. River's main feature set is largely "done", and migrations can be a major source of pain for large tables, so we try to keep their invasiveness minimal.

Also, as long as you've got reasonable test coverage, you should be able to detect any problems in CI before an upgrade.