kaliberjs / firebase-queue

A trimmed and more robust version of the original Firebase Queue

Looking for suggestions on multiple 'queues' #10

Closed: yagottahavehart closed this issue 5 years ago

yagottahavehart commented 5 years ago

Note: 'queues' (in quotes) is not necessarily a firebase-queue Queue.

I've got a use case where I'm looking to set up many different 'queues'. The Tasks themselves are all essentially the same: 'Do Simple Action that takes 0-5 seconds,' and the order within each specific 'queue' matters. The Tasks should be picked up as soon as possible and should not be slowed down by Tasks of other 'queues'. There will be a few thousand 'queues', each with a relatively low number of tasks pushed to them (a backlog of more than 25-50 would be rare as there's gating on the client side).

Setting up a different firebase-queue Queue for each of these, regardless of how lightweight they are, seems like the naive approach (I'm new to the project). Is that the recommended approach, or do you have a suggestion for handling many different ordered 'queues' with this project?

Thanks

EECOLOR commented 5 years ago

The Queue from this version of 'firebase-queue' is very lightweight, so I do not see any problems in having a lot of them.

The only reason you would make more queues is when you have different processTask functions (or more node processes). When tasks come in faster than they can be processed ('long running tasks') you could adjust the numTasks option. When using more workers you create a bit of overhead, as they will all try to claim the next task when idle.
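To give a rough idea of what a Queue involves (this is only a sketch; the constructor shape and option names here are simplified, the real ones are in the README):

// Sketch only: constructor shape and option names are simplified,
// see the README for the real ones. db is your firebase database instance.
const tasksRef = db.ref('queue/tasks') // path invented for the example

const processTask = async task => {
  // 'Do Simple Action that takes 0-5 seconds'
  await doSimpleAction(task) // doSimpleAction is a placeholder for your own code
  return task
}

// A Queue is essentially just this function plus a listener on the tasks ref,
// so having many of them is cheap. numTasks controls how many tasks a single
// worker claims at once, which only matters when tasks pile up.
const queue = new Queue({ tasksRef, processTask, options: { numTasks: 4 } })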

That said, I advise you to just try and measure things. Back in the day we used https://gatling.io/ to test the performance of our apps; this might be helpful for you as well (I haven't checked the project in years).

I'm interested though: why do you need a few thousand 'queues'? There might be another solution to your problem that would have different trade-offs.

yagottahavehart commented 5 years ago

Hey, thanks for the response. I'm thinking there's definitely a different/better solution to my issue; I'll see if I can note down the important parts below.

Edit: Queue workers will be on 2-10 node processes at a time (though it can be limited).

A quick example: I have 5000 users. Each user can have Tasks pushed to a 'queue' to be executed in order (1 at a time per user). Each Task essentially does a quick action, then delays that user's 'queue' for 0-N seconds before another Task can be picked up for that specific user. Tasks may come in small bursts, but the number of them will be relatively low since they'll be prevented from firing on the client side. Tasks of one user are unrelated to tasks of another user, and delays from one shouldn't prevent processing of another.

EECOLOR commented 5 years ago

I will definitely have to think about this, a very interesting problem!

My guess is that this library (and its intended use) is not suitable for this problem. Although in practice the order of insertion is preserved when you sort on a child (_state in our case), there is no guarantee for this. The documentation even states that you can only order on a single property (key, child, or value).

Having said that, I would first need to think about how I would solve this problem. From there I can see if this is a use case this library can support. If it would introduce a lot of code specifically for this use case, it would probably be best not to add it to the library. If that is the case and this use case turns out to be common, we could think about adding it to another library. Then again, this library is called 'firebase-queue' and here we are talking about an ordered queue (which seems to be a very natural thing to expect, especially when looking at real-world queues).

Again, I need to think about it. Thank you for bringing up this use case!

yagottahavehart commented 5 years ago

Thanks for the thoughtful response.

I will be thinking it through on my side as well. I'll likely end up reducing some of the constraints (or solving just part of the problem) and finding a way to have it fit within firebase-queue, as I would like to get hands-on with the library. If I happen onto something clever and relevant, I'll be sure to share.

Thanks again.

Edit: One question: when you 'return' a task from processTask, is it immediately retried, or is it added back to the queue later?

EECOLOR commented 5 years ago

The source code of the library should be very readable and it is not that much code anymore, so feel free to download a copy and twist it to your needs. Forget about the tests and first try hacking something together that solves your problem (however inefficient the solution may be); it helps in exploring the problem space.

You could, for example, have a queue where each task represents the queue of a specific user, and have the worker pick the next 'user task' by iterating in JavaScript. For a delay on that queue you set the finished state to a particular value that is handled by another set of workers that just work with timeouts. On timeout they remove the state so that processing gets back to normal.
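Very roughly, and with made-up names and data layout (this ignores the delay handling and error handling):

// Assumed layout (invented for this sketch):
//   /queue/tasks/{taskId}            -> { userId: '...' }
//   /userActions/{userId}/{pushId}   -> the queued actions for that user
// db is your firebase database instance
async function processTask(task) {
  const actionsRef = db.ref('userActions').child(task.userId)

  // Read the current actions for this user, in insertion (push key) order
  const snap = await actionsRef.orderByKey().once('value')
  const actions = []
  snap.forEach(child => { actions.push({ key: child.key, action: child.val() }) })

  // Perform them one at a time and remove each one when it is done
  for (const { key, action } of actions) {
    await performAction(action) // performAction is a placeholder for your 'quick action'
    await actionsRef.child(key).remove()
  }

  return task
}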

I'm not saying this is the way to do it, but thinking outside the box and hacking together different solutions might end up getting you closer to a nice solution.

EECOLOR commented 5 years ago

Your question led me to the second parameter of equalTo(value, key), which might come in useful as it suggests that we can have 2 levels of ordering. I have reached out to Firebase to ask about its behavior.

EECOLOR commented 5 years ago

From Firebase:

It's true that, in the event that Realtime Database detects similar values, your data will be secondarily sorted by their keys in lexicographic order. As for how the optional key parameter works, its functionality is described in this reference page. Apart from equalTo(), this optional value can also be used with startAt() and endAt(), and it allows you to use key names as a secondary filter in the event that equal values are found. You can think of this optional parameter as a last resort startAt()/endAt().

Let me know if you have any other questions regarding this.
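So something like this could be used to page through tasks that share the same _state value, using the key as the tie-breaker (untested sketch; paths and values are made up, db is your firebase database instance):

const tasksRef = db.ref('queue/tasks')  // path invented for the example
const someState = '...'                 // the _state value you are looking for
const lastSeenKey = '...'               // key of the last task you already handled

tasksRef
  .orderByChild('_state')
  .startAt(someState, lastSeenKey) // value first, key as the secondary filter
  .limitToFirst(10)
  .once('value')
  .then(snap => {
    snap.forEach(child => { console.log(child.key, child.val()) })
  })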

EECOLOR commented 5 years ago

So, I have been thinking about your problem. I have been assuming that you want your queues to be processed by multiple services (for performance and redundancy).

In order to describe my thoughts I have to make the distinction between a user task (the action a user wants to perform) and a queue task (the object that is passed to the processTask function). To keep the distinction clear I will use 'action' for user tasks and 'task' for queue tasks. The meaning of queue depends on the context.

If you have the constraint that only one action per user can be performed at a time, while a user can have multiple actions queued, this means that only one task per user can enter the queue system at any moment. If there were more than one task, they could be picked up by multiple servers, making ordering a problem.

So this means that the task queue will only ever contain at most one task per user. This task could contain the queue of actions for that user or it could just reference the action queue of the user (by id or something). This however creates a tricky situation.

Let's say that processing of the current task (user actions) is put on hold after the last action. In the split second that it resumes and completes the task (there are no further actions), the client pushes an action onto the queue. When the client looked at the queue it noticed that the task was on hold, so it assumed the action would be picked up when the task resumed. However, timing made it so that adding the action came after resolving the task. This would force us to keep the action queue within the task and use a transaction to add the action to the task. This transaction could then check the state of the current action (or null if it is a 'new run').
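As a sketch of that transaction (structure and names are invented, and this leaves out whatever _state bookkeeping the queue itself needs):

// Append an action to the user's task, or start a 'new run' when there is none
function addAction(db, userId, action) {
  const taskRef = db.ref('queue/tasks').child(userId)

  return taskRef.transaction(task => {
    if (task === null) {
      // No task in flight for this user: start a new run with this single action
      return { userId, actions: [action] }
    }
    // A task exists (running or on hold): append the action to its queue
    const actions = task.actions || []
    return { ...task, actions: [...actions, action] }
  })
}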

Another approach would be to loosen the constraints. Instead of handling queued actions on the server, you could do the queuing (with or without storage in Firebase) in the client. The client would use its uid as the key for the task and add it only when there is no task running (or on hold). In the rules you would make sure that the client can only add a task when there is no task for that auth.uid in the queue.
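On the client that 'add only when there is no task' check could look something like this (untested, path invented), with the rules enforcing the same check on the server side:

function tryAddTask(db, uid, body) {
  const taskRef = db.ref('queue/tasks').child(uid)

  // Only write when there is no task for this uid yet;
  // returning undefined aborts the transaction and leaves existing data untouched
  return taskRef
    .transaction(current => (current === null ? body : undefined))
    .then(result => result.committed) // true when the task was added
}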

This is more of a request/response approach, where you could have logic like the code below on the client. Note that we use this code for more regular HTTP-style request/response stuff, so you would need to tweak it to your situation:

// Exported so callers can tell a timeout apart from other errors
export const TIMEOUT = Symbol('timeout')

const isProduction = process.env.NODE_ENV === 'production'

export default async function requestResponse({ ref, body }) {
  // Create a fresh child node for this request; its push key identifies it
  const requestRef = ref.push()

  // Resolves once a worker sets _state to 'response', rejects when it sets 'error'
  const response = new Promise((resolve, reject) => {
    requestRef.on(
      'value',
      snap => {
        const state = snap.child('_state').val()

        if (state === 'response') {
          requestRef.off()
          const response = snap.child('response').val()
          resolve(JSON.parse(response))
        } else if (state === 'error') {
          requestRef.off()
          const error = new Error(snap.child('_error_details/error').val())
          if (!isProduction) console.error(error)
          reject(error)
        } else {
          /* ignore intermediate states */
        }
      },
      reject,
    )
  })

  // Rejects with TIMEOUT when no response arrives within 5 seconds
  const timeout = new Promise((_, reject) => {
    setTimeout(() => {
      requestRef.off()
      reject(TIMEOUT)
    }, 5000)
  })

  // Writing the body is what makes the request visible to the workers
  await requestRef.set(body)

  try {
    // Whichever settles first wins
    await Promise.race([response, timeout])
    return response
  } catch (e) {
    throw e
  } finally {
    // Clean up the request node whether it succeeded, failed or timed out
    await requestRef.remove()
  }
}

I hope this helps you.

EECOLOR commented 5 years ago

Feel free to reopen this issue.