Closed: @zlace0x closed this issue 1 year ago
Hey @zlace0x, thanks for the report. I'm able to reproduce it. It's related to tiered storage at Upstash and illegal(?) usage of `EVALSHA` in the `bull` library.
First, a bit of background: Upstash has a tiered storage model (Multi Tier Storage) that keeps only hot entries in memory; after some idle time, keys are evicted from memory. In this case, the delayed job keys were evicted from memory.
The `bull` library uses `EVAL`/`EVALSHA` to insert, get, and update multiple keys and data structures. Normally, the specific keys used in a Lua script should be passed explicitly to the `EVAL`/`EVALSHA` commands. See the following statement from https://redis.io/commands/eval:
> All Redis commands must be analyzed before execution to determine which keys the command will operate on. In order for this to be true for EVAL, keys must be passed explicitly. This is useful in many ways, but especially to make sure Redis Cluster can forward your request to the appropriate cluster node. Note this rule is not enforced in order to provide the user with opportunities to abuse the Redis single instance configuration, at the cost of writing scripts not compatible with Redis Cluster.
For OSS Redis, `EVAL` works even when keys are not passed, as long as it runs in standalone (non-clustered) mode. For Upstash, this is an illegal case: whether replicated (multizone or global) or standalone, Upstash always uses tiered storage and loads keys into memory according to the keys given to the command. When an `EVAL` script accesses a key without the key being provided explicitly, the current implementation cannot load that key if it is in cold storage.
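To illustrate the failure mode, here is a toy simulation (not Upstash's actual implementation, all names invented): a store that only promotes entries from cold storage into memory when their keys are declared to the script, mimicking tiered storage behind `EVAL`'s `KEYS` array.

```python
class TieredStore:
    """Toy model of tiered storage behind EVAL (illustrative only)."""

    def __init__(self):
        self.cold = {}  # persisted entries
        self.hot = {}   # in-memory cache

    def put(self, key, value):
        self.cold[key] = value

    def evict(self, key):
        # Idle keys drop out of memory but survive in cold storage.
        self.hot.pop(key, None)

    def eval_script(self, script, keys):
        # Only keys passed explicitly (like EVAL's KEYS array) are
        # promoted from cold storage before the script runs.
        for k in keys:
            if k in self.cold:
                self.hot[k] = self.cold[k]
        return script(self.hot)


store = TieredStore()
store.put("bull:q:1", {"data": "payload"})

# Declared key: reloaded into memory, so the script sees it.
ok = store.eval_script(lambda mem: mem.get("bull:q:1"), keys=["bull:q:1"])

store.evict("bull:q:1")

# Undeclared key (resolved only inside the script): the entry is never
# promoted from cold storage, so the script sees nothing.
missing = store.eval_script(lambda mem: mem.get("bull:q:1"), keys=[])

print(ok, missing)  # {'data': 'payload'} None
```

OSS Redis in standalone mode has no such promotion step, which is why the same script happens to work there.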
So, in your case, `bull` tries to update a hash belonging to a job with a Lua script without passing the key. The reason is that `bull` fetches the related keys dynamically inside the same script. See this script: https://github.com/OptimalBits/bull/blob/master/lib/commands/updateDelaySet-6.lua. It queries the jobs and then calls `HSET ..` for each `jobId`.
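For a sense of what a cluster-safe shape could look like, here is a hedged sketch (all function and key names are hypothetical, not bull's actual code): split the dynamic-key script into two steps so that every key touched in the second step is known client-side and can be declared explicitly.

```python
# A plain dict stands in for Redis here; in a real client the two
# functions below would each be an EVAL call with an explicit KEYS array.

def fetch_due_job_ids(store, delay_set_key, now):
    # Step 1: touch only the declared delay-set key and return the
    # ids of jobs whose delay has expired.
    return [jid for score, jid in store[delay_set_key] if score <= now]

def promote_jobs(store, job_keys):
    # Step 2: the job hash keys were computed client-side from step 1,
    # so they can be passed explicitly and tiered storage can preload them.
    for key in job_keys:
        store[key]["state"] = "active"
    return len(job_keys)


store = {
    "bull:q:delayed": [(100, "1"), (200, "2"), (900, "3")],
    "bull:q:1": {"state": "delayed"},
    "bull:q:2": {"state": "delayed"},
    "bull:q:3": {"state": "delayed"},
}

due = fetch_due_job_ids(store, "bull:q:delayed", now=500)
promoted = promote_jobs(store, [f"bull:q:{jid}" for jid in due])
print(due, promoted)  # ['1', '2'] 2
```

The trade-off is that the two steps are no longer atomic, so a real workaround would need to handle jobs changing state between them.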
I'll discuss with the team how we can solve this, or at least provide a workaround.
@mdogan just chiming in as Bull/BullMQ maintainer. I would like to help in fixing this issue, maybe we can find a workaround together with a combination of improvements in both upstash's Redis implementation and in the libraries I maintain. Feel free to contact me privately if you feel you need to: manast@taskforce.sh.
I would like to use Upstash with Bull as well. I'm able to queue jobs, but processing seems to be an issue.
Closing in favor of #18
While using Upstash + bull to process delayed jobs, we encountered weird bugs where `job.data` goes missing; this happens randomly for about 75% of all jobs. We were unable to find a root cause or related issues on the bull side. The timestamps & data section on Taskforce also takes a while to reflect in the UI.

![image](https://user-images.githubusercontent.com/81418809/148191437-dddf274b-e648-4c99-b013-b3c1fd03e0ed.png)
Relevant code:
Versions: Bull 4.1.1 & Bull 4.2.0
Switching to local redis or redis lab fixes this without any code changes.
Possible causes:
1. bull job data not propagating in time?
2. Upstash vs. Redis storing set timestamp keys differently?
![image](https://user-images.githubusercontent.com/81418809/148186225-1d0b72f4-ed07-4093-946b-6dc9a105b417.png)