Closed: mbommerez closed this issue 2 years ago
We might need to split this one in 2 parts:
After some experimenting and exploring, here's the lowdown and a plan:
There's no `OFFSET` for that API, but when you fetch the first 10,000 (or fewer) objects you get a cursor which allows you to offset your next query from there.
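As a sketch, driving that cursor-based listing could look like the following. Here `listObjects` is a hypothetical stand-in for whatever wraps the listing API call; its shape is an assumption, not the actual client.

```typescript
// Minimal sketch of cursor-based pagination. listObjects(cursor?) is assumed
// to wrap the Durable Object listing API, returning up to 10,000 IDs per page
// plus a cursor for the next page (undefined when there are no more pages).
interface ListResult {
  ids: string[]
  cursor?: string
}

type ListFn = (cursor?: string) => Promise<ListResult>

// Collect every DO ID by following the cursor until the API stops returning one.
async function listAllObjectIds (listObjects: ListFn): Promise<string[]> {
  const ids: string[] = []
  let cursor: string | undefined
  do {
    const page = await listObjects(cursor)
    ids.push(...page.ids)
    cursor = page.cursor
  } while (cursor !== undefined)
  return ids
}
```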
Each record takes two subrequests: one to the DO's `fetch` method (this is the only way to talk to a DO) and another request for the DO to POST to ipns-publisher. So with a limit of 1,000 subrequests (one of which goes on the listing query itself), we'll be able to do a max of 499 records per cron instantiation. Ouch.

We could create a cron job on the ipns-publisher side, or in GH Actions, which calls the CF Worker. The CF Worker would then process some of the DOs, and would return the cursor from the DO IDs query so that ipns-publisher/GH could then call the CF Worker again to continue where it left off.
I don't like this because (1) it's leaking the implementation detail of the DO IDs query out to another system, and (2) as it wouldn't use a CF cron job, the Worker would be limited to 30 seconds of CPU rather than the 15 minutes that a cron job has. Maybe that would be enough given the 499/998 limit, but it doesn't feel great.
We could have a CF cron job which fetches the DO IDs and then, rather than processing them directly, makes a `fetch` call to the worker, passing a batch of (say 995) DO IDs for it to process. Assuming that the subrequests limit doesn't cascade down, this would allow the top-level cron job to dispatch just under 1,000 batches, and then each worker request could process almost 500/1,000 DOs, giving us a total capacity of 995 * 1,000 = 995,000 records. We could even extend the tree to multiple levels, allowing for greater capacity.
Better, but not wonderful. Still feels like we're fighting the limits.
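For what it's worth, the batch-splitting step in that fan-out is just chunking the ID list; a trivial helper (the batch size of 995 is the illustrative number from above):

```typescript
// Split a list of DO IDs into batches of at most `size` (e.g. 995, leaving a
// few subrequests spare for the listing calls themselves).
function chunk<T> (items: T[], size: number): T[][] {
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}
```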
We can get around all of these issues if we simply have something in CF which stores where we've got to. Using a DO would suffice. So we'd store something like:
```json
{
  "last_ran_at": "2022-07-12T11:19:17.427Z",
  "reached_last_record": false,
  "cursor_of_next_batch": "abc...xyz",
  "worker_in_progress": false
}
```
We could then have a CF cron job which runs every 5 minutes/1 hour/whatever, which fetches that state object, has a look at it, and decides what to do (e.g. process next batch). This would allow us to avoid being bitten by any of the limits and without having to pass query cursors out to an external system or have a convoluted tree of tasks of tasks.
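As an illustration, the "has a look at it and decides what to do" step could be a pure function over that state object. The field names match the JSON above; the action names are made up for this sketch.

```typescript
// State object stored in CF (shape as in the JSON example above).
interface RepublishState {
  last_ran_at: string
  reached_last_record: boolean
  cursor_of_next_batch: string | null
  worker_in_progress: boolean
}

// Hypothetical action names for what a cron invocation should do.
type Action = 'skip' | 'start_from_beginning' | 'process_next_batch'

function decide (state: RepublishState): Action {
  if (state.worker_in_progress) return 'skip' // a previous run is still going
  if (state.reached_last_record) return 'start_from_beginning' // begin a new cycle
  return 'process_next_batch' // continue from cursor_of_next_batch
}
```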
The main potential pitfalls I can see here are:

- If a batch keeps failing and we never advance past the current `cursor_of_next_batch`, we'd get ourselves stuck in a loop, so we might want to store another field or two in the state to allow us to check how long we've been attempting the current cursor for.

This is probably easier to implement, but also easier to get ourselves stuck in a corner with.
When we create each DO we would call `setAlarm()` on it, scheduling it for 12 hours' time. Each time the alarm is run, we would publish the record to ipns-publisher and then schedule the alarm for another 12 hours. Easy. The downsides are:
IMO it's between 3 and 4. The ease of solution 4 is quite compelling, but I think there's a high chance we might end up implementing 3 at some point in order to solve one of the mentioned pitfalls. So I think it comes down to whether we want to take the quick win now, with the risk of doing more work later, or go for the more time-consuming but more robust solution now.
Side note: we should add a `lastRepublished` attribute to the IPNSRecord DO to keep track, which will be handy regardless of which solution we choose.
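As a sketch, stamping that attribute after a successful publish could be a small pure helper. The record shape here is hypothetical, not the DO's actual schema.

```typescript
// Hypothetical attributes held by the IPNSRecord DO.
interface IPNSRecordAttrs {
  record: string
  lastRepublished?: string // ISO timestamp of the last successful republish
}

// Stamp lastRepublished after a successful publish to ipns-publisher.
function markRepublished (attrs: IPNSRecordAttrs, now: Date): IPNSRecordAttrs {
  return { ...attrs, lastRepublished: now.toISOString() }
}
```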
I talked this over with François. Alarms are IMO a bit risky: I don't know what happens when you migrate the DO to a new version; all existing instances and (I assume) alarms would be cleared out, and we'd need a way to reinstantiate them.
I'm not super keen on cron state, but I do believe a cron is the simplest thing we can do to get this working that would allow us to publish around 5,000,000 IPNS keys (which should keep us going for a while). So specifically:
You get 1,000 subrequests, so you can do ~5,000,000 records (1,000 * 10,000 / 2). We could squeeze out more by batching up more than the page size.
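Spelling out that back-of-the-envelope arithmetic (the constants are the limits mentioned above; the accounting mirrors the "1,000 * 10,000 / 2" expression):

```typescript
// Back-of-the-envelope capacity estimate.
const SUBREQUEST_LIMIT = 1_000    // subrequests per Worker invocation
const PAGE_SIZE = 10_000          // DO IDs returned per listing page
const REQUESTS_PER_RECORD = 2     // one fetch to the DO, one POST to ipns-publisher

const capacity = (SUBREQUEST_LIMIT * PAGE_SIZE) / REQUESTS_PER_RECORD
// capacity === 5_000_000 records per full pass
```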
Ok, so scratch this: I think you can't access Durable Object storage in this way; it's also not shared by all objects.
Ok so why don't we do 4 (alarms) but first we need to:
If they don't then we need:

- `GET /name/:key` to get the current IPNS record
- `POST /name/:key` with the current record, for each record

I can confirm:
I tested this with a DO that updates a counter every minute. After a class migration and deploying new versions, the counter was never reset to zero.
It was also possible to cancel an alarm by deploying a new alarm handler that performed no work and did not reschedule the alarm.
```typescript
// DurableObjectState and the alarm API come from @cloudflare/workers-types.
// Env is the project's bindings interface; an empty placeholder is used here.
interface Env {}

const SECONDS = 1000 // setAlarm() takes an epoch timestamp in milliseconds

// Minimal JSON response helper (stands in for the project's own jsonResponse).
function jsonResponse (body: string, status: number): Response {
  return new Response(body, { status, headers: { 'Content-Type': 'application/json' } })
}

export class AlarmCounter2 {
  state: DurableObjectState
  env: Env

  constructor (state: DurableObjectState, env: Env) {
    this.state = state
    this.env = env
  }

  async fetch (request: Request) {
    // getAlarm() resolves to null when no alarm is currently scheduled
    const currentAlarm = await this.state.storage.getAlarm()
    const alarmIsActive = currentAlarm !== null
    if (currentAlarm === null) {
      await this.state.storage.setAlarm(Date.now() + 60 * SECONDS)
    }
    const value: number | undefined = await this.state.storage.get('value')
    const data = {
      alarmIsActive,
      value,
      version: 3
    }
    return jsonResponse(JSON.stringify(data), 200)
  }

  async alarm () {
    // Bump the counter, then reschedule ourselves for a minute from now.
    const value: number | undefined = await this.state.storage.get('value')
    if (value === undefined) {
      await this.state.storage.put('value', 0)
    } else {
      await this.state.storage.put('value', value + 1)
    }
    await this.state.storage.deleteAlarm()
    await this.state.storage.setAlarm(Date.now() + 60 * SECONDS)
  }
}
```
Context:
w3name will be rebroadcasting IPNS records to the DHT every 24 hours, so users don't have to do it.
Scope of this ticket:
Acceptance criteria: