supabase-community / seed

Automatically seed your database with production-like dummy data based on your schema for local development and testing.
MIT License

fix: Wait for data generation jobs correctly #164

Closed justinvdm closed 6 months ago

justinvdm commented 6 months ago

Problem 1

On init, we would wait indefinitely for data generation jobs to start. For new or empty (no data sets) projects, jobs would eventually start. For existing projects, there would only be new jobs started if the db changed. If there were no new jobs, the user would keep waiting indefinitely.

For example, if the user ran init, let it finish, then ran init again, they would wait indefinitely. Similarly, if the user ran init, then skipped out of waiting, then ran init again, chances are the jobs completed in the meantime, and they would now be waiting indefinitely (the jobs completed before the waiting started).

These might sound like edge cases at first blush, but I think it is quite likely that users will sometimes re-run the command.

This indefinite waiting is a recent thing (this commit I made), but before it, we had an arguably worse issue: we'd wait up to a fixed time for jobs to start, then proceed regardless once that time was exceeded. If we had enough prediction jobs in our ingest queues, users would never actually get the AI results and would instead get the fallbacks, even though we had set the expectation that they would get the AI results and they had waited 30s for no benefit.

Problem 2

On sync, we would skip this waiting and assume there are jobs already in the queue to wait for. This would not be the case if there are new fields that have since been added to the db. As a result, the user would end up not getting AI results for these new db fields.

Solution

Only wait for jobs to start if this is an empty project (no data sets), or if there are new inputs to send to the API since the last prediction results (i.e., new columns have been added to the db). Use this for both init and sync.
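
As a rough illustration of the rule above, here is a minimal sketch of the decision. The names (`ProjectState`, `findNewInputs`, `shouldWaitForJobsToStart`) and the representation of inputs as `"table.column"` strings are assumptions made for this example, not the actual code in this PR.

```typescript
// Hypothetical shape of what we know about a project; not taken from the codebase.
interface ProjectState {
  // Number of data sets the project already has.
  dataSetCount: number;
  // Inputs (assumed here to be "table.column" keys) covered by the last prediction results.
  lastPredictedInputs: string[];
}

// Returns the inputs present in the current db that the last prediction results did not cover.
function findNewInputs(
  currentInputs: string[],
  lastPredictedInputs: string[],
): string[] {
  const seen = new Set(lastPredictedInputs);
  return currentInputs.filter((input) => !seen.has(input));
}

// Wait for jobs to start only when we have a reason to expect new jobs:
// the project is empty, or the db has inputs that were never sent for prediction.
function shouldWaitForJobsToStart(
  project: ProjectState,
  currentInputs: string[],
): boolean {
  const isEmptyProject = project.dataSetCount === 0;
  const hasNewInputs =
    findNewInputs(currentInputs, project.lastPredictedInputs).length > 0;
  return isEmptyProject || hasNewInputs;
}
```

With this shape, both init and sync would call the same check. Re-running init on an unchanged, non-empty project returns `false`, so the command skips the wait instead of hanging. Adding a column before a sync returns `true`, so the new field gets a chance to receive AI results.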

Note that there are still edge cases: it could be that the user has switched from one non-empty project to another, and even though the inputs haven't changed, there are still jobs that would start. I think this is enough of an edge case that the solution is still worth it - it covers the more likely cases. We do need a more robust fix, but my vote is that this happens when we remove the shape prediction feature (the "old AI stuff").