quantified-uncertainty / metaforecast

Fetch forecasts from prediction markets/forecasting platforms to make them searchable. Integrate these forecasts into other services.
https://metaforecast.org/
MIT License
56 stars 5 forks source link

Fix for upserts, question pages (WIP), history #71

Closed berekuk closed 2 years ago

berekuk commented 2 years ago

So this is a mix of several changes, some are unfinished:


Fix for upserts - kinda urgent

I was working on adding foreign keys from history to questions table when I discovered that fetchers were broken in prod due to the unfortunate combination of: (1) frontpage table is now implemented with foreign keys to questions table now; (2) fetched questions are inserted with delete+recreate trick.

But foreign keys forbid the questions from being deleted, so deletes failed, and cron job skipped the platform entirely if it had any questions on the frontpage. At least that's what I assume have happened, I haven't checked the full logs.

defaultdb=> select date(timestamp) as d, count(1) from questions group by d order by d desc limit 5;
     d      | count
------------+-------
 2022-04-27 |  1805
 2022-04-25 |   191
 2022-04-23 |   110
 2022-04-22 |  2189
 2022-03-30 |   168
(5 rows)

What's worse is that there's no easy to fix this while keeping the same performance. The only fix I could come up with quickly negates the benefits of #30: in this PR I just delete the questions which don't exist any more, create the new ones and update the ones which existed. But there's no bulk update API in Prisma (or in SQL, actually), so all updates are done in separate SQL queries. Or, to be more precise, in 5 queries per question:

prisma:query BEGIN
prisma:query SELECT "public"."questions"."id" FROM "public"."questions" WHERE "public"."questions"."id" = $1
prisma:query UPDATE "public"."questions" SET "extra" = $1, "timestamp" = $2, "title" = $3, "url" = $4, "platform" = $5, "description" = $6, "options" = $7, "qualityindicators" = $8, "id" = $9 WHERE "public"."questions"."id" IN ($10)
prisma:query SELECT "public"."questions"."id", "public"."questions"."title", "public"."questions"."url", "public"."questions"."platform", "public"."questions"."description", "public"."questions"."options", "public"."questions"."timestamp", "public"."questions"."qualityindicators", "public"."questions"."extra" FROM "public"."questions" WHERE "public"."questions"."id" = $1 LIMIT $2 OFFSET $3
prisma:query COMMIT

It shouldn't be that bad, though, since our DB is now geographically close to Heroku's instance. And this can be optimized further:


On other stuff in this PR:

vercel[bot] commented 2 years ago

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated
metaforecast ✅ Ready (Inspect) Visit Preview Apr 27, 2022 at 6:29PM (UTC)
berekuk commented 2 years ago

Ok, I'll just merge it.

@NunoSempere, let me know if you're not ok with this:

only the question title links to the original url now

(instead of an entire question card)

NunoSempere commented 2 years ago

Hey, this looks good to me