quantified-uncertainty / metaforecast

Fetch forecasts from prediction markets/forecasting platforms to make them searchable. Integrate these forecasts into other services.
https://metaforecast.org/
MIT License

[Umbrella] New DB layer #33

Closed; berekuk closed this issue 2 years ago

berekuk commented 2 years ago

"Merge platform tables" is the new one here. I think it would be better if we just stored all forecasts in a single `forecasts` table (currently named `combined`).
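A rough sketch of what a unified row might look like after the merge (column names here are assumptions, not the actual schema):

```typescript
// Illustrative unified row for the single forecasts table; the `platform`
// column becomes the discriminator that the per-platform tables used to
// encode in their names.
type ForecastRow = {
  id: string;     // globally unique, e.g. "<platform>-<local id>"
  platform: string;
  title: string;
  extra: unknown; // platform-specific JSON, a candidate for later normalization
};

// Building a unified row from a platform-specific record (hypothetical helper):
function toForecastRow(
  platform: string,
  localId: string,
  title: string,
  extra: unknown
): ForecastRow {
  return { id: `${platform}-${localId}`, platform, title, extra };
}
```

Keeping the platform-specific leftovers in a single JSON column is what makes the later normalization step optional rather than blocking.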

There's also a possible further roadmap with normalizing the database structure, i.e. extracting most of JSON fields into separate tables. But that can be done later in a separate issue.

Time estimate: uh, 5-10 hours, I guess? Maybe less. Prisma client might take a bit more time, but all these steps seem quite straightforward to me.

NunoSempere commented 2 years ago

Sounds good as well; I'm OK with deferring to you here.

berekuk commented 2 years ago

One more thing: I'd like to change the `platform` field values in the DB and related code (e.g. `stars.ts`) to the short platform name/id, e.g. `"goodjudgment"` instead of `"Good Judgment"`, and keep the longer titles for display only. This is useful for consistency, for type safety, etc.
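As a hedged sketch of that change (names here are illustrative, not the actual metaforecast code): the DB stores only the short id, and the longer title is looked up at display time.

```typescript
// Illustrative platform registry: the DB `platform` column stores the short
// machine-readable id; the human-readable label is used only in the UI.
type Platform = {
  id: string;    // e.g. "goodjudgment", stored in the DB
  label: string; // e.g. "Good Judgment", shown to users
};

const platforms: Platform[] = [
  { id: "goodjudgment", label: "Good Judgment" },
  { id: "metaculus", label: "Metaculus" },
];

// Lookup from stored id to display label; falls back to the raw id.
const platformLabel = (id: string): string =>
  platforms.find((p) => p.id === id)?.label ?? id;
```

If the ids are additionally narrowed to a literal union type, the compiler can catch typos in platform names, which is the type-safety benefit mentioned above.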

I plan to do (1), (2), (3), (5) from the list above in a single PR, and add Prisma features later.

I also want to do this in a single data migration to avoid dealing with multiple small migrations.

Doing it all with zero downtime is a bit tricky; here's my current plan:

All the moving parts (checklist for myself and for documenting the upcoming changes):

NunoSempere commented 2 years ago

Could you also fix https://github.com/QURIresearch/metaforecast/issues/41 while you are at it?

For algolia, searching for platforms seems useful to not break. One possible approach is to have different fields for algolia, which would be added in the part of the code that pushes to algolia.

berekuk commented 2 years ago

> Could you also fix https://github.com/QURIresearch/metaforecast/issues/41 while you are at it?

Yes!

One complication there is dashboards.

I'll update `dashboards.contents` on migration (already done in my branch, will PR soon), but I can't update dashboard ids without breaking user-facing pages.

This means that for some legacy dashboards, the id will no longer match the hash of the contents. I don't think that's a problem, though; it's just something to keep in mind for the future, since we might shoot ourselves in the foot later if we forget about it.
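The id-equals-hash-of-contents invariant can be sketched as follows (illustrative; the actual metaforecast hashing scheme may differ):

```typescript
import { createHash } from "node:crypto";

// Illustrative: a dashboard id derived as a hash of its contents. After a
// migration rewrites `contents`, the stored id of a legacy dashboard no
// longer equals dashboardId(newContents); that is the mismatch noted above.
const dashboardId = (contents: unknown): string =>
  createHash("sha256")
    .update(JSON.stringify(contents))
    .digest("hex")
    .slice(0, 16);
```

Any code that later recomputes the hash to validate or deduplicate dashboards would disagree with the stored ids of pre-migration rows.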

> For algolia, searching for platforms seems useful to not break.

I didn't intend to break it permanently in any case :) But there are some tradeoffs here between "break platform search for a <1h period" (until reindexing) and "spend 1-2 more hours doing a more careful migration".

I'll go with the latter, though; it shouldn't be difficult to create a new index and switch to it in the new code. Then the old code will match the old format and the new code the new one.

But it will require documenting/automating how the Algolia index is created (all the fields which are currently set up through the Algolia web UI could be set up with a script instead).

berekuk commented 2 years ago

> One possible approach is to have different fields for algolia, which would be added in the part of the code that pushes to algolia.

Actually, you're right, that's easier.
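A minimal sketch of that approach, with hypothetical names (`toAlgoliaRecord` and `PLATFORM_LABELS` are not the real metaforecast code): the denormalized label is attached only in the code path that pushes to Algolia, so platform-name search keeps working while the DB stores just the id.

```typescript
// What a DB row looks like after the migration (illustrative fields):
type DbQuestion = { id: string; title: string; platform: string };

// Display labels, duplicated here only for the search index:
const PLATFORM_LABELS: Record<string, string> = {
  goodjudgment: "Good Judgment",
  metaculus: "Metaculus",
};

// Build the record pushed to Algolia; the extra `platformLabel` field
// exists only in the index, never in the DB.
function toAlgoliaRecord(q: DbQuestion) {
  return {
    objectID: q.id, // Algolia's required unique-id field
    title: q.title,
    platform: q.platform,
    platformLabel: PLATFORM_LABELS[q.platform] ?? q.platform,
  };
}
```

This keeps the migration DB-only: no new index, no re-pointing of the frontend, just one extra field computed at push time.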

Quick question: are `makeCompatibleWithFuse` and the item/score containers in the frontend code important? I can't find any code that sets the score (it's zero everywhere), and I'm not sure whether this is legacy that can be removed or some kind of unfinished effort for later.

berekuk commented 2 years ago

Found the answer in https://www.lesswrong.com/posts/5hugQzRhdGYc6ParJ/metaforecast-update-better-search-capture-functionality-more

> Initially, Metaforecast used a custom search script on top of Fuse.js, an open source fuzzy-search library. This was simple to implement, but resulted in a search that was fairly slow and suboptimal. We switched to Algolia [...]

I guess it can be removed then.
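For reference, the Fuse.js-style result shape in question looks roughly like this (types are illustrative); removing the legacy layer means unwrapping the containers to plain items:

```typescript
type Question = { id: string; title: string };

// Fuse.js returns matches as { item, score } containers; after the Algolia
// switch the score was reportedly hardcoded to 0 everywhere.
type FuseResult<T> = { item: T; score: number };

const wrapped: FuseResult<Question>[] = [
  { item: { id: "q1", title: "Example question" }, score: 0 },
];

// Once the legacy containers are removed, the frontend operates on plain items:
const unwrapped: Question[] = wrapped.map((r) => r.item);
```

The tricky part mentioned below is that every consumer expecting `.item` has to be updated in the same change.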

NunoSempere commented 2 years ago

Yes, it can be removed. It's a bit tricky because it was integrated throughout the code, though.

berekuk commented 2 years ago

Everything except for Prisma is done and deployed. (Total time: around 9 hours, but I also did other stuff from #36 in the same update.)

We have just 4 tables in the database now, so Prisma Migrate should be easy. Querying with Prisma Client will be a bit harder (and will probably reveal that the DB should be factored/normalized further).