quantified-uncertainty / metaforecast

Fetch forecasts from prediction markets/forecasting platforms to make them searchable. Integrate these forecasts into other services.
https://metaforecast.org/
MIT License

Faster upsert #37

Closed by berekuk 2 years ago

berekuk commented 2 years ago

PR for #30.

DELETE FROM by itself didn't bring any performance improvements; in fact, it made the code significantly slower. It's still the right thing to do for transaction safety, though: otherwise a frontpage update could coincide with the merge job and the frontpage would briefly be empty.

Bulk inserts are fast. In my local tests, the time for Polymarket upserts went down from 35 seconds to 5.
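
For reference, here's a minimal sketch of the transactional delete + bulk insert pattern, using node-postgres and pg-format. The `latest.combined` table name comes from the logs below, but the column names here are made up for illustration and won't match the real schema:

```ts
import { Pool } from "pg";
import format from "pg-format";

// Hypothetical row shape; the real latest.combined columns differ.
interface Row {
  id: string;
  title: string;
  platform: string;
}

// Replace the whole table inside one transaction, so a concurrent
// frontpage read sees either the old rows or the new rows, never an
// empty table.
async function replaceAll(pool: Pool, rows: Row[]) {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    await client.query("DELETE FROM latest.combined");
    // One multi-row INSERT instead of thousands of single-row round
    // trips; very large batches could be chunked (e.g. 1000 rows per
    // statement).
    const values = rows.map((r) => [r.id, r.title, r.platform]);
    await client.query(
      format("INSERT INTO latest.combined (id, title, platform) VALUES %L", values)
    );
    await client.query("COMMIT");
  } catch (e) {
    await client.query("ROLLBACK");
    throw e;
  } finally {
    client.release();
  }
}
```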

PS: This PR is stacked on top of #34.

netlify[bot] commented 2 years ago

Deploy Preview for metaforecast ready!

Latest commit: 111aa8c9af3126ab0746055675a8973246e851ee
Latest deploy log: https://app.netlify.com/sites/metaforecast/deploys/624490718818330009ebc2c2
Deploy Preview: https://deploy-preview-37--metaforecast.netlify.app

berekuk commented 2 years ago

Ugh, npm decided to remove some stuff from my package-lock.json for some reason. I thought it was fine, but it isn't, and the Netlify build fails now.

Upd: oh, it installed @next/swc-darwin-arm64 instead of the Linux version.

berekuk commented 2 years ago

Rebased on top of fresh #34 changes.

One more benchmark, on my local DB (incomplete; it only includes some platforms):

Inserted 1988 rows with approximate cummulative size 3 MB into latest.combined.
...
Took 18.949 seconds, or 0.3158166666666667 minutes.

On prod:

Inserted 5001 rows with approximate cummulative size 13.9 MB into latest.combined.
...
Took 330.071 seconds, or 5.5011833333333335 minutes.
berekuk commented 2 years ago

Also: this speed-up might be almost entirely due to ping distance from Heroku to DO.

I checked the ping from Heroku to DO with tcp-ping, and it's not good:

~ $ npm i tcp-ping

[...]

~ $ node
Welcome to Node.js v16.14.2.
Type ".help" for more information.
> const {ping}=require('tcp-ping');
undefined
> ping({address: 'postgres-red-do-user-10290909-0.b.db.ondigitalocean.com', port: 25060}, (err, data) => {console.log(data)})
undefined
> {
  address: 'postgres-red-do-user-10290909-0.b.db.ondigitalocean.com',
  port: 25060,
  attempts: 10,
  avg: 92.8678741,
  max: 96.414132,
  min: 89.406278,
  results: [
    { seq: 0, time: 92.711855 },
    { seq: 1, time: 93.537151 },
    { seq: 2, time: 89.406278 },
    { seq: 3, time: 92.171002 },
    { seq: 4, time: 90.283298 },
    { seq: 5, time: 93.572547 },
    { seq: 6, time: 89.975433 },
    { seq: 7, time: 95.535087 },
    { seq: 8, time: 96.414132 },
    { seq: 9, time: 95.071958 }
  ]
}

So, for 5000 rows inserted one by one, ~90 ms per round trip would predict about 5000 × 0.09 s ≈ 450 seconds... but the actual run took 330, not sure what that's about :)

This might be a good reason to move from Heroku to DO, or at least to look into whether there's a Heroku/DO datacenter pair with a lower ping distance.

Also, this might eventually cause us to move the API back from Netlify closer to the DB (while still keeping it implemented with Next.js); if our APIs become more complicated and require multiple DB queries, ping distance will heavily affect their performance.
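
(Back-of-envelope, assuming the ~90 ms measured above: an endpoint that needs, say, five sequential DB queries would spend roughly 450 ms per request on network round trips alone.)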

Single-query requests shouldn't be affected by this, though: Netlify presumably runs its functions on edge nodes close to the user, and some user->DB ping distance is unavoidable anyway (at least without a much more complicated setup with DB replicas on each continent, which we surely don't want to deal with now).

NunoSempere commented 2 years ago

Yeah, I previously preferred to have my DO instance in Europe (data protection laws are better there), but I changed it to the US in the new team database.