quantified-uncertainty / metaforecast

Fetch forecasts from prediction markets/forecasting platforms to make them searchable. Integrate these forecasts into other services.
https://metaforecast.org/
MIT License

Scheduler on Heroku OOMs (critical) #29

Closed (berekuk closed this 2 years ago)

berekuk commented 2 years ago

latest.combined hasn't been updated since the 26th:

metaforecastpg=> select date(timestamp) as d, count(1) from latest.combined group by d;
     d      | count
------------+-------
 2022-03-26 |  4807
(1 row)

Logs:

2022-03-27T11:59:04.388668+00:00 app[scheduler.4186]: ****************************
2022-03-27T11:59:04.388689+00:00 app[scheduler.4186]: polymarket
2022-03-27T11:59:04.388707+00:00 app[scheduler.4186]: ****************************
2022-03-27T11:59:04.388900+00:00 app[scheduler.4186]: Initial try
2022-03-27T11:59:18.435686+00:00 app[scheduler.4186]:
2022-03-27T11:59:18.435693+00:00 app[scheduler.4186]: <--- Last few GCs --->
2022-03-27T11:59:18.435694+00:00 app[scheduler.4186]:
2022-03-27T11:59:18.435696+00:00 app[scheduler.4186]: [4:0x4e3b870] 10716253 ms: Scavenge (reduce) 254.0 (257.6) -> 253.9 (258.4) MB, 1.9 / 0.0 ms  (average mu = 0.975, current mu = 0.863) allocation failure
2022-03-27T11:59:18.435697+00:00 app[scheduler.4186]: [4:0x4e3b870] 10716319 ms: Mark-sweep (reduce) 254.9 (258.4) -> 254.8 (259.4) MB, 57.1 / 0.0 ms  (+ 0.4 ms in 14 steps since start of marking, biggest step 0.1 ms, walltime since start of marking 126 ms) (average mu = 0.946, current mu = 0.631) allocation
2022-03-27T11:59:18.435702+00:00 app[scheduler.4186]:
2022-03-27T11:59:18.435702+00:00 app[scheduler.4186]: <--- JS stacktrace --->
2022-03-27T11:59:18.435702+00:00 app[scheduler.4186]:
2022-03-27T11:59:18.436392+00:00 app[scheduler.4186]: FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
2022-03-27T11:59:18.443246+00:00 app[scheduler.4186]: 1: 0xb09980 node::Abort() [node]
2022-03-27T11:59:18.444078+00:00 app[scheduler.4186]: 2: 0xa1c235 node::FatalError(char const*, char const*) [node]
2022-03-27T11:59:18.444720+00:00 app[scheduler.4186]: 3: 0xcf784e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
2022-03-27T11:59:18.445464+00:00 app[scheduler.4186]: 4: 0xcf7bc7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
2022-03-27T11:59:18.446186+00:00 app[scheduler.4186]: 5: 0xeaf465  [node]
2022-03-27T11:59:18.446934+00:00 app[scheduler.4186]: 6: 0xebf12d v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
2022-03-27T11:59:18.447900+00:00 app[scheduler.4186]: 7: 0xec1e2e v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
2022-03-27T11:59:18.448632+00:00 app[scheduler.4186]: 8: 0xe830a2 v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [node]
2022-03-27T11:59:18.449686+00:00 app[scheduler.4186]: 9: 0xe7b6b4 v8::internal::FactoryBase<v8::internal::Factory>::AllocateRawWithImmortalMap(int, v8::internal::AllocationType, v8::internal::Map, v8::internal::AllocationAlignment) [node]
2022-03-27T11:59:18.450613+00:00 app[scheduler.4186]: 10: 0xe7d3c0 v8::internal::FactoryBase<v8::internal::Factory>::NewRawOneByteString(int, v8::internal::AllocationType) [node]
2022-03-27T11:59:18.451589+00:00 app[scheduler.4186]: 11: 0xfa14c9 v8::internal::JsonParser<unsigned short>::MakeString(v8::internal::JsonString const&, v8::internal::Handle<v8::internal::String>) [node]
2022-03-27T11:59:18.452739+00:00 app[scheduler.4186]: 12: 0xfa359d v8::internal::JsonParser<unsigned short>::ParseJsonValue() [node]
2022-03-27T11:59:18.453848+00:00 app[scheduler.4186]: 13: 0xfa3d2f v8::internal::JsonParser<unsigned short>::ParseJson() [node]
2022-03-27T11:59:18.454520+00:00 app[scheduler.4186]: 14: 0xd7973b v8::internal::Builtin_JsonParse(int, unsigned long*, v8::internal::Isolate*) [node]
2022-03-27T11:59:18.455444+00:00 app[scheduler.4186]: 15: 0x15f0bf9  [node]
2022-03-27T11:59:18.608389+00:00 heroku[scheduler.4186]: Process exited with status 134
2022-03-27T11:59:18.670672+00:00 heroku[scheduler.4186]: State changed from up to complete

(it also failed today due to .js/.ts misconfiguration, but I fixed that already)

I'm looking into ways to optimize the memory usage now, but we might also want to increase the dyno size on Heroku in the meantime?

NunoSempere commented 2 years ago

Increasing dyno size is ok as a hotfix, yes.

berekuk commented 2 years ago

After some investigating:

I'm running updates by hand on my machine for now (everything from step 9, polymarket, onward).

berekuk commented 2 years ago

polymarket-fetch uses around 260MB when run by itself.
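For anyone who wants to reproduce this kind of measurement, here is a minimal sketch using Node's built-in `process.memoryUsage()` and `v8.getHeapStatistics()`. The helper name `memorySnapshotMB` is made up for illustration and is not part of the metaforecast codebase; where to call it inside a fetcher is left open.

```typescript
import * as v8 from "node:v8";

// Hypothetical helper: reports current memory figures in whole megabytes.
function memorySnapshotMB(): { rssMB: number; heapUsedMB: number; heapLimitMB: number } {
  const toMB = (bytes: number): number => Math.round(bytes / 1024 / 1024);
  const usage = process.memoryUsage();
  return {
    rssMB: toMB(usage.rss),           // total resident memory of the process
    heapUsedMB: toMB(usage.heapUsed), // JS heap currently in use
    // Ceiling that V8 enforces before aborting with "heap out of memory".
    heapLimitMB: toMB(v8.getHeapStatistics().heap_size_limit),
  };
}

console.log(memorySnapshotMB());
```

Comparing `heapUsedMB` against `heapLimitMB` at a few points during a run gives a rough idea of how close a job is to the limit seen in the logs above.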

This number is hard to bring down for now because of how platform fetchers are currently implemented (they load all data into memory and then insert everything into the DB).
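As a rough illustration of the alternative, a fetcher could insert each page as it arrives instead of accumulating everything first. This is only a sketch under assumptions: `Market`, `fetchPages`, `fetchAndStore`, and the `fetchPage` / `insertBatch` callbacks are hypothetical names, not the actual metaforecast fetcher interface, and offset-based pagination is assumed.

```typescript
// Hypothetical record shape; the real fetchers use their own types.
interface Market {
  id: string;
  title: string;
}

type FetchPage = (offset: number, limit: number) => Promise<Market[]>;

// Yields one page at a time instead of collecting every page in a single array.
async function* fetchPages(fetchPage: FetchPage, limit: number): AsyncGenerator<Market[]> {
  let offset = 0;
  while (true) {
    const page = await fetchPage(offset, limit);
    if (page.length === 0) return; // an empty page means there is nothing left
    yield page;
    offset += page.length;
  }
}

// Inserts each page as soon as it is fetched, so only one page is held in memory at a time.
async function fetchAndStore(
  fetchPage: FetchPage,
  insertBatch: (rows: Market[]) => Promise<void>,
  limit: number = 100
): Promise<number> {
  let total = 0;
  for await (const page of fetchPages(fetchPage, limit)) {
    await insertBatch(page);
    total += page.length;
  }
  return total; // number of rows inserted
}
```

With this shape, peak memory depends on the page size rather than on the total number of markets, at the cost of more (smaller) DB writes.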

So I enabled pro dynos on Heroku and changed the main cron job to use Standard-2X, which should be enough. It shouldn't cost the full $50/month just for that, since Heroku charges only for the fraction of time the dyno actually runs, so it might be around $10/month.
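The estimate is simple proration, sketched below. The $50/month figure is from the comment above; the 4.8 hours/day runtime is a made-up input chosen only because it corresponds to the roughly $10/month guess, not a measured value, and `proratedMonthlyCost` is a hypothetical helper.

```typescript
// Hypothetical helper: cost of a dyno billed only for the time it runs.
function proratedMonthlyCost(fullMonthlyPrice: number, hoursPerDay: number): number {
  const fractionOfTime = hoursPerDay / 24;
  // Round to cents to avoid floating-point noise in the result.
  return Math.round(fullMonthlyPrice * fractionOfTime * 100) / 100;
}

// Assumed runtime of 4.8 hours per day, i.e. 20% of the time.
console.log(proratedMonthlyCost(50, 4.8));
```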

NunoSempere commented 2 years ago

Received, seems ok for now.

NunoSempere commented 2 years ago

I reworked the polymarket fetcher. It should be significantly less memory intensive now. In particular, we were fetching all markets (including resolved ones), which wasn't necessary. I've reverted the dyno change, so we are back on free dynos, but feel free to restore the paid ones.

berekuk commented 2 years ago

Oh, great. Btw, are the https://strapi-matic.poly.market/markets query parameters documented anywhere, or are you just reverse-engineering URLs from https://polymarket.com/? I couldn't find anything useful on Google.

NunoSempere commented 2 years ago

Both. There is a devs channel on the Polymarket Discord where this is kinda-but-not-really documented. In this case, I did reverse engineer the parameters.

berekuk commented 2 years ago

Seems fine now (the cron job broke today on the Algolia step due to a misconfiguration, sorry; but I see in the logs that you've already fixed that).