Closed berekuk closed 2 years ago
This seems good!
Some comments here:
I dropped duplicate code from api (pg-wrapper, mongo-wrapper); these were somewhat different from their backend versions.
- Might be important! I haven't read the code carefully but assumed that backend version is correct/good enough, please let me know if I'm wrong here.
Yeah, I think these were a bit different; the frontend version had a bit more error checking to ensure that the webpage doesn't break down completely in case of a database error. Should be relatively easy to redo.
I can imagine CORS not being disabled being a problem if someone wants to query metaforecast endpoints from a different domain.
The frontpage pages were partly for optimization reasons, but mainly for debugging the crontab which wasn't working. Seems fine to delete them.
lib/data was not used, iirc
Re:cron, You are right that I would have used something less radical, but I'm ok with you using Github actions and other black magic beyond my comfort zone :)
Also, I think there are some conflicts on metaculus-fetch function; they were throttling the requests. This shouldn't be too hard to fix.
PS: If I merge this into master, the heroku repo in charge of the repo will break. This is not that much of a problem, since I can continue updating the pg database using my local branch for a few days.
> Yeah, I think these were a bit different; the frontend version had a bit more error checking to ensure that the webpage doesn't break down completely in case of a database error. Should be relatively easy to redo.
Ok, I'll re-read that code more carefully then. (Though I don't like the current approach to error checking, but that's a different topic which we can discuss later)
> The frontpage pages were partly for optimization reasons, but mainly for debugging the crontab which wasn't working. Seems fine to delete them.
Cool, I'll look into whether I can rewrite it without a caching table (which seems like an overkill, tbh).
> Re:cron, You are right that I would have used something less radical, but I'm ok with you using Github actions and other black magic beyond my comfort zone :)
The key question here is whether we want CLI and files for one-shot tasks.
I think both approaches (Netlify & manually managed VPS) are viable, but there are tradeoffs, I'll write them down tomorrow.
> Also, I think there are some conflicts on metaculus-fetch function; they were throttling the requests. This shouldn't be too hard to fix.
Oops, I tried to rebase on fresh master but messed up, will fix.
> PS: If I merge this into master, the heroku repo in charge of the repo will break. This is not that much of a problem, since I can continue updating the pg database using my local branch for a few days.
Yeah, don't merge it yet! I plan to push further changes to this branch for one or two days until it's ready.
> Yeah, I think these were a bit different; the frontend version had a bit more error checking to ensure that the webpage doesn't break down completely in case of a database error. Should be relatively easy to redo.
Hmm. The only bit I managed to find is the extra checks in `pgGetByIds`, but in case of an error (an empty ids list) it returns a nonsensical string which probably triggers an exception down the road anyway, so that's inconsequential.
(Btw, I'm pretty sure `pgGetByIds`'s code is vulnerable to SQL injection...)
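For illustration, the usual fix is to pass the ids as bound parameters instead of interpolating them into the SQL string. A minimal sketch, assuming node-postgres-style `$n` placeholders (the function name and table name here are hypothetical, not the actual pg-wrapper code):

```typescript
// Hypothetical sketch of a parameterized replacement for a string-built query.
// The $n placeholder style is node-postgres convention; names are assumptions.
function buildGetByIdsQuery(
  table: string, // must be a trusted, hardcoded identifier, never user input
  ids: string[]
): { text: string; values: string[] } {
  if (ids.length === 0) {
    // fail loudly instead of returning a nonsensical SQL string
    throw new Error("buildGetByIdsQuery called with an empty ids list");
  }
  const placeholders = ids.map((_, i) => `$${i + 1}`).join(", ");
  return {
    text: `SELECT * FROM ${table} WHERE id IN (${placeholders})`,
    values: ids, // ids travel as bound parameters, not as SQL text
  };
}
```

The returned `{ text, values }` pair is the shape `pg`'s `pool.query` accepts, so malicious id strings never reach the SQL parser as code.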
Ok, here goes, another set of commits.
Env configuration: I'm in the process of setting up a separate Netlify deployment to check that everything works properly, and will finish the getSecret refactoring in further commits.
I removed getSecret code entirely and documented env configuration in https://github.com/berekuk/metaforecast-backend/blob/monorepo/docs/configuration.md.
If you had the local `secrets.json` file, you'll have to replace it with a `.env` file.
We'll need to make sure all of the env variables are set on Heroku & Netlify (the set of required variables is technically not identical, but I think it'll be easier to just push the same set to both services). I think most of the vars are already set on Heroku, and none are set on Netlify. Also, `NEXT_PUBLIC_SITE_URL` is new and should be set on Netlify too.
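Since a variable missing on Netlify would only surface at runtime, a startup check could help. A minimal sketch; the required-variable list is an assumption (only `NEXT_PUBLIC_SITE_URL` is named above, the rest would come from docs/configuration.md):

```typescript
// Hypothetical startup check; extend REQUIRED_VARS with the real list
// from docs/configuration.md.
const REQUIRED_VARS = ["NEXT_PUBLIC_SITE_URL"];

function missingEnvVars(
  env: Record<string, string | undefined>,
  required: string[] = REQUIRED_VARS
): string[] {
  // report every required variable that is unset or empty
  return required.filter((name) => !env[name]);
}

// usage at startup:
// const missing = missingEnvVars(process.env);
// if (missing.length) throw new Error(`Missing env vars: ${missing.join(", ")}`);
```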
New README; the old README files are stale and I'll have to rewrite them significantly before we merge this.
Yep, https://github.com/berekuk/metaforecast-backend/blob/monorepo/README.md
Cron setup
As we discussed in #5, we don't actually want to go with GitHub Actions and Netlify background functions, so we don't really need to change anything.
Minor improvements and changes:
- `npm run dbshell` now connects to postgres
- added `netlify-cli` to package.json (check out `netlify env` for easier configuration)
- `data/` and `input/` files

About the new frontpage code.
My first set of commits moved frontpage data from json files to `latest.frontpage`; today I rewrote all of that to direct SQL just-in-time queries (see https://github.com/berekuk/metaforecast-backend/blob/monorepo/src/backend/frontpage.ts).
To avoid the performance degradation of the front page (`ORDER BY RANDOM()` is kinda slow), I spent a few hours refactoring commonDisplay and a few related files.
The changes are significant; @NunoSempere, you might want to do a code review just to get acquainted with the new version and not be surprised in the future.
The main change is that `/` and `/capture` now use `getStaticProps` with the `revalidate` option; this speeds up the initial page load significantly:
```
azrael@azrael:~$ time curl -s 'https://shimmering-bavarois-62a68f.netlify.app/' >/dev/null

real    0m0.029s
user    0m0.020s
sys     0m0.003s

azrael@azrael:~$ time curl -s 'https://metaforecast.org' >/dev/null

real    0m2.030s
user    0m0.015s
sys     0m0.011s
```
(https://shimmering-bavarois-62a68f.netlify.app/ is my test instance on Netlify; it works on top of an incomplete database, but it shouldn't matter)
This means that all search requests except for static frontpage data are performed on client side.
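For readers unfamiliar with the mechanism, here is a minimal sketch of the `getStaticProps` + `revalidate` pattern described above. The `getFrontpage` helper and the props shape are assumptions, not the actual metaforecast code:

```typescript
// Hypothetical sketch of incremental static regeneration (ISR) for the frontpage.
type FrontpageProps = { frontpage: unknown[] };

// stand-in for the real DB query that prepares frontpage data
async function getFrontpage(): Promise<unknown[]> {
  return [{ id: "example-forecast" }];
}

// in the real app, this function would be exported from pages/index.tsx
async function getStaticProps(): Promise<{
  props: FrontpageProps;
  revalidate: number;
}> {
  const frontpage = await getFrontpage();
  return {
    props: { frontpage },
    // regenerate the page in the background at most once per `revalidate`
    // seconds; visitors always get a cached static page immediately
    revalidate: 300,
  };
}
```

With this setup the HTML is served from the static cache, and only the search interactions hit the API from the client.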
I also did a few cleanups:
Final thoughts on merging this branch.
Here's what has to be done for proper deployment:
- compile to the `/out` dir with `tsc`, or install `ts-node` in production

It might be easier/safer if I did the final deployment. I believe I have all the necessary credentials for this; let me know if you're ok with it.
> The main change is that `/` and `/capture` now use `getStaticProps` with `revalidate` option; this speeds up the initial page load significantly
Not sure I like these new changes.
First, curl is just fetching the HTML; it's not loading the JavaScript. So I'm not sure it's measuring what we care about (time until the website shows useful things to the user?).
If you test https://metaforecast.org/?query=test vs https://shimmering-bavarois-62a68f.netlify.app/?query=test, the second is much more annoying; there are three React loads between the initial page load and the results being shown.
Similarly, because you're using `getStaticProps` instead of `getServerSideProps`, the frontpage is also fetched after the initial page load, and there is some initial flashing (??).
All in all, the previous approach seems like it was faster, because:
I think that to save the previous approach with the current structure, we could:
Thoughts here? I personally really dislike multiple reloads until content is shown, but this is not necessarily universal.
Excepting the above details with regards to the frontpage, I'm really liking how this is going. I'm going through the code, and it looks like a different, less amateurish and better organized beast.
> there is some initial flashing (??).
This can (and should) be fixed; the flashing is due to the scrollbar shrinking the page. I planned to look into that anyway, and it should be fixable with better CSS (the same flashing issue exists on the current website).
> When loading the frontpage, I was fetching a json, rather than using the really expensive ORDER BY RANDOM call
No, my version doesn't pay `ORDER BY RANDOM()` costs at all! `getStaticProps` always prepares props in the background, even when revalidation is enabled.
Overall, for the frontpage without search query, the costs in the new version are:
While in the old version the costs are:
And for the version without a search query, the new version is faster on time-to-first-byte, but slower on fetching the data, since it requests Algolia only after hydration.
https://pagespeed.web.dev/report?url=https%3A%2F%2Fmetaforecast.org%2F vs https://pagespeed.web.dev/report?url=https%3A%2F%2Fshimmering-bavarois-62a68f.netlify.app%2F are inconclusive: sometimes the old version is faster, sometimes the new one.
(I still like that static version shows the shell with top menu much faster, though.)
Overall, I agree that the gains are not clear, so I'll roll it back, but I'd like to think more about the alternatives later.
I have an idea of splitting search into a `/search` page; then maybe we could have a fast, static, fully SSR frontpage and a slightly slower `/search` page... but unfortunately the frontpage should be filterable on stars and platforms, and if we serve all filters from `/search`, then we'd still need to pull `frontpage.json` data into `/search`, which means we'd still need to cache it in the DB for optimal performance. This would speed up the default `/` page, at least, and so might be worthwhile (static fully prerendered pages are much better than anything else, obviously; it's just that the query gets in the way).
There might also be a solution in the direction of "let's statically render everything on demand" (Next.js is capable of doing that with `getStaticPaths` + `fallback: "blocking"`), but it would require different URL schemas, like https://metaforecast.org/search/stars/2/forecasts/100/display/20/platforms/Foretold,PolyMarket; the URL schemas could be hidden by rewrites, though... overall this approach seems too complicated to me. But I'll think about it some more and maybe figure out a better way (later; I don't want to spend too much time on this and delay merging this branch).
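For concreteness, a minimal sketch of that on-demand rendering idea; the page and URL schema are hypothetical, nothing like this is implemented:

```typescript
// Hypothetical sketch: on-demand static rendering for search pages.
// In the real app this would be exported from a pages/search/[...params].tsx file.
async function getStaticPaths(): Promise<{
  paths: string[];
  fallback: "blocking";
}> {
  return {
    // no search URLs are pre-built at compile time
    paths: [],
    // each /search/... URL is server-rendered on first request,
    // then cached and served as a static page afterwards
    fallback: "blocking",
  };
}
```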
> I personally really dislike multiple reloads until content is shown, but this is not necessarily universal.
In my experience, static app shells (initial half-empty pages, sometimes with gray boxes instead of content) can be annoying, but can also be good and invisible. Reload shouldn't be noticeable to the end user, and when it's not noticeable, there's not much difference between the classic old-school "browser rendered the first half of html" and app shells from cache.
PS: I just noticed that I missed Next.js 12.1 a month ago; it turns out streaming SSR has arrived. This is going to be great for performance and possibly enough even without going full static.
> I'll roll it back
Done.
"Rebuild database" job is now #20 in the CLI (not included in `doEverything`); we can put it on Heroku as a separate job.
> flashing is due to scrollbar shrinking the page
I fixed this with a permanent vertical scrollbar. It's not a perfect solution, but it's still better than jumping content, and all the approaches I could find have some downsides.
Some minor refactorings:
- `web/search/` (anySearchPage.ts with common getServerSideProps code for index and capture, to fix copy-paste; I also moved commonDisplay.tsx there since it made sense to locate these together)
- `.tsx` extensions, and changed the config to `["./src/web/**/*.tsx", "./src/pages/**/*.tsx"]`
Overall I think we're ready to deploy this in prod, let me know if it's ok if I do it tomorrow!
Hey, I'm still getting the flashing on your test site (though it's possible that you didn't rebuild it). I've made an issue to discuss that in particular: https://github.com/QURIresearch/metaforecast-backend/issues/14
Name | Link |
---|---|
Latest commit | 118495c30c214e8e88d3929a2610837ab9d79637 |
Latest deploy log | https://app.netlify.com/sites/metaforecast/deploys/623f8ec55ae4fc00089a858d |
Deploy Preview | https://deploy-preview-8--metaforecast.netlify.app |
Netlify logs:
> Your environment variables exceed the 4KB limit imposed by AWS Lambda. Please consider reducing them.
Ugh. This is annoying, and it means that I'll have to figure out a way to inject a .env file into Netlify at the build step, instead of just configuring env variables through Netlify.
Merged and deployed!
I set up the frontpage cron job to run every hour and ran both cron jobs by hand, everything seems fine.
The issue with the env limit is still important; for now I worked around it by trimming cookies and removing some unused ones (e.g. `CSETFORETELL_COOKIE`) so that the entire config fits in 3KB, but this won't work in the long run.
I'll file a separate issue on this and continue from there.
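As a rough aid for that issue, the total size of the env config can be estimated locally before pushing it to Netlify. A sketch; how exactly AWS counts toward the 4KB limit is an assumption here (this counts `KEY=VALUE` bytes):

```typescript
// Rough estimate of total env var size, to compare against the 4KB
// AWS Lambda limit. Counting "KEY=VALUE" per variable is an assumption
// about how the limit is measured.
function envSize(env: Record<string, string>): number {
  return Object.entries(env)
    .map(([key, value]) => key.length + 1 + value.length) // "KEY=VALUE"
    .reduce((total, n) => total + n, 0);
}

// usage: warn locally before deploying
// if (envSize(myEnv) > 4096) console.warn("env config exceeds the Lambda limit");
```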
PR for #4.
Due to the nature of this PR it wasn't really possible to come up with a clear diff, sorry.
I tried to keep the changes to the minimum whenever possible, but sometimes it wasn't practical, so I opted for documenting everything I changed instead.
Code layout:
- moved everything into `src/` to avoid too many root-level dirs
- backend code is in `src/backend/`, frontend in `src/web/`; I'd like to refactor this further later

JS/TS setup:
- fixed `"type":"module"` issues in package.json which made the frontend & backend code not very compatible in the monorepo setup
- the CLI can be run with `ts-node src/backend/index.js`, or with `npm run cli` which wraps the same command

Basic coding style:
Configuration:
- some secrets were in `.env` and some were in `input/secrets.json`; I believe `.env` is enough and I'm moving in that direction, but I haven't reviewed all the code that uses `getSecret` yet
- `.env` (and other files like `.env.production`) for env and secrets too
- CLI scripts (`src/backend/index.js`) don't pull `.env` vars by default; I set up dotenv for this and imported it in `src/backend/index.js` and `src/backend/manual/manualDownload.js`, but there are probably other one-time CLI scripts which need it too

New:
- `POSTGRES_NO_SSL=1` option for local postgres (it's used in `src/backend/database/pg-wrapper.js`)

API:
- moved to `src/pages/api`, as we discussed
- `/api` is on the same domain, so it works anyway.

New frontpage setup:
- `latest.frontpage` table which just stores `frontpage_full` and `frontpage_sliced` columns
- `/api/cron/update-frontpage` is now responsible for populating the DB, and it can be called from GitHub Actions (see: https://vercel.com/docs/concepts/solutions/cron-jobs; this doc is from Vercel but should be identical on Netlify)
- auth on `/api/cron/update-frontpage` so that we don't get DDOSed (will do this before merge if we decide to go with this approach)
- the `latest.frontpage` table is append-only and I don't remove old data

Cleanups:
- dropped the `lib/data` dir from the frontend (it wasn't used anywhere, please let me know if I shouldn't have done this)

Minor fixes:
- default `documentContents` to `[]` in `mongoRead` (it failed on an empty DB otherwise)
- `pgInitializeScaffolding` in pg-wrapper
- `className` props on custom React components; I just fixed those as I went

Stuff that's not ready yet:
- finish `getSecret` refactoring in further commits.