marcuswhybrow / ray-peat-rodeo

Ray Peat interviews transcribed & augmented
https://raypeat.rodeo/

Webscraping at build-time not possible with Nix #13

Closed marcuswhybrow closed 1 year ago

marcuswhybrow commented 1 year ago

Ray Peat Rodeo has bespoke markdown tags for citing external URLs and DOIs (unique IDs of scientific papers). Only the URL or the DOI is specified in the markdown tag. At build-time ./lib/citations/parser.go makes an HTTP request to the pertinent URL, scraping the HTML title from the response body to use as the citation's title when displayed on, say, the home page.
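For illustration, the scraping step described above could look roughly like the sketch below. It is not the actual contents of ./lib/citations/parser.go: the names `extractTitle` and `fetchTitle` are hypothetical, and a regular expression stands in for a real HTML parser to keep the example within the standard library.

```go
package main

import (
	"fmt"
	"html"
	"io"
	"net/http"
	"regexp"
	"strings"
)

// titleRe matches the first <title> element, case-insensitively and
// across newlines. A regex is an assumption made for brevity; a real
// HTML parser would be more robust.
var titleRe = regexp.MustCompile(`(?is)<title[^>]*>(.*?)</title>`)

// extractTitle (hypothetical name) returns the text of the first <title>
// element in body, with HTML entities decoded and whitespace collapsed.
func extractTitle(body string) (string, bool) {
	m := titleRe.FindStringSubmatch(body)
	if m == nil {
		return "", false
	}
	title := strings.Join(strings.Fields(html.UnescapeString(m[1])), " ")
	return title, title != ""
}

// fetchTitle (hypothetical name) downloads url and scrapes its title.
// This is the network call that a sandboxed Nix build refuses to make.
func fetchTitle(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("GET %s: unexpected status %s", url, resp.Status)
	}
	// Read at most 1 MiB; the <title> sits near the top of the document.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	if err != nil {
		return "", err
	}
	title, ok := extractTitle(string(body))
	if !ok {
		return "", fmt.Errorf("GET %s: no <title> found", url)
	}
	return title, nil
}
```

Keeping the parsing in `extractTitle` separate from the HTTP request in `fetchTitle` means the parsing can be exercised without network access. `fetchTitle` would be called once per cited URL from the program's `main`.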

Nix works hard to make builds reproducible. It limits build-time communication to explicitly declared, known inputs, so you can't make arbitrary HTTP requests :angry: Web scraping must happen outside of build-time!

Solution: an update-citation-data.go tool that parses all citations (the same way main.go does now) but generates data.json, which is committed to the repository. modd can run the tool automatically (non-blocking) every time a source file changes.

marcuswhybrow commented 1 year ago

This issue was opened during the Go implementation. I have since rewritten the whole project in Rust :yawning_face:, and 7733a0d implemented a scraper that writes to a cache committed to the source. At deploy time the cache is used, and no network requests are made.