postlight / parser

📜 Extract meaningful content from the chaos of a web page
https://reader.postlight.com
Apache License 2.0

feat: Refactor and update fixtures #681

Closed: johnholdun closed this 1 year ago

johnholdun commented 1 year ago

This patch changes how fixtures are stored. Previously, a fixture's folder identified its domain and its filename identified when it was fetched. Now the filename indicates the domain, and the file's modification time indicates how recently it was fetched. A fixture's filename can also include an optional modifier, for example to distinguish between two different page types on the same domain.
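To make the new naming scheme concrete, here is a minimal sketch of how a script might read it. It assumes a hypothetical convention of `<domain>.html` or `<domain>--<modifier>.html`; the `.html` extension, the `--` separator and the function names `parseFixtureName` and `freshestFixture` are all illustrative assumptions, not what this patch actually uses. Fetch recency comes from the file's modification time rather than its name; in Node that value would be available as `fs.statSync(path).mtime`, but the sketch takes it as input to stay self-contained.

```typescript
// Hypothetical naming convention (an assumption, not taken from the PR):
//   "<domain>.html" or "<domain>--<modifier>.html"
interface FixtureName {
  domain: string;
  modifier: string | null;
}

// A fixture file plus its modification time, which under the new scheme
// records when the fixture was last fetched.
interface FixtureFile {
  filename: string;
  mtime: Date;
}

function parseFixtureName(filename: string): FixtureName {
  // Drop the (assumed) .html extension; dots inside the domain are kept.
  const base = filename.replace(/\.html$/, "");
  const sep = base.indexOf("--");
  if (sep === -1) {
    return { domain: base, modifier: null };
  }
  return { domain: base.slice(0, sep), modifier: base.slice(sep + 2) };
}

// Return the most recently fetched fixture for a domain (any modifier),
// or undefined if that domain has no fixtures.
function freshestFixture(
  files: FixtureFile[],
  domain: string,
): FixtureFile | undefined {
  return files
    .filter((f) => parseFixtureName(f.filename).domain === domain)
    .reduce<FixtureFile | undefined>(
      (best, f) =>
        best === undefined || f.mtime.getTime() > best.mtime.getTime()
          ? f
          : best,
      undefined,
    );
}
```

Keeping the domain and the modifier in one flat filename means a directory listing alone identifies every page type per site, while the timestamp moves into filesystem metadata.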

Also included here are changes to the update-fixture script, both to accommodate the new filename scheme and to actually update all fixtures. The functionality for running automatically and opening PRs has been removed for now, though it will likely be reintroduced.

Finally, all fixtures have been updated.

johnholdun commented 1 year ago

Going to let this PR simmer for a while as we get all the currently open extractor changes merged, to avoid conflicts. Then I'll update this to suit.

sdoire commented 1 year ago

Rebased onto main and fixed merge conflicts.

sdoire commented 1 year ago

Closing because all changes will be captured in three separate PRs: #712, #713, and an upcoming one from this branch.