Closed by simonw 10 months ago
The first successful run took this long:
If I got the code right, subsequent runs should be a lot faster, since it will only calculate similarities involving the most recently added entries.
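A minimal sketch of that incremental idea, not the code from this issue: it assumes embeddings are held in a plain dict and that scores are cosine similarities. The names `cosine`, `update_similarities`, `embeddings`, `existing` and `new_ids` are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def update_similarities(embeddings, existing, new_ids):
    """Add scores only for pairs that involve a newly added entry.

    embeddings: dict of id -> vector (hypothetical in-memory store)
    existing:   dict of (id, other_id) -> score, mutated in place
    new_ids:    ids added since the previous run
    Returns the number of pairs that were actually computed.
    """
    computed = 0
    for new_id in new_ids:
        for other_id, vector in embeddings.items():
            # Skip self-pairs and pairs already scored on an earlier run.
            if other_id == new_id or (new_id, other_id) in existing:
                continue
            score = cosine(embeddings[new_id], vector)
            # Store both directions so a lookup by either id works.
            existing[(new_id, other_id)] = score
            existing[(other_id, new_id)] = score
            computed += 1
    return computed
```

With N existing entries and one new one, this computes N scores rather than recomputing all pairs, which is why a second run would be expected to be much quicker than the first.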
https://til.simonwillison.net/tils now has the new tables:
https://til.simonwillison.net/tils/similarities?_facet=id looks good.
    select
      til.topic, til.slug, til.title, til.created
    from til
      join similarities on til.path = similarities.other_id
    where similarities.id = 'python_pyproject.md'
    order by similarities.score desc
    limit 10
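To try the query above outside Datasette, here is a rough sketch using Python's built-in `sqlite3` module. The table and column names come from the query itself; the schema details, the sample rows and the `related` helper are assumptions made up for illustration, and the id is passed as a parameter rather than hard-coded.

```python
import sqlite3

QUERY = """
select
  til.topic, til.slug, til.title, til.created
from til
  join similarities on til.path = similarities.other_id
where similarities.id = ?
order by similarities.score desc
limit 10
"""


def related(conn, til_id):
    """Return up to ten til rows related to til_id, most similar first."""
    return conn.execute(QUERY, [til_id]).fetchall()


# Hypothetical schema and rows, only to exercise the query.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table til (
      path text primary key, topic text, slug text, title text, created text
    );
    create table similarities (
      id text, other_id text, score real, primary key (id, other_id)
    );
    """
)
conn.executemany(
    "insert into til values (?, ?, ?, ?, ?)",
    [
        ("python_pyproject.md", "python", "pyproject", "Example source", "2023-01-01"),
        ("python_example-a.md", "python", "example-a", "Example A", "2023-01-02"),
        ("sqlite_example-b.md", "sqlite", "example-b", "Example B", "2023-01-03"),
    ],
)
conn.executemany(
    "insert into similarities values (?, ?, ?)",
    [
        ("python_pyproject.md", "sqlite_example-b.md", 0.8),
        ("python_pyproject.md", "python_example-a.md", 0.9),
        ("python_example-a.md", "sqlite_example-b.md", 0.5),
    ],
)

for topic, slug, title, created in related(conn, "python_pyproject.md"):
    print(topic, slug, title, created)
```

The join is on `similarities.other_id`, so each returned row describes the related entry rather than the entry being looked up.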
Didn't quite work:
    Skipped 446 rows that already existed
    [{"id": "svg_dynamic-line-chart.md", "other_id": "observable-plot_wider-tooltip-areas.md", "score": 0.7923009914460658}]
    error: Could not access 'HEAD~10'
    Error: Must specify entries or --all
OK, the second run took 8s, which is what I hoped for:
Here's the TIL with the full write-up: https://til.simonwillison.net/llms/openai-embeddings-related-content
The related content for the new article looks good:
Related
- sqlite Related content with SQLite FTS and a Datasette template function - 2022-07-31
- python Calculating embeddings with gtr-t5-large in Python - 2023-01-31
- datasette Crawling Datasette with Datasette - 2022-02-27
- sqlite Copy tables between SQLite databases - 2023-04-03
- mastodon Export a Mastodon timeline to SQLite - 2022-11-04
- datasette Scraping Reddit and writing data to the Datasette write API - 2023-03-13
- sqlite Comparing two training datasets using sqlite-utils - 2023-05-23
- shot-scraper Social media cards generated with shot-scraper - 2023-04-29
- sphinx Adding Sphinx autodoc to a project, and configuring Read The Docs to build it - 2021-08-10
- sqlite Replicating SQLite with rqlite - 2020-12-28
I'm writing a TIL about this as I go.