simonw / til

Today I Learned
https://til.simonwillison.net
Apache License 2.0

Related content using embeddings #79

Closed simonw closed 10 months ago

simonw commented 10 months ago

I'm writing a TIL about this as I go.

simonw commented 10 months ago

The first successful run took this long:

image

If I got the code right, subsequent runs should be a lot faster, since it will only calculate similarities for the most recently added entries.
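
To illustrate the incremental idea, here is a minimal sketch, not the code used in this repo. It assumes embeddings are held in a plain dict keyed by file path and scores pairs with cosine similarity; `similarities_for_new`, `top_n`, and the sample vectors are all hypothetical names and values made up for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarities_for_new(embeddings, new_ids, top_n=10):
    """Score only the newly added entries against every other entry.

    Entries that are not in new_ids are skipped as the "id" side,
    which is what keeps later runs cheap.
    """
    rows = []
    for new_id in new_ids:
        scored = [
            (other_id, cosine(embeddings[new_id], vec))
            for other_id, vec in embeddings.items()
            if other_id != new_id
        ]
        # Highest score first, keep the top_n closest matches.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        for other_id, score in scored[:top_n]:
            rows.append({"id": new_id, "other_id": other_id, "score": score})
    return rows


# Hypothetical two-dimensional embeddings, purely for illustration.
embeddings = {
    "a.md": [1.0, 0.0],
    "b.md": [0.9, 0.1],
    "c.md": [0.0, 1.0],
}
rows = similarities_for_new(embeddings, ["b.md"], top_n=2)
```

One trade-off of this shape: rows where an older entry is the `id` are not refreshed, so an older article's related list would not pick up the new one unless it is recalculated too.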

simonw commented 10 months ago

https://til.simonwillison.net/tils now has the new tables:

image

https://til.simonwillison.net/tils/similarities?_facet=id looks good.

simonw commented 10 months ago

Here's the new SQL query: https://til.simonwillison.net/tils?sql=select%0D%0A++til.topic%2C+til.slug%2C+til.title%2C+til.created%0D%0Afrom+til%0D%0Ajoin+similarities+on+til.path+%3D+similarities.other_id%0D%0Awhere+similarities.id+%3D+%27python_pyproject.md%27%0D%0Aorder+by+similarities.score+desc+limit+10

select
  til.topic, til.slug, til.title, til.created
from til
join similarities on til.path = similarities.other_id
where similarities.id = 'python_pyproject.md'
order by similarities.score desc limit 10
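
For anyone wanting to try that query locally, here is a rough sketch using Python's built-in `sqlite3` module against an in-memory database. The table schemas are assumptions inferred from the columns the query touches, and the sample rows (other than the `python_pyproject.md` ID) are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schemas: only the columns referenced by the query are modelled.
conn.executescript(
    """
    create table til (
        path text primary key, topic text, slug text, title text, created text
    );
    create table similarities (
        id text, other_id text, score real, primary key (id, other_id)
    );
    """
)

# Invented sample data.
conn.executemany(
    "insert into til values (?, ?, ?, ?, ?)",
    [
        ("python_pyproject.md", "python", "pyproject", "Using pyproject.toml", "2023-01-01"),
        ("python_packaging.md", "python", "packaging", "Packaging notes", "2023-02-01"),
        ("sqlite_json.md", "sqlite", "json", "JSON in SQLite", "2023-03-01"),
    ],
)
conn.executemany(
    "insert into similarities values (?, ?, ?)",
    [
        ("python_pyproject.md", "sqlite_json.md", 0.7),
        ("python_pyproject.md", "python_packaging.md", 0.9),
    ],
)

# Same query as above, with the entry ID passed as a bound parameter.
QUERY = """
select
  til.topic, til.slug, til.title, til.created
from til
join similarities on til.path = similarities.other_id
where similarities.id = ?
order by similarities.score desc limit 10
"""
related = conn.execute(QUERY, ("python_pyproject.md",)).fetchall()
```

Binding the ID with `?` instead of inlining the string lets the same query be reused for any article.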

simonw commented 10 months ago

Didn't quite work:

Skipped 446 rows that already existed
[{"id": "svg_dynamic-line-chart.md", "other_id": "observable-plot_wider-tooltip-areas.md", "score": 0.7923009914460658}]
error: Could not access 'HEAD~10'
Error: Must specify entries or --all

simonw commented 10 months ago

OK, the second run took 8s, which is what I hoped for:

image

simonw commented 10 months ago

Here's the TIL with the full write-up: https://til.simonwillison.net/llms/openai-embeddings-related-content

simonw commented 10 months ago

The related content for the new article looks good:

Related