metauni / metauniOS

The metauni Operating System.

Universal semantic search #3

Open dmurfet opened 1 year ago

dmurfet commented 1 year ago

At the end of 2021 we started switching to metaboards being persistent by default. We now have hundreds of boards of mathematics spread across dozens of pockets. That will expand as we increase the number of seminars and attendees. The lesson from the Internet and Google is that one of the few scalable ways of organising information is search. Currently there is no search within metauni. Things we might want to find:

Most of the content you might want to search is on boards. It can be made available to text search via OCR, but the error rate is high enough (and the handwriting and positioning irregular enough) that exact keyword search is unlikely to be very useful. However, we can do better with "semantic search" based on, e.g., the OpenAI embeddings (https://beta.openai.com/docs/guides/embeddings). These embeddings transform board OCR text (and transcripts from replays and seminars) into vectors, using the same technology that underlies GPT-3 (poetically speaking, the vector is the "thought vector" that is in GPT-3's mind when it thinks about that term).

The vision would be that everything at metauni is accessible to this kind of search, i.e. universal semantic search.

To enable this we need to do a few things

This data can then be run through an automated process that generates embedding vectors (running on the GCP VM, it may take time). Semantic search works by taking a query string, generating a vector from it, and then taking dot products with the embedding vectors generated by the above process in order to find hits.
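
To make the query step concrete, here is a minimal sketch of dot-product search in Python. Everything named here is an assumption for illustration: `toy_embed` is a hypothetical stand-in built from character trigrams (it is not the OpenAI embeddings API, and real embeddings would be dense float vectors), and the document ids and texts are made up. Only the shape of the process matches the description above: embed the stored text once, embed the query, take dot products, and return the highest-scoring hits.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]  # sparse vector: trigram -> weight


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for a real embedding model (NOT the OpenAI API).

    Counts character trigrams and scales the counts to unit length, so the
    dot product of two vectors equals their cosine similarity.
    """
    padded = f" {text.lower()} "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {gram: c / norm for gram, c in counts.items()} if norm else {}


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two sparse vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(weight * b.get(gram, 0.0) for gram, weight in a.items())


def build_index(
    docs: Dict[str, str], embed: Callable[[str], Vector] = toy_embed
) -> Dict[str, Vector]:
    """Offline step: generate one embedding vector per stored text."""
    return {doc_id: embed(text) for doc_id, text in docs.items()}


def search(
    query: str,
    index: Dict[str, Vector],
    embed: Callable[[str], Vector] = toy_embed,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Query step: embed the query, score every stored vector, keep the best."""
    query_vec = embed(query)
    scored = [(doc_id, dot(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up sample data standing in for board OCR text and transcripts.
docs = {
    "board:seminar-a": "exact sequence of sheaf cohomology groups",
    "board:seminar-b": "gradient descent on the loss landscape",
    "transcript:replay-1": "free energy and singular learning theory",
}
index = build_index(docs)

# A query with OCR-style misspellings still ranks the intended board first.
hits = search("gradiant descent los landscape", index, top_k=2)
for doc_id, score in hits:
    print(f"{doc_id}\t{score:.3f}")
```

Passing `embed` as a parameter is a deliberate choice in this sketch: the indexing and ranking code stays the same if the toy function is swapped for a call to a real embedding model, as long as the query and the stored text are embedded by the same function.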

dmurfet commented 1 year ago

Features that are enabled by this:

dmurfet commented 1 year ago