Closed: pabloarosado closed this 2 weeks ago.
Quick links (staging server): Site | Admin | Wizard | Docs
Login: ssh owid@staging-site-app-to-find-similar-insights
Edited: 2024-11-11 09:57:39 UTC Execution time: 15.04 seconds
@lucasrodes could you review it please? I can't install torch on my laptop due to this issue. It's probably solvable, but I've already spent an hour on it and didn't make any progress.
Thanks Mojmir, I'm sorry about that issue, it sounds annoying! If you want, I can temporarily add this app to the wizard so you can play with it (in any case, I'm also happy if Lucas wants to have a look, or both of you).
Hey @Marigold, I've moved it to the wizard so you can try it out. But of course, if this is going to break your ETL environment, we shouldn't push it. I find it very useful, and having that library in the ETL could also let us experiment with other similar things. We can also move it to its own repo if it's problematic (or discard it if others don't find it useful; it's just an experiment). Let me know what you think, thanks.
Create a script that launches a Streamlit app for semantic search over data insights.
The script loads and parses data insights (DIs) from the database, creates embeddings for them, and sorts DIs by semantic similarity to a given input string. On my laptop, creating the embeddings takes less than 10 seconds, but ideally this should happen under the hood, with the embeddings stored in the database. For now, this is an experiment; if we decide it's useful, we can integrate it into our wizard.
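The pipeline described above (embed each data insight once, keep the vectors in a database, and sort insights by similarity to a query) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the PR's implementation: an in-memory SQLite database stands in for the real one, cosine similarity is assumed as the similarity measure, and the table name `insight_embeddings`, the helpers `get_embeddings` and `rank_insights`, and the keyword-count `toy_embed` are all hypothetical. In the PR, the embedding function would be a transformers/PyTorch model instead of `toy_embed`.

```python
import json
import math
import sqlite3
from typing import Callable, Sequence

# Hypothetical type: any function mapping a text to a fixed-length vector.
# In the PR this role is played by a transformers/PyTorch model; here it is
# left abstract so the sketch runs without those dependencies.
Embedder = Callable[[str], list]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (close to 1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def get_embeddings(conn: sqlite3.Connection, insights: dict, embed: Embedder) -> dict:
    """Return one embedding per insight id, computing and storing only missing ones."""
    # Hypothetical table; the real database schema is not described in the PR.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS insight_embeddings (id INTEGER PRIMARY KEY, vector TEXT)"
    )
    stored = {
        row[0]: json.loads(row[1])
        for row in conn.execute("SELECT id, vector FROM insight_embeddings")
    }
    for insight_id, text in insights.items():
        if insight_id not in stored:
            vector = embed(text)  # only new insights hit the (slow) model
            conn.execute(
                "INSERT INTO insight_embeddings (id, vector) VALUES (?, ?)",
                (insight_id, json.dumps(vector)),
            )
            stored[insight_id] = vector
    conn.commit()
    return {i: stored[i] for i in insights}


def rank_insights(query: str, insights: dict, embeddings: dict, embed: Embedder) -> list:
    """Sort insight ids by cosine similarity between their embedding and the query's."""
    query_vector = embed(query)
    scores = [(i, cosine_similarity(query_vector, embeddings[i])) for i in insights]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy embedder for illustration only: counts a few keywords.
VOCAB = ["energy", "emissions", "child", "mortality"]


def toy_embed(text: str) -> list:
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


insights = {
    1: "Child mortality has fallen in every region",
    2: "Energy emissions keep rising",
}
conn = sqlite3.connect(":memory:")
embeddings = get_embeddings(conn, insights, toy_embed)
ranking = rank_insights("emissions from energy", insights, embeddings, toy_embed)
print(ranking[0][0])  # 2
```

Caching the vectors means the model only runs for insights that were added since the last run, which is one way to make the embedding step happen "under the hood" instead of on every launch of the app.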
I think it would be useful to have something like this in our wizard. Authors could use it to find what has already been written about a given topic. And for data peeps, it could open the door to other kinds of analytics and experiments with our content.
The downside is that it requires installing some big libraries (transformers and PyTorch). The first time it's built, it needs to download some models (~100MB). But these libraries could also be useful for other similar applications.