Closed azdanov closed 3 weeks ago
Here's what I found with mixed languages (Estonian, English, Russian, Ukrainian):
I should have said that we need 20 sources in the same language. Mixing languages makes grouping articles harder, and the results will be subpar. I have merged it in, but it will not be as good as other categories.
I thought that LLMs could work with mixed content. I can remove all languages other than Estonian to increase the quality. Would that be better?
We are not running everything through an LLM; that would not work at the scale of articles we need to process.
I'd like to start gathering news sources for Estonia.
But it's a bit difficult, since 20 is quite a high number for a smaller country. I'll see how many sources I can find that are in English.
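For anyone preparing a source list, the same-language requirement discussed in this thread could be checked up front with something like the sketch below. It is only an illustration: the dictionary layout, the `lang` field, the function names, and the example.com URLs are all assumptions, not the project's actual format. Only the threshold of 20 comes from the thread.

```python
from collections import Counter

MIN_SOURCES = 20  # threshold mentioned in the thread


def count_by_language(sources):
    """Count how many sources are listed for each language code."""
    return Counter(s["lang"] for s in sources)


def filter_by_language(sources, lang):
    """Keep only the sources written in the given language."""
    return [s for s in sources if s["lang"] == lang]


def meets_minimum(sources, lang, minimum=MIN_SOURCES):
    """Return True if at least `minimum` sources exist in `lang`."""
    return len(filter_by_language(sources, lang)) >= minimum


if __name__ == "__main__":
    # Hypothetical mixed-language list; entries are placeholders.
    sources = [
        {"name": "Example ET", "url": "https://example.com/et", "lang": "et"},
        {"name": "Example EN", "url": "https://example.com/en", "lang": "en"},
        {"name": "Example RU", "url": "https://example.com/ru", "lang": "ru"},
    ]
    print(count_by_language(sources))
    print(meets_minimum(sources, "et"))
```

Running a check like this before submitting would show whether any single language reaches the required count, or whether the list needs more sources in one language first.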