stanford-oval / WikiChat

WikiChat is an improved RAG (retrieval-augmented generation) system. It stops large language models from hallucinating by retrieving data from a corpus.
https://wikichat.genie.stanford.edu
Apache License 2.0
1.05k stars, 97 forks

More languages? (Catalan) #35

Open: jaumet opened this issue 1 month ago

jaumet commented 1 month ago

Hi there! I'm fascinated by WikiChat. As you may know, Catalan, my language, has historically had a Wikipedia that is unusually large relative to its number of speakers, compared with the other top languages. Yes, Catalans like languages :-)

Is there a how-to for adding a new language? I'd volunteer to add Catalan, and many other languages would likely follow, in the spirit of Wikipedia and free software.

Does this already exist? Would it be possible?

Many thanks, Jaume

s-jse commented 1 month ago

Hi,

Thank you for your interest in WikiChat. Support for a new language in WikiChat comes from two sources:

  1. The multilingual capability of the underlying large language model (e.g. GPT-4o). This enables WikiChat to understand different languages and respond to users in the language they prefer.

  2. Access to Wikipedia data in the new language, for example data from https://ca.wikipedia.org/. This requires "indexing" the Wikipedia edition for the new language and adding it to the index that WikiChat already uses. This step has to be done by us, because the index must be hosted on our server so that everyone can access it.

The first one should already be supported for Catalan. You can try it by asking WikiChat something in Catalan. Let us know if you run into any issues with this. I might get around to adding the second one in the future.
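As a rough illustration of why source 2 cannot simply be switched on, here is a toy Python sketch. It assumes a plain keyword inverted index, which is *not* WikiChat's actual retrieval pipeline, and the class and method names (`MultilingualIndex`, `add_documents`, `search`) are hypothetical. The point it shows: retrieval can only ground answers in articles that have been added to the shared index, so Catalan Wikipedia content stays unreachable until it is indexed, even if the LLM itself understands Catalan.

```python
import re
from collections import defaultdict


class MultilingualIndex:
    """Hypothetical toy index: a keyword inverted index shared by all languages.

    This is an illustration only, not how WikiChat actually indexes Wikipedia.
    """

    def __init__(self):
        self.postings = defaultdict(set)  # token -> set of document ids
        self.docs = {}                    # document id -> (language code, text)

    @staticmethod
    def tokenize(text):
        # Python's \w is Unicode-aware, so accented Catalan words stay intact.
        return re.findall(r"\w+", text.lower())

    def add_documents(self, language, docs):
        """Add (doc_id, text) pairs from one Wikipedia edition to the shared index."""
        for doc_id, text in docs:
            self.docs[doc_id] = (language, text)
            for token in self.tokenize(text):
                self.postings[token].add(doc_id)

    def search(self, query):
        """Return the sorted ids of documents that contain every query token."""
        tokens = self.tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return sorted(hits)


if __name__ == "__main__":
    index = MultilingualIndex()
    index.add_documents("en", [("en:Barcelona", "Barcelona is a city in Catalonia")])
    print(index.search("ciutat"))  # [] : no Catalan articles indexed yet

    index.add_documents("ca", [("ca:Barcelona", "Barcelona és una ciutat de Catalunya")])
    print(index.search("ciutat"))  # ['ca:Barcelona']
```

Keeping every edition in one shared index (rather than one index per language) mirrors the reply's description of adding the new language "to the index that WikiChat is already using"; a real system would use a proper retriever and would need to host that index server-side.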