Search Reta Vortaro offline xdxf file using code interpreter

stefangrotz commented 10 months ago

You can search through the revo.xdxf file using the code interpreter:

[Image: chatGPT-revo]

Used prompt

I used this prompt, which combines a minified Python example with some information about the data structure:

 # reta vortaro - revo.xdxf
When asked about the reta vortaro, revo or generally about complex multi-lingual dictionary questions, you can search revo.xdxf using python. The Esperanto words can be found in the <ar> elements, examples in <ex> and translations to other languages in <dtrn>. For non-Esperanto word searches always search in dtrn and return the corresponding Esperanto word and example if not specified differently. Here is an example of the data structure of translations: <dtrn> /de/ Beispiel, Muster, Vorbild</dtrn> There can be additional elements inside of the elements described above. Write robust code that can handle messy XML.

Here is an example of such a search for an Esperanto word:

import xml.etree.ElementTree as D
def A(file_path,word):
    C='def';E=D.parse(file_path);F=E.getroot()
    for A in F.findall('.//ar'):
        B=A.find('k')
        if B is not None and B.text.strip()==word:G=A.find(C).text if A.find(C)is not None else'No definition found';H=[A.text for A in A.findall('.//def/ex')]or['No examples found'];I=[A.text for A in A.findall('dtrn')]or['No translations found'];return{'word':word,'definition':G,'examples':H,'translations':I}
B='revo.xdxf'
C=A(B,'krokodili')
print(C)
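
For human readers, the minified example above can be unpacked roughly as follows. This is only a sketch: the function names, the tiny inline sample and the reverse lookup through `<dtrn>` are illustrative additions, and the element layout (`<k>` for the headword, `<dtrn>` starting with a `/lang/` prefix) is assumed from the prompt rather than checked against the real revo.xdxf. Note that `xml.etree.ElementTree` still rejects XML that is not well-formed, so "messy" here only covers nested elements and missing text.

```python
import re
import xml.etree.ElementTree as ET

# Assumed <dtrn> layout, taken from the prompt: " /de/ Beispiel, Muster, Vorbild"
DTRN_PATTERN = re.compile(r"^\s*/([^/\s]+)/\s*(.*)$", re.DOTALL)


def full_text(element):
    """Return all text inside an element (nested elements included), whitespace-normalised."""
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


def parse_dtrn(element):
    """Split a <dtrn> element into (language code, list of translated words)."""
    match = DTRN_PATTERN.match(full_text(element))
    if match is None:
        return None, []
    lang, words = match.groups()
    return lang, [w.strip() for w in words.split(",") if w.strip()]


def lookup_esperanto(root, word):
    """Find the article whose headword (<k>) equals `word`; return None if absent."""
    for article in root.iter("ar"):
        if full_text(article.find("k")) != word:
            continue
        definition = article.find("def")
        # Like the minified version, only the text before the first child of <def> is used.
        definition_text = (definition.text or "").strip() if definition is not None else ""
        return {
            "word": word,
            "definition": definition_text or "No definition found",
            "examples": [full_text(ex) for ex in article.iter("ex")],
            "translations": [full_text(t) for t in article.iter("dtrn")],
        }
    return None


def lookup_translation(root, word, lang=None):
    """Return Esperanto headwords whose <dtrn> lists `word`, optionally limited to `lang`."""
    hits = []
    for article in root.iter("ar"):
        for dtrn in article.iter("dtrn"):
            dtrn_lang, words = parse_dtrn(dtrn)
            if lang is not None and dtrn_lang != lang:
                continue
            if word.lower() in (w.lower() for w in words):
                hits.append(full_text(article.find("k")))
                break  # one hit per article is enough
    return hits


# Made-up miniature sample; for the real file use: root = ET.parse("revo.xdxf").getroot()
SAMPLE = """<xdxf>
  <ar><k>ekzemplo</k>
    <def>Specimeno servanta kiel modelo.
      <ex>Doni bonan ekzemplon.</ex>
      <dtrn> /de/ Beispiel, Muster, Vorbild</dtrn>
      <dtrn> /en/ example</dtrn>
    </def>
  </ar>
</xdxf>"""

root = ET.fromstring(SAMPLE)
print(lookup_esperanto(root, "ekzemplo"))
print(lookup_translation(root, "Muster", lang="de"))
```

Parsing the file once and reusing `root` for several lookups avoids re-reading the XML on every query, which may help with the lookup time discussed below.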

Conclusion

IMO it is currently too slow to include in EsperantoGPT: looking up a single word takes almost one minute. I tried to make it use minified code to speed things up, but this hasn't worked so far.

stefangrotz commented 10 months ago

Also discussed in the revo repo: https://github.com/revuloj/revo-fonto/issues/61

stefangrotz commented 10 months ago

I managed to get a result after 20 seconds, which is still pretty slow.

Todo:

Improve the prompt to make it more reliable, especially for non-Esperanto word searches
Experiment with different file formats, which might be quicker and more reliable (CSV?)
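
One way to try the CSV idea is to flatten the translations into rows in a single pass. The following is a rough sketch, not a tested converter for the real file: the function name `xdxf_to_rows`, the column names and the inline sample are hypothetical, and it assumes that `<k>` precedes the `<dtrn>` elements within each `<ar>` and that `<dtrn>` text starts with a `/lang/` prefix as in the prompt. Whether a CSV is actually faster in the code interpreter is untested here.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Assumed <dtrn> layout: " /de/ Beispiel, Muster, Vorbild"
LANG_PREFIX = re.compile(r"^\s*/([^/\s]+)/\s*(.*)$", re.DOTALL)


def xdxf_to_rows(xml_source):
    """Stream (headword, language, translation) rows from a path or binary file object."""
    headword = None
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "k":
            # Remember the most recent headword of the current article.
            headword = "".join(elem.itertext()).strip()
        elif elem.tag == "dtrn":
            text = " ".join("".join(elem.itertext()).split())
            match = LANG_PREFIX.match(text)
            if match and headword:
                lang, words = match.groups()
                for word in words.split(","):
                    if word.strip():
                        yield headword, lang, word.strip()
        elif elem.tag == "ar":
            # Article finished: free its memory and reset the headword.
            elem.clear()
            headword = None


# Made-up one-article sample; for the real file pass "revo.xdxf" instead.
SAMPLE = (
    b"<xdxf><ar><k>ekzemplo</k><def>"
    b"<dtrn> /de/ Beispiel, Muster, Vorbild</dtrn>"
    b"</def></ar></xdxf>"
)

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["kap", "lng", "txt"])  # hypothetical column names
writer.writerows(xdxf_to_rows(io.BytesIO(SAMPLE)))
print(out.getvalue())
```

With one translation per row, a non-Esperanto search becomes a plain filter on the `txt` and `lng` columns instead of an XML traversal.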

stefangrotz commented 9 months ago

The SQL file also works (45 seconds). This is my prompt for it:

Reta Vortaro - revo-inx.db

Use this file to search for Esperanto words and their translations.

Database Structure:

nodo: Contains main entries (Esperanto words). Key columns are mrk (unique marker) and kap (the word).

traduko: Holds translations. Key columns are mrk (linking to nodo) and lng (language code of the translation).

var: Stores variations of words. Key columns are mrk and var (variation).

Other tables like referenco, uzo, malong, bildo, artikolo, vortspeco, agordo may contain additional information but may not always have relevant data.

Finding Words:

To locate an Esperanto word, query the nodo table using the kap column.

Example SQL: SELECT mrk FROM nodo WHERE kap = 'desired_word';

Getting Translations:

Once you have the mrk from nodo, use it to find translations in the traduko table.

Example SQL: SELECT txt FROM traduko WHERE mrk = 'obtained_mrk' AND lng = 'language_code';

Word Variations:

To find variations of a word, use the mrk in the var table.

Example SQL: SELECT var FROM var WHERE mrk = 'obtained_mrk';

Additional Information:

For more details like usage or references, use the mrk to query tables like uzo or referenco.
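
The two-step lookup described in the prompt above can also be written as a single join. The sketch below assumes that revo-inx.db is an SQLite database and that the tables have the columns named in the prompt (`mrk`, `kap`, `lng`, `txt`, `var`); the helper names, the in-memory demo schema and the `mrk` value are made up for illustration and are not taken from the real file.

```python
import sqlite3


def translations_for(conn, esperanto_word, lang):
    """Translations of an Esperanto word: nodo.kap -> mrk -> traduko.txt."""
    rows = conn.execute(
        "SELECT t.txt FROM nodo AS n JOIN traduko AS t ON t.mrk = n.mrk "
        "WHERE n.kap = ? AND t.lng = ?",
        (esperanto_word, lang),
    ).fetchall()
    return [txt for (txt,) in rows]


def esperanto_for(conn, foreign_word, lang):
    """Reverse direction: find Esperanto headwords for a non-Esperanto word."""
    rows = conn.execute(
        "SELECT n.kap FROM traduko AS t JOIN nodo AS n ON n.mrk = t.mrk "
        "WHERE t.txt = ? AND t.lng = ?",
        (foreign_word, lang),
    ).fetchall()
    return [kap for (kap,) in rows]


def variations_for(conn, esperanto_word):
    """Variations of a word via the var table."""
    rows = conn.execute(
        "SELECT v.var FROM nodo AS n JOIN var AS v ON v.mrk = n.mrk WHERE n.kap = ?",
        (esperanto_word,),
    ).fetchall()
    return [variation for (variation,) in rows]


# In-memory stand-in with invented rows; for the real data (assuming SQLite)
# use: conn = sqlite3.connect("revo-inx.db")
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE nodo (mrk TEXT, kap TEXT);
    CREATE TABLE traduko (mrk TEXT, lng TEXT, txt TEXT);
    CREATE TABLE var (mrk TEXT, var TEXT);
    INSERT INTO nodo VALUES ('demo.mrk', 'ekzemplo');
    INSERT INTO traduko VALUES ('demo.mrk', 'de', 'Beispiel');
    INSERT INTO traduko VALUES ('demo.mrk', 'en', 'example');
    """
)

print(translations_for(conn, "ekzemplo", "de"))
print(esperanto_for(conn, "example", "en"))
print(variations_for(conn, "ekzemplo"))
```

Using `?` placeholders instead of pasting the word into the SQL string keeps words containing apostrophes from breaking the query.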

