parolteknologio / EsperantoGPT

Esperanto language expert and instructor for ChatGPT and other systems
https://chat.openai.com/g/g-D4jB3Ml4b-esperanto-helpanto

Search Reta Vortaro offline xdxf file using code interpreter #1

stefangrotz opened this issue 10 months ago (status: open)

stefangrotz commented 10 months ago

You can search through the revo.xdxf file using ChatGPT's Code Interpreter:

[Screenshot: chatGPT-revo]

Prompt used

I used this prompt, which combines a minified Python example with some information about the data structure:

# Reta Vortaro - revo.xdxf
When asked about the Reta Vortaro (ReVo) or, more generally, about complex multilingual dictionary questions, you can search revo.xdxf using Python. Each Esperanto entry is an <ar> element with the headword in its <k> element; examples are in <ex> and translations into other languages are in <dtrn>. For searches on non-Esperanto words, always search in <dtrn> and return the corresponding Esperanto word and example, unless told otherwise. Here is an example of the data structure of translations: <dtrn> /de/ Beispiel, Muster, Vorbild</dtrn>. There can be additional elements inside the elements described above. Write robust code that can handle messy XML.

Here is an example of such a search for an Esperanto word:

import xml.etree.ElementTree as D
# Look up an Esperanto headword (<k> inside <ar>); returns None if it is not found
def A(file_path,word):
    C='def';E=D.parse(file_path);F=E.getroot()
    for A in F.findall('.//ar'):
        B=A.find('k')
        if B is not None and(B.text or'').strip()==word:G=A.find(C).text if A.find(C)is not None else'No definition found';H=[A.text for A in A.findall('.//def/ex')]or['No examples found'];I=[A.text for A in A.findall('.//dtrn')]or['No translations found'];return{'word':word,'definition':G,'examples':H,'translations':I}
B='revo.xdxf'
C=A(B,'krokodili')
print(C)
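For comparison, here is a more defensive sketch of the same lookup, written for readability rather than prompt length. It is an assumption-based variant, not the code used in the experiment above: it streams the file with `xml.etree.ElementTree.iterparse` instead of loading the whole tree at once, tolerates empty or missing `<k>` elements, collects text from nested markup with `itertext()`, and finds `<ex>` and `<dtrn>` at any depth inside an `<ar>`. The names `lookup_word` and `element_text` are hypothetical, and the code has not been timed on the real revo.xdxf.

```python
import io
import xml.etree.ElementTree as ET


def element_text(elem):
    """All text inside an element, including nested tags, with whitespace collapsed."""
    if elem is None:
        return ""
    return " ".join("".join(elem.itertext()).split())


def lookup_word(source, word):
    """Return the first <ar> article whose <k> equals `word`, or None.

    `source` is a binary file object (or raw bytes, handy for quick tests).
    The file is streamed with iterparse, and each non-matching article is
    cleared after inspection to keep memory use low.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "ar":
            continue  # wait until a whole article has been parsed
        if element_text(elem.find("k")) == word:
            definition = elem.find(".//def")
            # Like the minified version, keep only the text that opens the <def>
            gloss = (definition.text or "").strip() if definition is not None else ""
            return {
                "word": word,
                "definition": gloss or "No definition found",
                "examples": [element_text(ex) for ex in elem.iter("ex")] or ["No examples found"],
                "translations": [element_text(t) for t in elem.iter("dtrn")] or ["No translations found"],
            }
        elem.clear()  # drop the contents of an article we no longer need
    return None
```

Usage, assuming revo.xdxf sits in the working directory: `with open("revo.xdxf", "rb") as f: print(lookup_word(f, "krokodili"))`. Streaming avoids holding the whole dictionary in memory, but it still reads the file up to the match, so on its own it is not expected to fix the speed problem.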

Conclusion

IMO it is currently too slow to include in EsperantoGPT: looking up a single word takes almost one minute. I tried making it use minified code to speed things up, but that hasn't worked so far.
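One idea not tested in this issue: pay the parsing cost once per session and answer later questions from in-memory dictionaries, so that only the first lookup is slow. The sketch below assumes the `<dtrn>` format shown in the prompt (a language code between slashes, then comma-separated translations) and builds both a forward index (Esperanto headword to translations) and a reverse index for the non-Esperanto searches the prompt asks for. `parse_dtrn` and `build_indexes` are hypothetical names. Splitting on commas is naive and will break translations that themselves contain commas, and whether the one-off build fits within the Code Interpreter's time limits on the full file is unmeasured.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed <dtrn> layout, e.g. " /de/ Beispiel, Muster, Vorbild":
# a language code between slashes, then comma-separated translations.
DTRN_PATTERN = re.compile(r"^\s*/([^/]+)/\s*(.*)$", re.DOTALL)


def parse_dtrn(text):
    """Split a <dtrn> string into (language_code, [translations])."""
    match = DTRN_PATTERN.match(text or "")
    if not match:
        return None, []
    lang, rest = match.groups()
    terms = [t.strip() for t in rest.split(",") if t.strip()]
    return lang.strip(), terms


def build_indexes(source):
    """Read the XDXF data once and return (forward, reverse) dictionaries.

    forward: Esperanto headword -> list of (lang, translation)
    reverse: (lang, lowercased translation) -> sorted list of Esperanto headwords
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    forward = defaultdict(list)
    reverse = defaultdict(set)
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "ar":
            continue
        k = elem.find("k")
        headword = " ".join("".join(k.itertext()).split()) if k is not None else ""
        if headword:
            for dtrn in elem.iter("dtrn"):
                lang, terms = parse_dtrn("".join(dtrn.itertext()))
                for term in terms:
                    forward[headword].append((lang, term))
                    reverse[(lang, term.lower())].add(headword)
        elem.clear()  # the article is indexed; free its contents
    return dict(forward), {key: sorted(words) for key, words in reverse.items()}
```

After `forward, reverse = build_indexes(open("revo.xdxf", "rb"))`, a question like "what is German Muster in Esperanto?" becomes a dictionary lookup: `reverse.get(("de", "muster"), [])`.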

stefangrotz commented 10 months ago

Also discussed in the revo repo: https://github.com/revuloj/revo-fonto/issues/61

stefangrotz commented 10 months ago

I managed to get a result in 20 seconds, but that is still pretty slow.

Todo:

stefangrotz commented 9 months ago

The SQL database file also works (45 seconds). This is my prompt for it:

Reta Vortaro - revo-inx.db

Use this file to search for Esperanto words and their translations.

  1. Database Structure:

    • nodo: Contains main entries (Esperanto words). Key columns are mrk (unique marker) and kap (the word).
    • traduko: Holds translations. Key columns are mrk (linking to nodo), lng (language code of the translation) and txt (the translated text).
    • var: Stores variations of words. Key columns are mrk and var (variation).
    • Other tables such as referenco, uzo, malong, bildo, artikolo, vortspeco and agordo contain additional information, but they do not always hold relevant data.
  2. Finding Words:

    • To locate an Esperanto word, query the nodo table using the kap column.
    • Example SQL: SELECT mrk FROM nodo WHERE kap = 'desired_word';
  3. Getting Translations:

    • Once you have the mrk from nodo, use it to find translations in the traduko table.
    • Example SQL: SELECT txt FROM traduko WHERE mrk = 'obtained_mrk' AND lng = 'language_code';
  4. Word Variations:

    • To find variations of a word, use the mrk in the var table.
    • Example SQL: SELECT var FROM var WHERE mrk = 'obtained_mrk';
  5. Additional Information:

    • For more details like usage or references, use the mrk to query tables like uzo or referenco.
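The steps of this prompt can be sketched in Python with the standard `sqlite3` module, assuming revo-inx.db is an SQLite file and that `nodo`, `traduko` and `var` have at least the columns named above. The function name `lookup` and the result layout are hypothetical. Unlike the example SQL in the prompt, the sketch passes values as `?` placeholders instead of pasting them into the query string, so words containing apostrophes cannot break the query, and it handles several `nodo` rows sharing the same `kap`. If `kap` stores headwords with extra markup, an exact match may miss; that is untested here.

```python
import sqlite3


def lookup(conn: sqlite3.Connection, word: str, lang: str) -> list:
    """Return one dict per nodo row whose kap equals `word`.

    Mirrors the prompt: find mrk in nodo, then use it to read the
    translations for `lang` from traduko and the variations from var.
    """
    results = []
    rows = conn.execute("SELECT mrk FROM nodo WHERE kap = ?", (word,)).fetchall()
    for (mrk,) in rows:
        translations = [txt for (txt,) in conn.execute(
            "SELECT txt FROM traduko WHERE mrk = ? AND lng = ?", (mrk, lang))]
        variations = [v for (v,) in conn.execute(
            "SELECT var FROM var WHERE mrk = ?", (mrk,))]
        results.append({"mrk": mrk, "translations": translations, "variations": variations})
    return results
```

Usage: `conn = sqlite3.connect("revo-inx.db")`, then `lookup(conn, "krokodili", "de")`. An empty list means the word was not found in `nodo`.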