pettarin / penelope

Penelope is a multi-tool for creating, editing and converting dictionaries, especially for eReader devices
MIT License
204 stars 31 forks

Problem with keys containing blanks #20

Open hostrogo opened 6 years ago

hostrogo commented 6 years ago

Hi,

When I convert a StarDict file to Kobo format, keys containing blanks, such as "Abdul Rahman", do not work. On the Kobo, if I search for abdul I find Abdul Rahman, but if I tap on Abdul Rahman to see the definition I get the message "No definition found".

Here is the entry in my xml file:

<article>

<key>Abdul Rahman</key>

<definition type="h">
<![CDATA[<span style="FONT-FACE: Arial; FONT-SIZE: 18pt; FONT-WEIGHT:bold;"><b>Abdul Rahman</b></span><small> </small><df style="FONT-FAMILY: Times New Roman; FONT-SIZE: 18pt">, Tunku <d>(1903–90)</d><small>, Malayan statesman, first Prime Minister of independent Malaya 1957–63 and of Malaysia 1963–70.</small></df>]]>
</definition>
</article>

Now here is the result in 11.html file after conversion with penelope:

`

Abdul Rahman
Abdul Rahman , Tunku (1903–90), Malayan statesman, first Prime Minister of independent Malaya 1957–63 and of Malaysia 1963–70.

` Oxford mini.zip

hostrogo commented 6 years ago

Answering my own question: I found a solution by modifying the module prefix_kobo.py, line 52. Instead of for character in headword: I put: for character in headword[1:2]:.

There is still a problem when the blank is in position 2 of the key, as in a posteriori. In that case the definition is still not found.

pettarin commented 6 years ago

If you look inside the ZIP file that is a Kobo dictionary, you will see a bunch of files named 11.html, aa.html, ab.html, etc.

For example, the file "ab.html" contains all the keys starting with "ab" (e.g., "abacus", etc.) and their definitions.
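To see this layout for yourself, the member names of the archive can be listed with Python's standard zipfile module. This is only a minimal sketch: the helper name list_html_members is hypothetical, and it only lists names without reading the contents of the files.

```python
import zipfile


def list_html_members(zip_source):
    """Return the sorted names of the .html members of a ZIP archive.

    zip_source may be a filesystem path or a binary file-like object.
    The function name is hypothetical and is not part of Penelope.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return sorted(
            name for name in archive.namelist() if name.endswith(".html")
        )
```

Calling list_html_members on the path of a Kobo dictionary ZIP should return names such as 11.html, aa.html and ab.html.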

Now, to deal properly with keys with spaces in them, one needs to know where the Kobo dictionary lookup algorithm would look. For example, should "a posteriori" go into "a1.html", "aa.html", "ab.html", or "11.html"?

Once you know that, the best place to change is the "is_allowed" function. Right now it allows only letters and digits. Anything else (including spaces) will cause the prefix to be returned as "SPECIAL" instead of (say) "ab".
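The behaviour described above can be sketched roughly as follows. This is not the actual code of prefix_kobo.py: the function name sketch_prefix, the default prefix length of 2, and the use of str.isalnum to mean "letters and digits" are all assumptions made for illustration.

```python
SPECIAL = "SPECIAL"


def is_allowed(character):
    """Accept only letters and digits (assumed here to mean str.isalnum)."""
    return character.isalnum()


def sketch_prefix(headword, length=2):
    """Hypothetical sketch of the prefix logic described in this thread.

    Every character of the headword is checked, so a single blank or
    hyphen anywhere in the key sends the entry to the SPECIAL bucket.
    """
    lowered = headword.lower()
    for character in lowered:
        if not is_allowed(character):
            return SPECIAL
    return lowered[:length]
```

Under these assumptions, "abacus" gets the prefix "ab", while "Abdul Rahman" and "a posteriori" both fall into the special bucket, which matches the entry ending up in 11.html as reported above.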

Your change basically means that only a single character of the key is checked, which I guess is not correct, unless the Kobo software looks into "aa.html" if it does not find "ab.html" when looking for "abacus".

As I wrote here: https://github.com/pettarin/penelope#important-update I no longer maintain Penelope, and now I no longer have a working Kobo device, so unfortunately you need to figure it out yourself...

hostrogo commented 6 years ago

Thank you for your answer. I have finally found the solution to my problem. I noticed that in the Kobo's native French dictionary, the entry a posteriori is stored not in 11.html but in aa.html, and in that case there is no problem: I can see the definition on the Kobo. So I made two changes in prefix_kobo.py. First, I substitute an "a" for the blank when the blank is in the second position of the key.

Then I replaced for character in headword: with for character in headword[0:2]:

This second change is necessary so that entries containing disallowed characters after position 2 (as in Bosnia-Herzegovina) are not stored in 11.html, because in that case the definition cannot be seen on the Kobo.

Sorry for the clumsy programming, but I'm a complete newbie in Python!

For me the result is OK: I can see all the definitions on the reader.

It was not necessary to make a substitution when the second character is a - or a . or anything else, because in that case storing the entry in 11.html is not a problem. The problem only occurs with a blank in position 2 of the key.
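For anyone who wants to try the same approach, here is a minimal sketch of the two changes described in this comment. It is a reconstruction from the description, not the code that was actually edited in prefix_kobo.py: the function name patched_prefix, the SPECIAL return value, the default prefix length of 2, and the use of str.isalnum for "letters and digits" are assumptions.

```python
SPECIAL = "SPECIAL"


def is_allowed(character):
    """Accept only letters and digits (assumed here to mean str.isalnum)."""
    return character.isalnum()


def patched_prefix(headword, length=2):
    """Hypothetical sketch of the workaround described in this thread."""
    lowered = headword.lower()
    # Change 1: a blank in the second position is replaced by "a",
    # so that "a posteriori" is filed under "aa".
    if len(lowered) > 1 and lowered[1] == " ":
        lowered = lowered[0] + "a" + lowered[2:]
    # Change 2: only the first two characters are checked, so a blank
    # or hyphen later in the key no longer forces the SPECIAL bucket.
    for character in lowered[0:2]:
        if not is_allowed(character):
            return SPECIAL
    return lowered[:length]
```

With this sketch, "a posteriori" maps to "aa", "Abdul Rahman" to "ab" and "Bosnia-Herzegovina" to "bo", while a key whose second character is a hyphen or a dot still goes to the special bucket.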