ilius / pyglossary

A tool for converting dictionary files (aka glossaries), mainly to help use our offline glossaries in any open-source dictionary app we like, on any modern operating system or device.
GNU General Public License v3.0

DSL: expand support of [ref] and [url] #499

Closed: soshial closed this issue 1 year ago

soshial commented 1 year ago

Other variants of this tag:

Also, we should support parentheses (the optional parts of a headword):

(пре)вращать(ся)
    [m1]turn[/m]

The card should open for each of these combinations: вращать | превращать | вращаться | превращаться

A reference to such a gloss should omit the parentheses: [ref]вращать[/ref]. Articles whose headword contains escaped parentheses, e.g. rounded brackets \( or \), should be referenced with the backslashes dropped: [ref]rounded brackets ( or )[/ref]
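A minimal Python sketch of the expansion rule above, assuming optional parts are never nested. expand_headword and its tokenizer regex are hypothetical names for illustration, not pyglossary's actual code. Every unescaped (...) group is either kept without its parentheses or dropped, while escaped \( and \) become literal characters, so the resulting keys are what a plain [ref] would need to match.

```python
import itertools
import re

# Hypothetical tokenizer (not pyglossary's code). Alternatives, in order:
#   1. an escaped character such as "\(" or "\)"   -> literal character
#   2. an unescaped "(...)" group                  -> optional part
#   3. a run of ordinary text                      -> literal text
#   4. any other single character (e.g. a stray "(") -> literal text
_TOKEN = re.compile(r"\\(.)|\(([^()\\]*)\)|([^\\()]+)|(.)")


def expand_headword(headword: str) -> list[str]:
    """Return all index keys generated by a headword's optional parts."""
    # Each segment lists its alternatives: literals have one,
    # optional groups have two (kept without parentheses, or dropped).
    segments: list[tuple[str, ...]] = []
    for m in _TOKEN.finditer(headword):
        escaped, optional, text, stray = m.groups()
        if escaped is not None:
            segments.append((escaped,))
        elif optional is not None:
            segments.append((optional, ""))
        else:
            segments.append((text if text is not None else stray,))

    keys: list[str] = []
    for combo in itertools.product(*segments):
        # Collapse whitespace left behind by dropped optional parts.
        key = " ".join("".join(combo).split())
        if key and key not in keys:
            keys.append(key)
    return keys
```

For example, expand_headword("(пре)вращать(ся)") returns ['превращаться', 'превращать', 'вращаться', 'вращать']. Since each optional group doubles the number of keys, a real reader might want to cap the expansion for long headwords.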

ilius commented 1 year ago

Can you upload a glossary that uses target=?

soshial commented 1 year ago

Sure. Here's an official ABBYY sports dictionary: SportRuEn.zip

These links should open the "баскетбол (команды судей)" (basketball: referee's commands) gloss, not the "баскетбол" (basketball) gloss:

команды судей
    [m1][i][trn][com]referee's commands[/com][/trn][/i][/m]
    [m1][trn][ref target="баскетбол (команды судей)"]баскетбол[/ref][/trn][/m]
    [m1][trn][ref target="бокс (команды судей)"]бокс[/ref][/trn][/m]
    [m1][trn][ref target="волейбол (команды судей)"]волейбол[/ref][/trn][/m]
    [m1][trn][ref target="гребной спорт (команды судей)"]гребной спорт[/ref][/trn][/m]
    [m1][trn][ref target="дзюдо (команды судей)"]дзюдо[/ref][/trn][/m]
    [m1][trn][ref target="лёгкая атлетика (команды судей)"]лёгкая атлетика[/ref][/trn][/m]
    [m1][trn][ref target="теннис (команды судей)"]теннис[/ref][/trn][/m]
    [m1][trn][ref target="тяжёлая атлетика (команды судей)"]тяжёлая атлетика[/ref][/trn][/m]
    [m1][trn][ref target="стрельба (команды судей)"]стрельба[/ref][/trn][/m]
    [m1][trn][ref target="фехтование (команды судей)"]фехтование[/ref][/trn][/m]
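One way to sketch the intended behaviour in Python: when a target= attribute is present, the link should point at that headword, and only the visible text should come from the tag body. refs_to_html is a hypothetical helper, the bword:// scheme is an assumed internal-link prefix (common in StarDict-style HTML), and nested formatting inside [ref] is not handled.

```python
import html
import re

# Hypothetical converter (not pyglossary's code) for
# [ref]text[/ref] and [ref target="key"]text[/ref].
_REF = re.compile(r'\[ref(?:\s+target="([^"]*)")?\](.*?)\[/ref\]', re.DOTALL)


def refs_to_html(body: str) -> str:
    def repl(m: re.Match) -> str:
        target, text = m.group(1), m.group(2)
        # Link to target= when given, otherwise to the visible text itself.
        key = target if target is not None else text
        # "bword://" is an assumed link scheme, not confirmed by this thread.
        return '<a href="bword://{}">{}</a>'.format(
            html.escape(key, quote=True), html.escape(text)
        )

    return _REF.sub(repl, body)
```

With this, [ref target="бокс (команды судей)"]бокс[/ref] displays "бокс" but links to the "бокс (команды судей)" card.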
soshial commented 1 year ago

I saw commit 74e736482239d, but it didn't fix the problem:

[Screenshot taken 2023-06-28 at 05:38:15]
soshial commented 1 year ago

I added a great test file by @yozhic that covers all possible combinations of ref targets and headwords. It is a good dictionary for checking that all links are valid.

ilius commented 1 year ago

Forget about [ref dict="..."] for now.

I pushed to the dsl-tag-attrs branch. Please test target= for ref / url. Thanks.

soshial commented 1 year ago

It looks great! I was surprised the change needed so much new code. Was that mainly to support attribute parsing?

ilius commented 1 year ago

Yes.
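Attribute parsing of this kind could look roughly like the following Python sketch. parse_open_tag is a hypothetical helper, not the code in the dsl-tag-attrs branch; it assumes attributes are name=value pairs whose values are either double-quoted (to allow spaces, as in the target= examples above) or bare words.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical grammar: [name attr="quoted value" other=bare ...]
_OPEN_TAG = re.compile(r'\[(\w+)((?:\s+\w+=(?:"[^"]*"|[^\s\]]+))*)\s*\]')
_ATTR = re.compile(r'(\w+)=(?:"([^"]*)"|([^\s\]]+))')


def parse_open_tag(tag: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Split an opening tag into its name and a dict of attributes."""
    m = _OPEN_TAG.fullmatch(tag)
    if m is None:
        return None  # not a name=value style tag
    attrs: Dict[str, str] = {}
    for a in _ATTR.finditer(m.group(2)):
        name, quoted, bare = a.groups()
        attrs[name] = quoted if quoted is not None else bare
    return m.group(1), attrs
```

It returns None for tags with positional arguments such as [c violet], which a full DSL reader would need to handle separately.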

ilius commented 1 year ago

Pushed to master.

ilius commented 1 year ago

Where do you download these dsl files? Are there any in English?

soshial commented 1 year ago

I downloaded an official distribution of ABBYY Lingvo x6, which includes many dictionaries created by ABBYY itself. Since it's a Russian company, almost all of its dictionaries translate between Russian and another language.

soshial commented 1 year ago

Also, I have translated this test file into English for you: https://github.com/soshial/pyglossary-test/blob/dsl-ref-target/dsl/009-headwords-and-ref.dsl

soshial commented 1 year ago

> Where do you download these dsl files? Are there any in English?

Which DSL files? Do you mean where I get the test ones, or DSL dictionaries in general for my personal use?

ilius commented 1 year ago

Both.

ilius commented 1 year ago

Please test the handling of parentheses and curly brackets in the title (headword) line.

soshial commented 1 year ago

Two things should be improved (see the updated test file):

  1. supporting formatting in headwords
  2. preserving {} parts when showing a gloss card

#NAME   "Dictionary: formatted headwords (En-En)"
#INDEX_LANGUAGE   "English"
#CONTENTS_LANGUAGE   "English"

{[c slategray]}{to }{[/c]}tell {[c violet]}smb{[/c]} {[u]}how{[/u]} to do {[c violet]}smth{[/c]} {[sub]subscript[/sub]}
   [m1]1. main meaning[/m]
   [m2]a. first submeaning[/m]
   [m2]b. second submeaning[/m]

test link
   [ref]tell smb how to do smth[/ref]
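A Python sketch of the two requirements, assuming (as the test link above implies) that text inside {...} is shown on the card but left out of the lookup key. index_key and display_markup are hypothetical helpers, not pyglossary functions, and escape sequences such as \{ are not handled.

```python
import re

# Hypothetical patterns: a {...} "unsorted" part, and any [tag] or [/tag].
_UNSORTED = re.compile(r"\{(.*?)\}")
_TAG = re.compile(r"\[/?[^\]]*\]")


def index_key(title: str) -> str:
    """Key for lookup and [ref] matching: drop {...} parts and tags."""
    text = _UNSORTED.sub("", title)
    text = _TAG.sub("", text)
    # Collapse the whitespace left where parts were removed.
    return " ".join(text.split())


def display_markup(title: str) -> str:
    """DSL markup shown on the card: keep {...} content, drop the braces."""
    return _UNSORTED.sub(r"\1", title)
```

For the title line above, index_key gives "tell smb how to do smth" (matching the test link), while display_markup keeps "to", the colors, the underline and the subscript for rendering.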

soshial commented 1 year ago

Speaking of round brackets ( ):

A headword like "headword with (brackets)" should generate two headwords: "headword with brackets" and "headword with". Your generated headwords include the brackets, while they should not.

soshial commented 1 year ago

I currently cannot test via Slob output, because the current versions of that library are temporarily broken. I can test using HtmlDir, but cross-links between cards are broken at the moment.

ilius commented 1 year ago

Please try again.

soshial commented 1 year ago

It's great now! Thank you so much!

soshial commented 1 year ago

Could you please add this test and this one? (Links updated.)

ilius commented 1 year ago

I think we already cover all these cases.

soshial commented 1 year ago

Yes, we do, but I translated the 009-headwords-and-ref.dsl test file from Russian to English especially for you.

soshial commented 1 year ago

I guess we can now delete the DSL-related branches that are no longer needed.