maTayefi / spellbook-dictionary

Automatically exported from code.google.com/p/spellbook-dictionary

Create converter for Babylon (BGL) dictionaries #47

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Babylon offers a lot of free dictionaries, including many Bulgarian-related
ones: http://www.babylon.com/free-dictionaries/languages/bulgarian/

The dictionaries are stored in the BGL format, which is undocumented as far
as I know, but some programs can read Babylon dictionaries, so I guess the
format is not that complex. Here is a reference I found:
http://code.google.com/p/bgl-reverse/w/list. Maybe more information is
available online.

I know you're a fan of such tasks, so I hope you'll enjoy this one. It's of
great importance for the future of the project. I've attached a
German-Bulgarian dictionary, which I extracted from their exe file.

Original issue reported on code.google.com by lord...@gmail.com on 11 May 2010 at 2:35

GoogleCodeExporter commented 8 years ago
I've found this program: http://uploads.bizhat.com/file/377004. It runs under
Wine and converts BGL dictionaries to txt files, which you can then recode to
UTF-8 and import into Spellbook. I'd prefer a direct solution that imports a
BGL dictionary into Spellbook in one step, but this is at least an
alternative.

Original comment by lord...@gmail.com on 11 May 2010 at 2:57

GoogleCodeExporter commented 8 years ago
I've found it too, but the decoded output doesn't look quite right: in UTF-8
it is missing the German special characters. In any of the Western encodings
the German chars are okay but the Bulgarian is scrambled. Any clues how to
fix that?

Original comment by iivalchev@gmail.com on 11 May 2010 at 4:05

GoogleCodeExporter commented 8 years ago
I guess the encoding might be cp-1251. That was the case in the original
dual English-Bulgarian dictionary I used for Spellbook.

Original comment by lord...@gmail.com on 11 May 2010 at 4:56
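
The recoding step suggested above could be sketched roughly like this. It is a minimal sketch that assumes the converter's text output really is windows-1251 (cp-1251), which is only a guess at this point in the thread; the class name `RecodeToUtf8` and its methods are hypothetical and are not part of Spellbook.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Spellbook.
final class RecodeToUtf8 {
    // Assumption: the text file produced by the converter is windows-1251.
    static final Charset ASSUMED_SOURCE = Charset.forName("windows-1251");

    /** Decodes the raw bytes with the given source charset and re-encodes them as UTF-8. */
    static byte[] recode(byte[] input, Charset source) {
        String text = new String(input, source);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Reads the whole input file, recodes it, and writes the result to the output file. */
    static void recodeFile(Path in, Path out, Charset source) throws IOException {
        Files.write(out, recode(Files.readAllBytes(in), source));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java RecodeToUtf8 <input.txt> <output.txt>");
            return;
        }
        recodeFile(Paths.get(args[0]), Paths.get(args[1]), ASSUMED_SOURCE);
    }
}
```

If the guess about the source encoding is wrong, the output will still be valid UTF-8 but will contain the wrong letters, so it is worth checking a few entries by eye.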

GoogleCodeExporter commented 8 years ago
With cp-1251 for ü you get ъ

Original comment by iivalchev@gmail.com on 11 May 2010 at 5:09
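
The effect described above can be reproduced in isolation: the same raw byte decodes to a German umlaut under a Western code page and to a Cyrillic letter under cp-1251. The sketch below uses the byte 0xFC purely as an illustration, because that is the windows-1252 code for ü; which Cyrillic letter shows up in practice depends on the actual bytes in the file. The class name `SameByteTwoCharsets` is hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of Spellbook.
final class SameByteTwoCharsets {
    /** Decodes the raw bytes using the named charset. */
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Illustrative byte only: 0xFC is the windows-1252 code for 'ü'.
        byte[] raw = {(byte) 0xFC};
        System.out.println("windows-1252: " + decode(raw, "windows-1252"));
        System.out.println("windows-1251: " + decode(raw, "windows-1251"));
    }
}
```

A single-byte code page can hold either the umlauts or the Cyrillic alphabet in its upper half, not both, which is why one charset for the whole file cannot display both languages correctly.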

GoogleCodeExporter commented 8 years ago
Does the Bulgarian text look OK in UTF-8?

Original comment by lord...@gmail.com on 11 May 2010 at 5:43

GoogleCodeExporter commented 8 years ago
Yes, the Bulgarian is OK.

Original comment by iivalchev@gmail.com on 11 May 2010 at 5:54

GoogleCodeExporter commented 8 years ago
Then it's probably UTF-8... Maybe you can write a program that simply
corrects the problematic German characters. There are only a few special
characters...

Original comment by lord...@gmail.com on 11 May 2010 at 7:04

GoogleCodeExporter commented 8 years ago
Is there any progress?

Original comment by lord...@gmail.com on 13 May 2010 at 5:33

GoogleCodeExporter commented 8 years ago
Well, I am fighting the nasty German chars and some strange problems in Java
when comparing hex values. I am guessing they come from the fact that bytes,
and everything else smaller than 32 bits, are treated as int by the execution
engine.

Original comment by iivalchev@gmail.com on 13 May 2010 at 5:53

GoogleCodeExporter commented 8 years ago
It was actually because bytes are treated as signed... but the real problem
is that those chars are encoded like Cyrillic, so simple substitution won't
work.

Original comment by iivalchev@gmail.com on 13 May 2010 at 6:51
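
The signed-byte pitfall mentioned above is easy to show in a few lines. In Java a `byte` ranges from -128 to 127, so a byte holding 0xFC is promoted to the int -4 in a comparison and never equals the int literal 0xFC (252). Masking with `0xFF` is the usual fix. The class and method names below are hypothetical, chosen only for this sketch.

```java
// Hypothetical demo class, not part of Spellbook.
final class UnsignedByteCompare {
    /** Returns the value of b as an unsigned int in the range 0..255. */
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    /** Compares a raw byte against an expected unsigned value such as 0xFC. */
    static boolean matches(byte b, int expected) {
        return unsigned(b) == expected;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFC;
        System.out.println(b == 0xFC);                          // false: b is promoted to the int -4
        System.out.println(matches(b, 0xFC));                   // true: the mask restores 252
        System.out.println(String.format("%02X", unsigned(b))); // FC
    }
}
```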

GoogleCodeExporter commented 8 years ago
Any progress?

Original comment by lord...@gmail.com on 18 Jun 2010 at 12:58

GoogleCodeExporter commented 8 years ago
Well, I think I will be able to get a German->Bulgarian text file, but I
haven't had the time to work on a real converter.

Original comment by iivalchev@gmail.com on 18 Jun 2010 at 1:01

GoogleCodeExporter commented 8 years ago

Original comment by lord...@gmail.com on 22 Jun 2010 at 9:40

GoogleCodeExporter commented 8 years ago
That is excellent news. I've been doing some db changes recently: moving to
the latest h2 version and making it easier to import dictionaries. You should
extend the ImportDialog to make use of the file type setting and do a
different import for Babylon dictionaries (the first, of course, being the
German dictionary). I'll be showcasing Spellbook on Thursday in front of a
bunch of people. It would be great if you managed to finish this task by then.

Original comment by lord...@gmail.com on 5 Jul 2010 at 3:18

GoogleCodeExporter commented 8 years ago
I found that I had missed some capital chars... this is the final German->Bulgarian dictionary in UTF-8.

Original comment by iivalchev@gmail.com on 5 Jul 2010 at 5:15

GoogleCodeExporter commented 8 years ago
I'll try the Bulgarian->German dictionary now. I hope the same trick works.
Then adding the new dictionary should be easy.

Original comment by iivalchev@gmail.com on 5 Jul 2010 at 5:33

GoogleCodeExporter commented 8 years ago
I couldn't find a Bulgarian->German BGL file on the Babylon site. I guess
they've changed the file format to BDC. I got one from data/bg and hope it's
good enough.

Original comment by iivalchev@gmail.com on 5 Jul 2010 at 6:34

GoogleCodeExporter commented 8 years ago
They haven't changed the format. You simply have to download the *.exe file
and open it with an archive manager (I use GNOME Archive Manager,
file-roller). The BGL file is contained inside it.

Original comment by lord...@gmail.com on 5 Jul 2010 at 8:24

GoogleCodeExporter commented 8 years ago
It's actually the same as the one I found.

Original comment by iivalchev@gmail.com on 6 Jul 2010 at 7:39

GoogleCodeExporter commented 8 years ago
Some German-Bulgarian icons are needed for the dictionaries.

Original comment by iivalchev@gmail.com on 7 Jul 2010 at 8:30

GoogleCodeExporter commented 8 years ago
Coming right up, fresh off GIMP :-)

Original comment by lord...@gmail.com on 7 Jul 2010 at 9:18