nongeneric / lsd2dsl

Lingvo dictionaries decompiler
MIT License
84 stars, 19 forks

Duden questions #15

Closed siarsky closed 3 years ago

siarsky commented 3 years ago

The ease with which you describe the reverse engineering of the Duden format in your blog is breathtaking, even though everyone who has ever tried a similar project knows that it was hard work. Chapeau!

I am curious whether you also know the other Duden formats (I am using a Mac):

  1. a .dbb file, representing a single "dictionary"; these can be added to the application one by one
  2. the app then imports all dictionaries into one huge .nbof file located in /Users/user/Library/Application Support/DudenBibliothek:

    -rwxr-xr-x 1 user group     15360 Nov 29 14:19 dbmedia.bdb
    -rwxr-xr-x 1 user group   3729228 Nov 29 14:19 dudenbib.fi1
    -rwxr-xr-x 1 user group  24660476 Nov 29 14:19 dudenbib.fi2
    -rwxr-xr-x 1 user group   6363524 Nov 29 14:19 dudenbib.fsa
    -rwxr-xr-x 1 user group 348931072 Nov 29 14:39 dudenbib.nbof

Does IDX+BOF in your description perhaps correspond to NBOF, and FI1+FI2 to FSI?

Do you think lsd2dsl could be used to decompile these files? If so, how?

Thanks for your help! siarsky

PS: hexdump duden8.dbb

    0000000 2d 1f ef 89 24 43 b7 82 3f b0 8b 9d 2d b1 ff 25
    0000010 90 fa d0 6f 14 d7 6d eb 2d f7 3f 82 26 c3 04 c8
    0000020 d3 6a 29 4b 3b c1 e2 01 67 2e 8a f0 a3 a2 b7 6b
    0000030 71 b6 dc 4e b9 ba 18 a1 9a c3 25 ad 01 cf 30 a4
    ...

hexdump dbmedia.bdb

    0000000 2d 1f ef 89 24 43 b7 82 3f b0 8b 9d 2d b1 ff 25
    0000010 f8 4f 69 9b 17 aa 4e 4b 15 22 3f f7 63 8b bc 5b
    0000020 90 07 e5 e5 53 36 e9 e2 b6 55 62 7f 83 b0 7c 67
    0000030 71 b6 dc 4e b9 ba 18 a1 9a c3 25 ad 01 cf 30 a4
    ...

hexdump dudenbib.fi1

    0000000 01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00
    0000010 05 00 00 00 06 00 00 00 07 00 00 00 08 00 00 00
    0000020 09 00 00 00 0b 00 00 00 0c 00 00 00 0d 00 00 00
    0000030 0e 00 00 00 0f 00 00 00 10 00 00 00 11 00 00 00
    ...

hexdump dudenbib.fi2

    0000000 01 00 00 81 02 00 00 81 03 00 00 81 04 00 00 81
    0000010 05 00 00 81 06 00 00 81 07 00 00 81 08 00 00 81
    0000020 09 00 00 81 0a 00 00 80 09 00 00 01 0b 00 00 81
    0000030 ef 04 00 01 0d 00 00 81 0e 00 00 81 0f 00 00 81
    ...

hexdump dudenbib.fsa

    0000000 42 46 18 00 00 00 00 00 02 02 00 00 61 01 00 00
    0000010 69 01 00 00 02 03 00 00 61 04 00 00 73 01 00 00
    0000020 01 01 00 00 73 01 00 00 01 01 00 00 65 01 00 00
    0000030 01 01 00 00 72 14 00 00 01 01 00 00 72 18 00 00
    ...

hexdump dudenbib.nbof

    0000000 2d 1f ef 89 24 43 b7 82 3f b0 8b 9d 2d b1 ff 25
    0000010 52 0f 59 d3 27 c3 13 34 d1 e4 13 eb cf 2c f8 27
    0000020 20 e1 44 6a 7a c3 30 36 fa 7c 13 0a 2b 17 78 35
    0000030 71 b6 dc 4e b9 ba 18 a1 9a c3 25 ad 01 cf 30 a4
    ...
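A quick way to compare the dumps above is to parse the first row of each and group the files by their leading bytes. The sketch below hard-codes rows quoted in this post (nothing is read from disk). Reading dudenbib.fi1 as little-endian 32-bit integers is only a guess at the layout, and the helper names are made up for illustration.

```python
import struct

# First 16 bytes of each file, copied from the hexdumps in this post.
HEADERS = {
    "duden8.dbb": "2d 1f ef 89 24 43 b7 82 3f b0 8b 9d 2d b1 ff 25",
    "dbmedia.bdb": "2d 1f ef 89 24 43 b7 82 3f b0 8b 9d 2d b1 ff 25",
    "dudenbib.nbof": "2d 1f ef 89 24 43 b7 82 3f b0 8b 9d 2d b1 ff 25",
    "dudenbib.fi1": "01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00",
    "dudenbib.fsa": "42 46 18 00 00 00 00 00 02 02 00 00 61 01 00 00",
}


def parse_row(row: str) -> bytes:
    """Turn one space-separated hexdump row (without the offset) into bytes."""
    return bytes.fromhex(row)  # fromhex ignores the spaces between byte pairs


def group_by_header(headers: dict) -> dict:
    """Map each distinct 16-byte prefix to the list of files that start with it."""
    groups = {}
    for name, row in headers.items():
        groups.setdefault(parse_row(row), []).append(name)
    return groups


def as_le_uint32(data: bytes) -> list:
    """Interpret data as consecutive little-endian 32-bit unsigned integers (assumption)."""
    count = len(data) // 4
    return list(struct.unpack("<%dI" % count, data[: count * 4]))


if __name__ == "__main__":
    for header, names in group_by_header(HEADERS).items():
        print(header.hex(), "->", ", ".join(sorted(names)))
    print(as_le_uint32(parse_row(HEADERS["dudenbib.fi1"])))
```

With these rows, the .dbb, .bdb and .nbof files end up in one group because their first 16 bytes are identical, and the fi1 row decodes to 1, 2, 3, 4 under the little-endian assumption.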

nongeneric commented 3 years ago

Thank you for the kind words!

Unfortunately, I'm not familiar with the recent versions of the format, so I have no idea how many changes to the decompiler would be required.

I will take a look a bit later. Can't promise anything, but I'm curious too :)

siarsky commented 3 years ago

If you need some test files, let me know how I can share them with you privately. I am starting up Ghidra as well :)

nongeneric commented 3 years ago

Well, it turns out the new dbb format is an encrypted SQLite database with the following tables:

tabSystem:
  random1
  random2
  random3
  express
  random4
  random5
  random6
tabDudenbibUrls:
  id
  url
tabBookDescription:
  bookid
  available
  desc
  version
  copyright
  baseimage
  additionsid
  homepage
  hasfields
  numarticles
tabGUIBitmaps:
  filename
  image
tabExternFiles:
  filename
  content
tabMap:
  bookid
  id
  numid
  type
tabHtmlText:
  numid
  lemma
  context
  type
  html
tabMetaFachgebiete:
  numid
  fachgebietid
tabFieldsTopLevel:
  bookid
  field
  desc
tabFieldValues:
  bookid
  field
  val
  desc
tabMarkers:
  artid
  bookid
  created
  html
tabTagging:
  artid
  bookid
  created
  tags

nbof is similar, but with some additional tables.
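For anyone who wants to produce a listing like the one above, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the database has already been decrypted, which this thread does not cover. Since no real file is available here, it builds an in-memory stand-in with two of the tables listed above. The column types are not known from this post, so the demo columns are declared without types, and the function names are hypothetical.

```python
import sqlite3


def list_schema(conn: sqlite3.Connection) -> dict:
    """Return {table_name: [column_name, ...]} for every table in the database."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute('PRAGMA table_info("%s")' % table).fetchall()
        schema[table] = [column[1] for column in columns]
    return schema


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory stand-in with two tables named in this thread.

    Column types are an assumption: SQLite accepts columns without a declared type.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tabMap (bookid, id, numid, type)")
    conn.execute("CREATE TABLE tabHtmlText (numid, lemma, context, type, html)")
    return conn


if __name__ == "__main__":
    for table, columns in list_schema(make_demo_db()).items():
        print(table + ":")
        for column in columns:
            print("  " + column)
```

To run it against a real, already decrypted file, pass `sqlite3.connect(path)` to `list_schema` instead of the demo connection.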

I don't know how I feel about this. The decompiler would need the decryption key, which needs to be extracted from the binary. If I provide a way to do that, it might trigger an arms race with Duden (the key is already slightly obfuscated to prevent grepping).
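As a side note on why the key matters: an unencrypted SQLite file starts with the 16-byte string "SQLite format 3" followed by a NUL byte, while the .dbb dump posted earlier in this thread starts with different bytes. The sketch below only checks for that header. It does not decrypt anything, and the helper names are hypothetical.

```python
import os
import sqlite3
import tempfile

# Every unencrypted SQLite 3 database file begins with these 16 bytes.
SQLITE_MAGIC = b"SQLite format 3\x00"

# First 16 bytes of duden8.dbb, copied from the hexdump earlier in this thread.
DBB_HEADER = bytes.fromhex("2d 1f ef 89 24 43 b7 82 3f b0 8b 9d 2d b1 ff 25")


def looks_like_plain_sqlite(header: bytes) -> bool:
    """True if the given leading bytes match the standard SQLite 3 file header."""
    return header[:16] == SQLITE_MAGIC


def read_header(path: str) -> bytes:
    """Read the first 16 bytes of a file."""
    with open(path, "rb") as handle:
        return handle.read(16)


def make_plain_db(path: str) -> None:
    """Create a small unencrypted SQLite file to compare against."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE demo (x)")
    conn.commit()
    conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "plain.db")
        make_plain_db(plain)
        print("plain.db:", looks_like_plain_sqlite(read_header(plain)))
    print("duden8.dbb header:", looks_like_plain_sqlite(DBB_HEADER))
```

A check like this can tell whether a given file still needs decrypting before any SQLite tool will open it.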

Given the key problem and the amount of work needed, I think I'll leave it be for now.

siarsky commented 3 years ago

Thx for your help, I understand your point.