ssc-oscar / lookup

A mirror of bitbucket.org/swcs/lookup

bb2cf version V: TCH keys are stored as hex-encoded strings, should be raw bytes #40

Open hrz6976 opened 2 months ago

hrz6976 commented 2 months ago

While developing the new Python driver, I was surprised to find that bb2cf never worked for me. getValues does not work for any blob SHA either:

(base) ~ echo 125afaaefe99189eb2cec5aafc470770b79abbd0 | ~/lookup/getValues bb2cf
No 125afaaefe99189eb2cec5aafc470770b79abbd0 in /fast/bb2cfFullV
(base) ~ echo 125afaaefe99189eb2cec5aafc470770b79abbd0 | ~/lookup/getValues obb2cf
125afaaefe99189eb2cec5aafc470770b79abbd0;5664f2622743898c5ca094670f16b4fb8fb4b74f;4071672afedd71d8997f427ee8ffec5dd97a3a1c;src/templates/unitedstates/states/louisiana.ejs

Digging into the issue, I found that the keys in the Tokyo Cabinet hash table for bb2cf are hex-encoded ASCII strings (40 bytes each):

In [8]: from woc.tch import TCHashDB
   ...: db = TCHashDB('/fast/woc_azure/da3-fast/bb2cfFullV.18.tch'.encode(), ro=True)
   ...: for k in db:
   ...:     print(k, k.hex())
   ...:     break
   ...:
b'125afaaefe99189eb2cec5aafc470770b79abbd0' 31323561666161656665393931383965623263656335616166633437303737306237396162626430

instead of raw 20-byte binary digests, as in obb2cf:

In [7]: from woc.tch import TCHashDB
   ...: db = TCHashDB('/fast/woc_azure/da3-fast/obb2cfFullV.18.tch'.encode(), ro=True)
   ...: for k in db:
   ...:     print(k, k.hex())
   ...:     break
   ...:
b'\x12|O\x11\xcfS\x19+\xbeC\x93\xb2\x0c\xbd\xaf\xcfW\xe6<q' 127c4f11cf53192bbe4393b20cbdafcf57e63c71
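To make the difference between the two key encodings concrete, here is a minimal sketch using only the Python standard library. It reuses the blob SHA from the getValues example above; the variable names are illustrative and are not part of python-woc.

```python
# The blob SHA used in the getValues example above.
sha_hex = "125afaaefe99189eb2cec5aafc470770b79abbd0"

# Binary form: the 20 raw bytes of the SHA-1 digest (what obb2cf uses as keys).
binary_key = bytes.fromhex(sha_hex)

# Hex-encoded form: the 40 ASCII characters of the hex string (what bb2cf currently has).
hex_key = sha_hex.encode("ascii")

print(len(binary_key), len(hex_key))
# Calling .hex() on the hex-encoded key hex-encodes it a second time,
# which is why the printed value for bb2cf starts with 3132356166...
print(hex_key.hex())
```

A lookup with `binary_key` cannot match a table whose keys were written as `hex_key`, because the two byte strings differ in both length and content.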

Lookups work with the following workaround:

# TODO: remove bb2cf quirk after fixing tch keys
# bb2cf: keys are stored as hex strings in tch db
if map_name == 'bb2cf':
    key = key.hex().encode('ascii')
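For context, a self-contained sketch of how this quirk could be wrapped in a helper. The function name `encode_key` and the set `HEX_KEYED_MAPS` are hypothetical and are not the actual python-woc code; only the bb2cf special case comes from the snippet above.

```python
# Hypothetical: maps whose tch keys are known to be hex-encoded ASCII.
HEX_KEYED_MAPS = {"bb2cf"}


def encode_key(map_name: str, sha_hex: str) -> bytes:
    """Return the tch lookup key for a 40-character hex SHA-1."""
    key = bytes.fromhex(sha_hex)  # normal case: 20 raw bytes
    if map_name in HEX_KEYED_MAPS:
        # Quirk: this map stores the hex string itself as the key.
        key = key.hex().encode("ascii")
    return key
```

Keeping the exception in one set means the quirk can be dropped by emptying `HEX_KEYED_MAPS` once the database is rebuilt.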

Is this intended, or are you planning to fix it?

audrism commented 2 months ago
  1. All SHA-1s are in binary form when they are keys. When they are mixed with strings in values, they may sometimes be in hex.
  2. 125afaaefe99189eb2cec5aafc470770b79abbd0 indeed does not have a parent, only a child, hence it is in obb2cf but not in bb2cf.
  3. I am surprised that bb2cf does not have binary keys: for i in {0..31}; do time zcat bb2cfFullV{$i,$((i+32)),$((i+64)),$((i+96))}.s | ~/lookup/h2fbbBinSorted.perl /fast/bb2cfFullV.$i.tch; done
  4. Recreating bb2cfFullV.
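A quick way to tell which form a given key is in, sketched under the assumption that a binary SHA-1 key is exactly 20 bytes and a hex-encoded one is exactly 40 ASCII hex digits. The helper name `key_format` is hypothetical.

```python
import string

# Byte values of the ASCII hex digits 0-9, a-f, A-F.
HEX_DIGITS = set(string.hexdigits.encode("ascii"))


def key_format(key: bytes) -> str:
    """Classify a tch key as 'binary', 'hex', or 'unknown' by length and content."""
    if len(key) == 20:
        return "binary"
    if len(key) == 40 and all(b in HEX_DIGITS for b in key):
        return "hex"
    return "unknown"
```

Applied to the first key of each database shown earlier in this thread, this would report `hex` for bb2cf and `binary` for obb2cf; after the rebuild, bb2cf should report `binary` as well.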

hrz6976 commented 2 months ago

Okay, I'll remove the quirk for bb2cf in python-woc after the fix.

audrism commented 2 months ago

Finished