Closed zyh3826 closed 2 years ago
If I change the data to:
d = {
1999: [[19990012, '动画片']],
1114: [[11140004, '动画片'], [11140011, '冒险']],
1101: [[11010020, '冒险']]
}
I also get the wrong output:
1999->冒险->11010020
1999->动画片->11140004
1114->冒险->11010020
1114->动画片->11140004
1101->冒险->11010020
1101->动画片->11140004
If I change the data to:
d = {
1: [[1, '2'], [3, '4'], [5, '6']],
2: [[3, '4'], [5, '6']],
3: [[5, '6']]
}
I get the correct output:
1->2->1
1->4->3
1->6->5
2->4->3
2->6->5
3->6->5
It appears that the \x00 characters in the pickle output interact badly with mdb.c's use of strlen to compute the length of the index. Using str.encode() instead of pickle.dumps produces the expected output.
import lmdb
import pickle

def en(s: str) -> bytes:
    # Pickle fails
    # ret = pickle.dumps(s)
    # Encode works
    ret = s.encode()
    parts = ret.split(b"\x00")
    print(f'en ({s}) -> {ret} | {parts}')
    return ret

d = {
    '1999': [['19990012', '动画片']],
    '1114': [['11140004', '动画片'], ['11140011', '冒险']],
    '1101': [['11010020', '冒险']]
}

# write: one named database per main type
sub_type_env = lmdb.open('./test_lmdb', map_size=1000000, max_dbs=1000)
for main_type, items in d.items():
    db = sub_type_env.open_db(en(main_type))
    with sub_type_env.begin(write=True, db=db) as sub_type2id_txn:
        for item in items:
            tag_id, tag_name = item
            key = tag_name.encode()
            val = tag_id.encode()
            sub_type2id_txn.put(key, val)
        print('{} -> {} -> {}'.format(main_type, len(items), sub_type2id_txn.stat()['entries']))
sub_type_env.close()

# iterate
sub_type_env = lmdb.open('./test_lmdb', map_size=1000000, max_dbs=1000)
for main_type, items in d.items():
    db = sub_type_env.open_db(en(main_type))
    with sub_type_env.begin(write=False, db=db) as sub_type2id_txn:
        for i, j in sub_type2id_txn.cursor():
            print('{}->{}->{}'.format(main_type, i.decode(), j.decode()))
sub_type_env.close()
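To illustrate the strlen explanation without needing lmdb at all, here is a minimal sketch that compares what a NUL-terminated reader would see for each database name. The helper c_string_view is hypothetical (it only mimics strlen by cutting at the first \x00), and the pickle protocol is pinned to 4 as an assumption so the byte layout does not depend on the Python version.

```python
import pickle


def c_string_view(name: bytes) -> bytes:
    """Hypothetical helper: the bytes a strlen()-based reader would keep,
    i.e. everything before the first NUL byte."""
    return name.split(b"\x00", 1)[0]


names = ["1999", "1114", "1101"]

# Assumption: protocol 4 is pinned so the output is stable across versions.
pickled = {n: pickle.dumps(n, protocol=4) for n in names}
encoded = {n: n.encode() for n in names}

# Distinct names as seen after truncation at the first NUL byte.
pickled_seen = {c_string_view(v) for v in pickled.values()}
encoded_seen = {c_string_view(v) for v in encoded.values()}

for n in names:
    print(n, pickled[n], "->", c_string_view(pickled[n]))

print("distinct pickled names:", len(pickled_seen))
print("distinct encoded names:", len(encoded_seen))
```

If the explanation above is right, the three pickled names should collapse to a single truncated name while the str.encode() names stay distinct, which would match every named database ending up with the same data.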
Affected Operating Systems
Affected py-lmdb Version
1.3.0/1.0.0
py-lmdb Installation Method
sudo pip install lmdb
Machine "free -m" output
Describe Your Problem
I have several named databases, with keys and values in Chinese. When I insert the data into the named databases, the result is not correct: all named databases end up with the same data. Code:
Output:
I tried another encoding method, pickle, but got the same results. What should I do? Thanks a lot.
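One possible guard, sketched under the assumption that database names are plain strings: encode the name as UTF-8 and refuse anything containing a NUL byte before it is passed to open_db. The function db_name is a hypothetical helper, not part of py-lmdb.

```python
def db_name(name: str) -> bytes:
    """Hypothetical helper: encode a database name and reject NUL bytes,
    which a NUL-terminated C string cannot represent."""
    encoded = name.encode("utf-8")
    if b"\x00" in encoded:
        raise ValueError(f"database name {name!r} contains a NUL byte")
    return encoded


# Plain string names pass through unchanged as UTF-8 bytes.
print(db_name("1999"))
print(db_name("动画片"))
```

With a helper like this, a name that would be silently truncated raises an error instead, so two different names cannot quietly map to the same database.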