happynear / py-leveldb-windows

A Visual Studio project to build leveldb python wrapper

Does it support Unicode paths? #1

Open zedxxx opened 8 years ago

zedxxx commented 8 years ago

Can I create a db in a path like c:\tmp\中文-español\?

happynear commented 8 years ago

I am not sure. But if the official leveldb supports it, there is no reason this code can't.

zedxxx commented 8 years ago

Unfortunately, the official leveldb does not support Windows :(

happynear commented 8 years ago

Linux is the same. Can the Linux version support /usr/xxx/中文目录/?

zedxxx commented 8 years ago

The default encoding on Linux is UTF-8, which can represent any Unicode path, so there is no problem. On Windows the default (ANSI) encoding is a legacy code page, Win1251 on my system for example. So the C code must do some conversions to support Unicode on Windows.

When we call this:

leveldb_t* leveldb_open(const leveldb_options_t* options, const char* name, char** errptr);

By default, name holds the path to the db in the ANSI encoding on Windows and in UTF-8 on Linux, so on Windows we can't access a path that is not representable in the system encoding. To access such paths on Windows we should pass UTF-8 in name there too, but the Windows port of leveldb must expect this, convert UTF-8 to UTF-16, and call the Unicode functions of the Windows API (CreateFileW instead of CreateFileA).
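As a rough illustration of the conversion described above, here is a minimal Python sketch of the UTF-8 to UTF-16 step. The helper name `utf8_name_to_wide` is hypothetical and not part of leveldb or this port. In C, a Windows port would more likely call the Win32 function `MultiByteToWideChar` with `CP_UTF8` and pass the result to `CreateFileW`.

```python
# -*- coding: utf-8 -*-

def utf8_name_to_wide(name_utf8):
    """Hypothetical helper: turn a UTF-8 encoded path into the
    NUL-terminated UTF-16LE byte sequence that a wide-character
    (wchar_t*) Windows API function expects."""
    text = name_utf8.decode('utf-8')              # UTF-8 bytes -> code points
    return text.encode('utf-16-le') + b'\x00\x00'  # code points -> UTF-16LE + NUL

path = u'c:\\tmp\\中文-español'
wide = utf8_name_to_wide(path.encode('utf-8'))

# Dropping the terminator and decoding again gives back the original path,
# so no information is lost in the conversion.
print(wide[:-2].decode('utf-16-le') == path)
```

The point of the sketch is only that the path survives the round trip intact when both sides agree that name is UTF-8.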

So, does your port of leveldb work with UTF-8 or with the default encoding?

happynear commented 8 years ago

I am not quite sure. I am busy with a conference deadline now. You may check it by yourself.

zedxxx commented 8 years ago

Can you give me a precompiled *.pyd for x86 Python?

happynear commented 8 years ago

I don't have x86 Python. I have updated the Win32 configuration of the project. You can compile it yourself.

zedxxx commented 8 years ago

In that case I can install x64 Python for testing, because installing Visual Studio and compiling the lib from sources is more difficult. So, can you give me your pyd for x64, please?

happynear commented 8 years ago

You can download the x64 leveldb.pyd at http://pan.baidu.com/s/1pJ1mMnx .

zedxxx commented 8 years ago

Test code:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import codecs
import leveldb

db_path_uni = u'c:\\tmp\\中文-español'

with codecs.open('leveldb_uni_test.txt', 'w', encoding='utf-8') as f:
    f.write(db_path_uni)

db = leveldb.LevelDB(db_path_uni)

db.Put('hello', 'hello world')

print db.Get('hello')

failed with message:

Traceback (most recent call last):
  File "C:\Python27\leveldb_uni_test.py", line 12, in <module>
    db = leveldb.LevelDB(db_path_uni)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 7-8: ordinal not in range(128)
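The error can be reproduced without leveldb at all. The sketch below assumes (this is not confirmed in the thread) that the wrapper asks Python 2 for a byte string, so the unicode path is implicitly encoded with the default 'ascii' codec. The explicit `encode('ascii')` call stands in for that implicit step.

```python
# -*- coding: utf-8 -*-

path = u'c:\\tmp\\中文-español'

try:
    # Stand-in for the implicit unicode -> str conversion in Python 2.
    path.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

# error.start is the index of the first non-ASCII character and error.end
# is one past the last character of that run, which is why the traceback
# reports "position 7-8" for the two Chinese characters.
print(error)
```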

If I encode the unicode path as UTF-8 and try to open the db:

db = leveldb.LevelDB(db_path_uni.encode('utf-8'))

then it works, BUT it creates a new directory c:\tmp\дё­ж–‡-espaГ±ol, which is not the Unicode path: it is the UTF-8 bytes shown as garbage text in my Windows default encoding, Win1251.
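The garbage directory name is what you get when UTF-8 bytes are interpreted as Win1251. A minimal sketch of that misinterpretation, using Python's `cp1251` codec as a stand-in for what the ANSI file API does with the bytes (an assumption about the port's behavior, consistent with the directory name observed above):

```python
# -*- coding: utf-8 -*-

name = u'中文-español'

# Encode as UTF-8, then (wrongly) read the same bytes as Win1251.
garbled = name.encode('utf-8').decode('cp1251')
print(repr(garbled))

# The bytes themselves are untouched: reversing the wrong step
# recovers the original name.
restored = garbled.encode('cp1251').decode('utf-8')
print(restored == name)
```

So the data is not lost, but any other program looking for the directory under its real Unicode name will not find it.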

In summary, this port does not work with Unicode paths :(

What do you think, and can you fix it?

happynear commented 8 years ago

I will try to fix this problem after the paper deadline on 11/6.