planetarium / bencodex

Bencodex: Bencoding Extended
https://bencodex.org/

unicode-string.dat #3

Open goldenMetteyya opened 5 years ago

goldenMetteyya commented 5 years ago

Hello,

Great work on this extension to Bencode; it adds exactly the extras I needed. I am working on a Rust implementation, and in my testing I found that

u146:秋江에 밤이 드니 물결이 차노매라 낚시 드리치니 고기 아니 무노매라 無心한 달빛만 싣고 빈 배 저어 오노라

should be prefixed with u145, not u146. Have you checked this?

thanks

dahlia commented 5 years ago
$ wc -c testsuite/unicode-string.dat 
151 testsuite/unicode-string.dat

The data file itself is 151 bytes. The prefix u146: occupies 5 bytes, so the rest of the file is indeed 146 bytes. I guess your implementation either produces different UTF-8 bytes or drops the final line feed character (U+000A LINE FEED). If you look at the corresponding JSON file (unicode-string.json), the value string ends with a \n character.
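
The arithmetic above (total file size, minus the prefix, equals the declared length) can be checked programmatically. The following is only a minimal sketch in Rust, since that is the language of the implementation under discussion. `declared_vs_actual` is a hypothetical helper, not part of any Bencodex library, and it assumes the input holds exactly one top-level Unicode string of the form `u<length>:<payload>`.

```rust
/// Hypothetical helper (not part of any Bencodex library).
/// Given the raw bytes of a `u<length>:<payload>` value, returns
/// `(declared length, actual payload length in bytes)`.
fn declared_vs_actual(data: &[u8]) -> Option<(usize, usize)> {
    // The value must start with the ASCII tag byte `u`.
    let (&tag, rest) = data.split_first()?;
    if tag != b'u' {
        return None;
    }
    // The decimal length runs up to the first `:`.
    let colon = rest.iter().position(|&b| b == b':')?;
    let declared: usize = std::str::from_utf8(&rest[..colon]).ok()?.parse().ok()?;
    // Assumption: everything after the colon is the payload, which only
    // holds when the input contains a single top-level string.
    let actual = rest.len() - colon - 1;
    Some((declared, actual))
}
```

Applied to the bytes of testsuite/unicode-string.dat, the numbers reported in this thread suggest the result would be `Some((146, 146))`. A copy of the data that had lost its final line feed would give `Some((146, 145))` instead.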

goldenMetteyya commented 5 years ago
Hello,

I will check again but it might be the missing newline character then.

thanks
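
One way a trailing newline can go missing is trimming the text, or reading it line by line, before measuring it. The sketch below only illustrates that effect. It is not taken from the implementation discussed here, and both function names are hypothetical.

```rust
/// Byte length of the text exactly as stored, trailing line feed included.
/// `str::len` counts UTF-8 bytes, not characters.
fn len_as_stored(text: &str) -> usize {
    text.len()
}

/// Byte length after trailing whitespace has been trimmed.
/// This is one byte shorter when the text ends in a single "\n".
fn len_after_trim(text: &str) -> usize {
    text.trim_end().len()
}
```

For a string that ends in a single line feed, the two results differ by exactly one, which matches the 146 versus 145 discrepancy reported above.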

dahlia commented 4 years ago

@goldenMetteyya I am sorry for the late response. What does wc -c testsuite/unicode-string.dat report on your system?