me-box / zestdb

ZestDB
MIT License

UTF8 in json data #15

Closed: Toshbrown closed this issue 6 years ago

Toshbrown commented 6 years ago

When receiving data from Twitter, UTF-8 characters are often present in the JSON. This causes the store to return a bad request, and the data is not saved.

see https://pastebin.com/BLxMMMk2

The above is an example of one tweet that is successfully saved and one that is rejected by the store.

jptmoore commented 6 years ago

Do you have an example I can pass from my test client to trigger this? It handles input like this, for example:

Johns-MacBook-Pro:test john$ _build/default/client.exe --server-key 'vl6wu0A@XP?}Or/&BR#LSxn>A+}L)p44/W[wXL3<' --path '/ts/foo/latest' --mode get
{"timestamp":1512063381380,"data":{"event":"☀ ☁ ☂ ☃ ☄ ★ ☆ ☇ ☈ ☉ ☊ ☋ ☌ ☍ ☎ ☏ ☐ ☑ ☒ ☓ ☚ ☛ ☜ ☝ ☞ ☟ ☠ ☡ ☢ ☣ ☤ ☥ ☦ ☧ ☨ ☩ ☪ ☫ ☬ ☭ ☮ ☯ ☰ ☱ ☲ ☳ ☴ ☵ ☶ ☷ ☸ ☹ ☺ ☻ ☼ ☽ ☾ ☿ ♀ ♁ ♂ ♃ ♄ ♅ ♆ ♇ ♈ ♉ ♊ ♋ ♌ ♍ ♎ ♏ ♐ ♑ ♒ ♓ ♔ ♕ ♖ ♗ ♘ ♙ ♚ ♛ ♜ ♝ ♞ ♟ ♠ ♡ ♢ ♣ ♤ ♥ ♦ ♧ ♨ ♩ ♪ ♫ ♬ ♭ ♮ ♯"}}

Toshbrown commented 6 years ago

There is an example in the pastebin, but I can provide some more:

See file below.
jptmoore commented 6 years ago

Do you have the JSON on its own that I can paste into a validator? I can save that to a file and try it from the client.

Toshbrown commented 6 years ago

Sure, here is a file full of them, separated by a new line ;-)

badTweets.txt
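Because the file holds one JSON document per line, each line can be checked on its own before it is sent to the store. The sketch below is a minimal illustration in Python, not part of ZestDB or either client; the function name `find_invalid_json_lines` is hypothetical. It splits on `"\n"` only, because `str.splitlines()` also breaks on characters such as U+2028 that can legitimately appear inside tweet text.

```python
import json


def find_invalid_json_lines(text: str) -> list[int]:
    """Return the 1-based numbers of non-empty lines that fail to parse as JSON."""
    bad = []
    # Split on "\n" only: splitlines() would also split on U+2028/U+2029,
    # which JSON allows unescaped inside strings.
    for lineno, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # skip blank separator lines (e.g. after a trailing newline)
        try:
            json.loads(line)
        except json.JSONDecodeError:
            bad.append(lineno)
    return bad


if __name__ == "__main__":
    sample = '{"text": "☀ ★ ♥"}\n{"text": "unterminated\n'
    print(find_invalid_json_lines(sample))
```

To check the attached file, read it with `open("badTweets.txt", encoding="utf-8")` and pass its contents in. An empty result means every tweet is valid JSON, which points the investigation away from the JSON itself and towards how it is transmitted.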

jptmoore commented 6 years ago

Are these all failing? I tried the first one and that did not give me an error.

Toshbrown commented 6 years ago

Yep, they all failed on my end.

Here is a new set of logs. The tweets in badTweets.txt are the ones rejected by the store, and storeLogs.txt contains all the output from the store with logging enabled.

badTweets.txt storeLogs.txt

Toshbrown commented 6 years ago

It turned out to be an issue with https://github.com/Toshbrown/nodeZestClient calculating the length of UTF-8 payloads incorrectly. Fixed in version 0.0.9.
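This class of bug comes from treating a string's character count as its size on the wire. In JavaScript, `String.prototype.length` counts UTF-16 code units and Python's `len()` counts code points, but neither counts bytes: a character such as ☀ (U+2600) is one unit yet three bytes in UTF-8. The minimal Python sketch below illustrates the mismatch with a hypothetical 4-byte length-prefixed frame. It is not the actual Zest wire format or the nodeZestClient fix, and the helper names are made up for illustration.

```python
import struct


def char_length(s: str) -> int:
    """Count Unicode code points; this is NOT the size of the encoded payload."""
    return len(s)


def utf8_byte_length(s: str) -> int:
    """Count the bytes the string occupies once encoded as UTF-8."""
    return len(s.encode("utf-8"))


def frame_buggy(payload: str) -> bytes:
    # Hypothetical length-prefixed frame (not the real Zest format):
    # the prefix wrongly declares the character count.
    body = payload.encode("utf-8")
    return struct.pack(">I", char_length(payload)) + body


def frame_fixed(payload: str) -> bytes:
    # Same hypothetical frame, but the prefix declares the encoded byte count.
    body = payload.encode("utf-8")
    return struct.pack(">I", len(body)) + body


def length_matches(frame: bytes) -> bool:
    """Receiver-side check: does the declared length equal the bytes that follow?"""
    (declared,) = struct.unpack(">I", frame[:4])
    return declared == len(frame) - 4


if __name__ == "__main__":
    payload = '{"data":{"event":"☀ ★"}}'
    print(char_length(payload), utf8_byte_length(payload))  # 24 28
    print(length_matches(frame_buggy(payload)))  # False
    print(length_matches(frame_fixed(payload)))  # True
```

For pure-ASCII payloads the two counts coincide, so the buggy frame passes. That is consistent with some tweets being saved while those containing multi-byte characters were rejected. In Node.js the byte count of a string can be obtained with `Buffer.byteLength(str, 'utf8')`.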