insanum / sncli

Simplenote CLI
MIT License

Searching with Cyrillic (Unicode?) characters causes a crash #23

Closed kadomcevi closed 8 years ago

kadomcevi commented 8 years ago

How to reproduce: open sncli and type the command /фото. The program crashes:

Traceback (most recent call last):
  File "./sncli", line 33, in <module>
    sncli.main(sys.argv[1:])
  File "/home/igor/prj/sncli/sncli/simplenote_cli/sncli.py", line 1200, in main
    sncli(sync).gui(key)
  File "/home/igor/prj/sncli/sncli/simplenote_cli/sncli.py", line 1016, in gui
    self.sncli_loop.run()
  File "/usr/lib64/python2.7/site-packages/urwid/main_loop.py", line 278, in run
    self.screen.run_wrapper(self._run)
  File "/usr/lib64/python2.7/site-packages/urwid/raw_display.py", line 272, in run_wrapper
    return fn()
  File "/usr/lib64/python2.7/site-packages/urwid/main_loop.py", line 343, in _run
    self.event_loop.run()
  File "/usr/lib64/python2.7/site-packages/urwid/main_loop.py", line 673, in run
    self._loop()
  File "/usr/lib64/python2.7/site-packages/urwid/main_loop.py", line 710, in _loop
    self._watch_files[fd]()
  File "/usr/lib64/python2.7/site-packages/urwid/main_loop.py", line 394, in _update
    self.process_input(keys)
  File "/usr/lib64/python2.7/site-packages/urwid/main_loop.py", line 494, in process_input
    k = self._topmost_widget.keypress(self.screen_size, k)
  File "/usr/lib64/python2.7/site-packages/urwid/container.py", line 1583, in keypress
    key = self.focus.keypress(tsize, key)
  File "/home/igor/prj/sncli/sncli/simplenote_cli/user_input.py", line 22, in keypress
    self.callback_func(self.callback_func_args, self.edit_text)
  File "/home/igor/prj/sncli/sncli/simplenote_cli/sncli.py", line 340, in gui_search_input
    self.view_titles.update_note_list(search_string, args[0])
  File "/home/igor/prj/sncli/sncli/simplenote_cli/view_titles.py", line 23, in update_note_list
    self.ndb.filter_notes(self.search_string, search_mode)
  File "/home/igor/prj/sncli/sncli/simplenote_cli/notes_db.py", line 96, in filter_notes
    self.filter_notes_gstyle(search_string)
  File "/home/igor/prj/sncli/sncli/simplenote_cli/notes_db.py", line 218, in filter_notes_gstyle
    self._helper_gstyle_wordmatch(word_pats, n.get('content')):
  File "/home/igor/prj/sncli/sncli/simplenote_cli/notes_db.py", line 145, in _helper_gstyle_wordmatch
    if wp in lowercase_content:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)

Note: nvpy works fine with unicode search.

samuelallan72 commented 8 years ago

The results of my research into this today:

This happens because strings are handled inconsistently. Python 2 has two string types, 'str' (bytes) and 'unicode' (text), and both are used here. See https://pythonhosted.org/kitchen/unicode-frustrations.html#frustration-3-inconsistent-treatment-of-output for an explanation.

Here are examples of fixing the exact error you are getting (although similar fixes will be needed all through the code to support unicode characters). wp is a 'str' and lowercase_content is a 'unicode'; when the two are mixed, Python 2 implicitly decodes the 'str' as ASCII, which fails on a non-ASCII byte such as 0xd0:

# convert to a unicode type string
wp = unicode(wp.lower(), 'utf8') # case insensitive search
if wp in lowercase_content:
    word_pats_matched += 1

or alternatively

# convert to str type string
wp = wp.lower() # case insensitive search
if wp in lowercase_content.encode('utf-8'):
    word_pats_matched += 1

Note: python 3 changes this so that all strings work with unicode and there is a bytes string type that can be used if needed (for sending/receiving strings over the http for example). Ultimately it would be great to port sncli to python 3. ;)
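To illustrate the Python 3 behaviour mentioned in the note above, here is a minimal sketch (not sncli code; the variable names are made up for the example). Python 3 refuses to mix bytes and text instead of silently guessing ASCII, so the mismatch shows up as an explicit TypeError and the fix is an explicit decode:

```python
# Hypothetical values for illustration only; none of this is taken from sncli.
needle = "фото".encode("utf-8")   # bytes, e.g. as received over the network
haystack = "Мои ФОТО из отпуска"  # str (text)

try:
    found = needle in haystack
except TypeError:
    # Python 3 will not implicitly decode bytes to compare them with str.
    found = None

# Decode explicitly, then compare text with text (case-insensitively).
found_after_decode = needle.decode("utf-8").lower() in haystack.lower()

print(found, found_after_decode)
```

Lowercasing after decoding matters here: calling lower() on the undecoded bytes would only affect ASCII letters.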

kadomcevi commented 8 years ago

No, your fix does not work at all. First patch:

Traceback (most recent call last):
  File "./sncli", line 33, in <module>
    sncli.main(sys.argv[1:])
  File "/home/igor/prj/sncli/sncli/simplenote_cli/sncli.py", line 1200, in main
    sncli(sync).gui(key)
  File "/home/igor/prj/sncli/sncli/simplenote_cli/sncli.py", line 1016, in gui
    self.sncli_loop.run()
  File "/usr/lib64/python2.7/site-packages/urwid/main_loop.py", line 278, in run
    self.screen.run_wrapper(self._run)
  File "/usr/lib64/python2.7/site-packages/urwid/raw_display.py", line 272, in run_wrapper
    return fn()
  File "/usr/lib64/python2.7/site-packages/urwid/main_loop.py", line 343, in _run
    self.event_loop.run()
  File "/usr/lib64/python2.7/site-packages/urwid/main_loop.py", line 673, in run
    self._loop()
  File "/usr/lib64/python2.7/site-packages/urwid/main_loop.py", line 710, in _loop
    self._watch_files[fd]()
  File "/usr/lib64/python2.7/site-packages/urwid/main_loop.py", line 394, in _update
    self.process_input(keys)
  File "/usr/lib64/python2.7/site-packages/urwid/main_loop.py", line 494, in process_input
    k = self._topmost_widget.keypress(self.screen_size, k)
  File "/usr/lib64/python2.7/site-packages/urwid/container.py", line 1583, in keypress
    key = self.focus.keypress(tsize, key)
  File "/home/igor/prj/sncli/sncli/simplenote_cli/user_input.py", line 22, in keypress
    self.callback_func(self.callback_func_args, self.edit_text)
  File "/home/igor/prj/sncli/sncli/simplenote_cli/sncli.py", line 340, in gui_search_input
    self.view_titles.update_note_list(search_string, args[0])
  File "/home/igor/prj/sncli/sncli/simplenote_cli/view_titles.py", line 23, in update_note_list
    self.ndb.filter_notes(self.search_string, search_mode)
  File "/home/igor/prj/sncli/sncli/simplenote_cli/notes_db.py", line 96, in filter_notes
    self.filter_notes_gstyle(search_string)
  File "/home/igor/prj/sncli/sncli/simplenote_cli/notes_db.py", line 220, in filter_notes_gstyle
    self._helper_gstyle_wordmatch(word_pats, n.get('content')):
  File "/home/igor/prj/sncli/sncli/simplenote_cli/notes_db.py", line 147, in _helper_gstyle_wordmatch
    if wp in lowercase_content:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 634: ordinal not in range(128)

Second patch:

Traceback (most recent call last):
  File "./sncli", line 33, in <module>
    sncli.main(sys.argv[1:])
  File "/home/igor/prj/sncli/sncli/simplenote_cli/sncli.py", line 1200, in main
    sncli(sync).gui(key)
  File "/home/igor/prj/sncli/sncli/simplenote_cli/sncli.py", line 1016, in gui
    self.sncli_loop.run()
  File "/usr/lib64/python2.7/site-packages/urwid/main_loop.py", line 278, in run
    self.screen.run_wrapper(self._run)
  File "/usr/lib64/python2.7/site-packages/urwid/raw_display.py", line 272, in run_wrapper
    return fn()
  File "/usr/lib64/python2.7/site-packages/urwid/main_loop.py", line 343, in _run
    self.event_loop.run()
  File "/usr/lib64/python2.7/site-packages/urwid/main_loop.py", line 673, in run
    self._loop()
  File "/usr/lib64/python2.7/site-packages/urwid/main_loop.py", line 710, in _loop
    self._watch_files[fd]()
  File "/usr/lib64/python2.7/site-packages/urwid/main_loop.py", line 394, in _update
    self.process_input(keys)
  File "/usr/lib64/python2.7/site-packages/urwid/main_loop.py", line 494, in process_input
    k = self._topmost_widget.keypress(self.screen_size, k)
  File "/usr/lib64/python2.7/site-packages/urwid/container.py", line 1583, in keypress
    key = self.focus.keypress(tsize, key)
  File "/home/igor/prj/sncli/sncli/simplenote_cli/user_input.py", line 22, in keypress
    self.callback_func(self.callback_func_args, self.edit_text)
  File "/home/igor/prj/sncli/sncli/simplenote_cli/sncli.py", line 341, in gui_search_input
    self.gui_body_set(self.view_titles)
  File "/home/igor/prj/sncli/sncli/simplenote_cli/sncli.py", line 200, in gui_body_set
    self.gui_update_status_bar()
  File "/home/igor/prj/sncli/sncli/simplenote_cli/sncli.py", line 281, in gui_update_status_bar
    self.gui_header_set(self.gui_body_get().get_status_bar())
  File "/home/igor/prj/sncli/sncli/simplenote_cli/view_titles.py", line 155, in get_status_bar
    hdr += ' - Search: ' + self.search_string
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 11: ordinal not in range(128)

I think small fixes are not an option here. The whole program should be reviewed for utf-8 support. The program looks promising, but nvpy still works correctly.

samuelallan72 commented 8 years ago

I know - it only works for that particular section. It was just an example of how to go about fixing the general problem. As I said before, the conversion to either the 'unicode' type or the 'str' type would have to be done program-wide, so that every part of it supports unicode characters outside the ascii set. (Either that, or port to python 3 to take advantage of its updated string handling.)
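As a rough sketch of what "program-wide" could look like (written for Python 3, with hypothetical function names that are not part of sncli): decode bytes once where input enters the program, work only with text internally, and encode once where output leaves it.

```python
import io


def decode_input(raw, encoding="utf-8"):
    """Input boundary: turn bytes from the terminal or network into text."""
    return raw.decode(encoding) if isinstance(raw, bytes) else raw


def status_bar(search_string):
    """Internal code only ever concatenates text with text."""
    return "sncli - Search: " + search_string


def encode_output(text, encoding="utf-8"):
    """Output boundary: turn text back into bytes just before writing."""
    return text.encode(encoding)


# Assumed input: the UTF-8 bytes a terminal might deliver for the search term.
raw_keys = "фото".encode("utf-8")
header = status_bar(decode_input(raw_keys))

sink = io.BytesIO()  # stand-in for a terminal or socket
sink.write(encode_output(header))
```

With this layout, a concatenation like the one in get_status_bar never sees a mix of the two string types, because everything between the two boundaries is text.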

I'll have a go at working on a solution this week, time permitting (check for updates on my fork).

In the meantime, yeah nvpy should work ok and sncli should be stable enough if you can get by without using non-ascii characters with it.

samuelallan72 commented 8 years ago

@kadomcevi I've been working on porting to python 3, and thus fixing encoding issues. Try out my python3 branch if you like. I'm able to search for any unicode characters without it crashing now. :smile:

Disclaimer/warning - it seems fairly stable to me, but it is now using a different networking library (requests instead of urllib), and there's a fair amount of changes, so there's potential for new bugs which could result in loss of notes.

I'll keep working on it and testing when I have time to make sure it's going to be as bug-free as possible. When I'm happy with it I'll submit a pull request here if the owners are happy to accept.

Feel free to open issues on my fork for any bugs/issues not present on the original project here!

kadomcevi commented 8 years ago

$ sncli
Traceback (most recent call last):
  File "/usr/bin/sncli", line 30, in <module>
    from simplenote_cli import sncli
  File "/usr/lib64/python3.3/site-packages/simplenote_cli/sncli.py", line 44
    except Exception, e:
                    ^
SyntaxError: invalid syntax

samuelallan72 commented 8 years ago

@kadomcevi Can you provide more context? I have definitely fixed all syntax errors in the branch/fork I linked to, so whatever is at '/usr/lib64/python3.3/site-packages/simplenote_cli/sncli.py' isn't my modified version.
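For reference, `except Exception, e:` is Python 2 only syntax, which is why Python 3.3 rejects that file. The form below is accepted by Python 2.6+ and Python 3 alike. This is a generic sketch with a made-up function, not the code at line 44 of sncli.py:

```python
def parse_port(value):
    """Return value as an int, or a short error description if that fails."""
    try:
        return int(value)
    except Exception as e:  # Python 2 also allowed "except Exception, e:"
        return "error: " + type(e).__name__


print(parse_port("80"))
print(parse_port("abc"))
```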

Also, would you mind taking all further discussion about things specific to my fork over to a new issue on my fork? That would avoid this thread getting quite offtopic and keep discussion related to the issue in question. Thanks.

kadomcevi commented 8 years ago

Sorry, my mistake. That was not your branch :(

casutherland commented 8 years ago

Similar issue encountered when simply opening a note in the viewer. The note contains unicode quotation marks, single quotes, and other punctuation characters.

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 5: ordinal not in range(128)

Traceback:
  ...
  File "/usr/lib/python2.7/dist-packages/urwid/canvas.py", line 1291, in apply_text_layout
    text[s.offs:s.end])
  File "/usr/lib/python2.7/dist-packages/urwid/util.py", line 121, in apply_target_encoding
    s = s.encode( _target_encoding )
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 5: ordinal not in range(128)
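The same class of error can be reproduced without urwid. The sketch below (Python 3, sample text invented for the example) assumes only that the target encoding ended up being ASCII, as the message suggests. It shows that U+2019 cannot be encoded as ASCII, that UTF-8 handles it, and that an error handler such as "replace" avoids the crash at the cost of losing the character:

```python
text = "don\u2019t"  # contains U+2019 RIGHT SINGLE QUOTATION MARK

try:
    text.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc  # exc.start is the index of the offending character

utf8_bytes = text.encode("utf-8")                  # lossless
fallback = text.encode("ascii", errors="replace")  # lossy: U+2019 becomes "?"

print(ascii_error.start, utf8_bytes, fallback)
```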

samuelallan72 commented 8 years ago

@qcu Sorry for hijacking this thread, but I'm not sure of the development status of this project (no updates in a year). I forked the project a while back and made some updates to help fix problems with unicode encoding errors. Check out the python3 branch of https://github.com/swalladge/sncli, which includes some refactorings, depends on python 3 instead of 2, and hopefully fixes most of the encoding problems.

Also feel free to open issues on my fork if you still have problems, and I'll attempt to maintain the code there and fix bugs. :)

insanum commented 8 years ago

Hi @swalladge. Thank you for chipping away at this issue. Please submit a pull request for your fixes. While I don't use/develop sncli as much as I used to, I'd be more than happy to give you commit access to help maintain this project. Of course... fork all you want as well. :-)

samuelallan72 commented 8 years ago

@insanum Ok thanks, I'll do that right away. I guess I'm keen on fixing bugs as they arise because I use it a lot personally. It would be a lot better to keep development here in the original project, rather than maintaining other forks (and I don't particularly want to hijack your project!). If you would review/accept pull requests, I'll be more than happy to help maintain it and go bug hunting. :)

insanum commented 8 years ago

Giving you keys to the kingdom... :-)

samuelallan72 commented 8 years ago

@kadomcevi, @qcu, please test with the latest changes merged here (taking into account the new dependencies) to see if the problem has been fixed for you. I've been testing it, with no crashing relating to using unicode characters anymore.

kadomcevi commented 8 years ago

@swalladge Thanks, Samuel. Now everything works fine, as expected :)