mjs / imapclient

An easy-to-use, Pythonic and complete IMAP client library
https://imapclient.readthedocs.io/

python3 - UTF-8 string in gmail_search is not working? #249

Closed rdbeni0 closed 7 years ago

rdbeni0 commented 7 years ago

Python 3.6:

    proba0 = 'from:(noreply@olx.pl) subject:(Wiadomość do ogłoszenia)'  # UTF-8 string
    proba1 = server.gmail_search(proba0, charset='UTF-8')
    print(proba0.encode(encoding='utf_8'))
    print(proba1)  # always empty with a UTF-8 string, but it should contain a few IDs

Output (one line per print call):

b'from:(noreply@olx.pl) subject:(Wiadomo\xc5\x9b\xc4\x87 do og\xc5\x82oszenia)'
[]

When I use pure ASCII (without Polish letters), the search does return results. The search query itself is correct (I checked it in the Gmail web interface).

Any idea how I can use a UTF-8 string in the search query?

mjs commented 7 years ago

Can you try 2 things for me please?

  1. Add server.debug = True before the server.gmail_search() call and report the output here.

  2. Wrap proba0 in a list like this: server.gmail_search([proba0], charset='UTF-8'). I see a potential bug when a bare string is passed and that should work around it. If that works I'll make a fix.

rdbeni0 commented 7 years ago

OK, I changed the string proba0 into a list and added server.debug = True:

    server.debug = True
    proba0 = ['from:(noreply@olx.pl) subject:(Wiadomość do ogłoszenia)']
    proba1 = server.gmail_search([proba0], charset='UTF-8')
    print(proba0.encode(encoding='utf_8'));
    print(proba1);

Output:

Traceback (most recent call last):
  File "/home/collector1871/DEV/python/emailCleaner/emailcleanerfunctions.py", line 46, in gmailcleaner
    proba1 = server.gmail_search([proba0], charset='UTF-8')
  File "/usr/lib/python3.6/site-packages/imapclient/imapclient.py", line 741, in gmail_search
    return self._search([b'X-GM-RAW', query], charset)
  File "/usr/lib/python3.6/site-packages/imapclient/imapclient.py", line 747, in _search
    args.extend(_normalise_search_criteria(criteria, charset))
  File "/usr/lib/python3.6/site-packages/imapclient/imapclient.py", line 1270, in _normalise_search_criteria
    return [_handle_one_search_criteria(item, charset) for item in criteria]
  File "/usr/lib/python3.6/site-packages/imapclient/imapclient.py", line 1270, in <listcomp>
    return [_handle_one_search_criteria(item, charset) for item in criteria]
  File "/usr/lib/python3.6/site-packages/imapclient/imapclient.py", line 1278, in _handle_one_search_criteria
    return _maybe_quote(to_bytes(item, charset))
  File "/usr/lib/python3.6/site-packages/imapclient/imapclient.py", line 1291, in _maybe_quote
    out = arg.replace(b'\\', b'\\\\')
AttributeError: 'list' object has no attribute 'replace'

mjs commented 7 years ago

You've wrapped the search string in a list twice! Can you fix and try again?

rdbeni0 commented 7 years ago

by the way:

~ % pip show imapclient
Name: IMAPClient
Version: 1.0.2
Summary: Easy-to-use, Pythonic and complete IMAP client library
Home-page: http://imapclient.freshfoo.com/
Author: Menno Smits
Author-email: menno@freshfoo.com
License: http://en.wikipedia.org/wiki/BSD_licenses
Location: /usr/lib/python3.6/site-packages
Requires: six, backports.ssl, mock, pyopenssl
~ % 

I am not sure what you mean by "wrapped the search string in a list twice". Now proba0 is a variable holding a UTF-8 string. Code:

    server.debug = True
    proba0 = 'from:(noreply@olx.pl) subject:(Wiadomość do ogłoszenia)'
    proba1 = server.gmail_search([proba0], charset='UTF-8') #proba0 is inside []
    print(proba0.encode(encoding='utf_8'));
    print(proba1);

Output:

  File "/home/collector1871/DEV/python/emailCleaner/emailcleanerfunctions.py", line 46, in gmailcleaner
    proba1 = server.gmail_search([proba0], charset='UTF-8')
  File "/usr/lib/python3.6/site-packages/imapclient/imapclient.py", line 741, in gmail_search
    return self._search([b'X-GM-RAW', query], charset)
  File "/usr/lib/python3.6/site-packages/imapclient/imapclient.py", line 747, in _search
    args.extend(_normalise_search_criteria(criteria, charset))
  File "/usr/lib/python3.6/site-packages/imapclient/imapclient.py", line 1270, in _normalise_search_criteria
    return [_handle_one_search_criteria(item, charset) for item in criteria]
  File "/usr/lib/python3.6/site-packages/imapclient/imapclient.py", line 1270, in <listcomp>
    return [_handle_one_search_criteria(item, charset) for item in criteria]
  File "/usr/lib/python3.6/site-packages/imapclient/imapclient.py", line 1278, in _handle_one_search_criteria
    return _maybe_quote(to_bytes(item, charset))
  File "/usr/lib/python3.6/site-packages/imapclient/imapclient.py", line 1291, in _maybe_quote
    out = arg.replace(b'\\', b'\\\\')
AttributeError: 'list' object has no attribute 'replace'
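
A note on this traceback: the frame at line 741 shows gmail_search building [b'X-GM-RAW', query] itself, so a query that is already a list ends up nested one level deep, and the bytes-only replace call then receives a list. The sketch below reproduces that chain with simplified stand-ins. The helpers to_bytes, maybe_quote, normalise and build_gmail_criteria are hypothetical names written for illustration, and the quoting rule is an assumption, not IMAPClient's actual code.

```python
def to_bytes(item, charset):
    # Assumed behaviour: encode text, pass anything else through unchanged.
    if isinstance(item, str):
        return item.encode(charset)
    return item


def maybe_quote(arg):
    # First statement matches the line shown in the traceback; it only
    # works when arg is bytes.
    out = arg.replace(b'\\', b'\\\\')
    # Simplified rule (assumption): quote only when a space is present.
    if b' ' in out:
        return b'"' + out.replace(b'"', b'\\"') + b'"'
    return out


def normalise(criteria, charset):
    # Each top-level item is converted and quoted individually.
    return [maybe_quote(to_bytes(item, charset)) for item in criteria]


def build_gmail_criteria(query):
    # The caller's query is placed inside a list, as in the frame at line 741.
    return [b'X-GM-RAW', query]


query = 'from:(noreply@olx.pl) subject:(test)'

# A bare string becomes one bytes item and is handled normally.
ok = normalise(build_gmail_criteria(query), 'UTF-8')

# A list becomes a nested list, so maybe_quote receives a list.
try:
    normalise(build_gmail_criteria([query]), 'UTF-8')
    error = None
except AttributeError as exc:
    error = str(exc)

print(ok)
print(error)
```

Under these assumptions, passing [proba0] fails in the same way as passing [[proba0]], because the extra nesting comes from the wrapping done inside gmail_search.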

And here is the new output with server.debug = True. Now proba0 is passed as a plain UTF-8 string, without the list. Code:

    server.debug = True
    proba0 = 'from:(noreply@olx.pl) subject:(Wiadomość do ogłoszenia)'
    proba1 = server.gmail_search(proba0, charset='UTF-8') #without []
    print(proba0.encode(encoding='utf_8'));
    print(proba1);

Output:

21:00.551931 > OIBM6 UID SEARCH CHARSET UTF-8 X-GM-RAW {60}\r\n
21:00.853282 < b'+ go ahead'
   (literal) > "from:(noreply@olx.pl) subject:(Wiadomo\xc5\x9b\xc4\x87 do og\xc5\x82oszenia)"
21:01.168769 < b'* SEARCH'
21:01.169109 < b'OIBM6 OK SEARCH completed (Success)'
b'from:(noreply@olx.pl) subject:(Wiadomo\xc5\x9b\xc4\x87 do og\xc5\x82oszenia)'
[]

mjs commented 7 years ago

I can reproduce the problem by sending myself an email with the same subject that you're testing with.

I can also see the cause. UTF-8 strings get sent as IMAP literals but the search expression has already been quoted before the literal is sent. Gmail sometimes fails to match search expressions sent as literals which have double quotes around them.
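
The ordering described above can be checked against the debug log with a little arithmetic. The following is a minimal sketch, not the library's code: quote and literal_header are hypothetical helpers that only mimic "quote first, then send as a literal". The encoded query is 58 bytes, and the log announces {60}, which matches the query plus the two surrounding double quotes.

```python
def quote(arg: bytes) -> bytes:
    # Hypothetical stand-in for the quoting step: escape backslashes and
    # double quotes, then wrap the result in double quotes.
    out = arg.replace(b'\\', b'\\\\').replace(b'"', b'\\"')
    return b'"' + out + b'"'


def literal_header(payload: bytes) -> bytes:
    # An IMAP literal announces its byte count in braces, followed by CRLF.
    return b'{' + str(len(payload)).encode('ascii') + b'}\r\n'


query = 'from:(noreply@olx.pl) subject:(Wiadomość do ogłoszenia)'
raw = query.encode('UTF-8')
quoted = quote(raw)

print(len(raw))                # 58
print(literal_header(quoted))  # b'{60}\r\n', the size seen in the debug log
print(literal_header(raw))     # b'{58}\r\n', the size without the quotes
```

The last line only shows what the literal size would be if the quotes were not included. It is an illustration of the difference, not a proposed fix.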

I'm not quite sure what the right fix is going to be. I'll have a think about it.