nickoala / telepot

Python framework for Telegram Bot API
MIT License

BadHTTPResponse(response.status, text, response) #292

Closed: EmadHelmi closed this issue 7 years ago

EmadHelmi commented 7 years ago

I have an inline bot that gets a string from the user (in inline mode) and sends back some data. This is my code:

# -*- coding: utf-8 -*-
import sys
import time
import telepot

from telepot.loop import MessageLoop
from telepot.namedtuple import *
from pprint import pprint
from elasticsearch import Elasticsearch
from mongoengine import connect

from model import *
from setting import *

bot = telepot.Bot(TOKEN)
es = Elasticsearch()
connect('poem')

content_type, chat_type, chat_id = None, None, None

def handle(msg):
    global content_type, chat_type, chat_id
    content_type, chat_type, chat_id = telepot.glance(msg)
    pprint(msg)

def on_inline_query(msg):
    query_id, from_id, query_string = telepot.glance(msg, flavor='inline_query')
    print ('Inline Query:', msg)
    response = es.search(
        index="poem",
        body={
            "query": {
                "match": {"text": query_string},
            }
        }
    )
    articles = []
    for index, hit in enumerate(response['hits']['hits']):
        poem = GanjoorPoemModel.objects(id=hit['_source']['poem_id']).first()
        header = u"%s\n%s" % (hit['_source']['poet'], hit['_source']['book'])
        if len(poem.sub_book):
            for sub in poem.sub_book:
                header += u"\n%s" % sub
        header += u"\n----\n"
        link = poem.link
        text = header + poem.text
        if len(text) > 4096:
            temp = poem.text[:(4096-len(header)-len(link)-10)] + "\n" + link
            text = header + temp
        print "A", str(poem.link)
        # text = text.encode('utf-8', 'ignore').decode('utf-8')
        iqra = InlineQueryResultArticle(
            id=str(index) + "_" + hit['_source']['poem_id'],
            title=u"%s-%s" %(hit['_source']['poet'], hit['_source']['book']),
            input_message_content=InputTextMessageContent(
                message_text=text
            ),
            description=hit['_source']['text'],
            thumb_url='https://appreview.ir/wp-content/uploads/com.example.mohammad.books_.png'
        )
        articles.append(iqra)
    bot.answerInlineQuery(query_id, articles, cache_time=5)

def on_chosen_inline_result(msg):
    result_id, from_id, query_string = telepot.glance(msg, flavor='chosen_inline_result')
    print ('Chosen Inline Result:', result_id, from_id, query_string)

MessageLoop(bot, {'inline_query': on_inline_query,
                  'chosen_inline_result': on_chosen_inline_result}).run_as_thread()
print ('Listening ...')

# Keep the program running.
while 1:
    time.sleep(10)

With the code above, when I send the bot a word (such as پروانه, "butterfly"), I get this error:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/telepot/loop.py", line 37, in run_forever
    self._handle(msg)
  File "/usr/local/lib/python2.7/dist-packages/telepot/helper.py", line 1031, in route
    return fn(msg, *args, **kwargs)
  File "bot.py", line 63, in on_inline_query
    bot.answerInlineQuery(query_id, articles, cache_time=5)
  File "/usr/local/lib/python2.7/dist-packages/telepot/__init__.py", line 868, in answerInlineQuery
    return self._api_request('answerInlineQuery', _rectify(p))
  File "/usr/local/lib/python2.7/dist-packages/telepot/__init__.py", line 435, in _api_request
    return api.request((self._token, method, params, files), **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/telepot/api.py", line 137, in request
    return _parse(r)
  File "/usr/local/lib/python2.7/dist-packages/telepot/api.py", line 116, in _parse
    raise exception.BadHTTPResponse(response.status, text, response)
BadHTTPResponse: Status 413 - First 500 characters are shown below:

When I change the line bot.answerInlineQuery(query_id, articles, cache_time=5) to bot.answerInlineQuery(query_id, articles[:-4], cache_time=5), the problem does not appear and the bot sends back data. When I use bot.answerInlineQuery(query_id, articles[:-3], cache_time=5), I get the error again. And when I use bot.answerInlineQuery(query_id, articles[6], cache_time=5), that is, only the item that articles[:-3] adds compared to articles[:-4], no exception is raised, which suggests this item has no problem on its own. What is wrong? Is there a timeout? Or is there a limit on the total articles object? Every message_text in the articles list is shorter than 4096 characters.


When I send 'a'*4096 as the text, it works, so I think the problem is with multi-byte UTF-8 characters.
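
Repeating 'a' only exercises ASCII, where one character is one byte. A quick check (written for Python 3, whereas the thread's code runs on Python 2.7; the helper name is illustrative) shows how the character count and the UTF-8 byte count diverge for the Persian query word from the report:

```python
def char_and_byte_length(text):
    """Return (number of characters, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


print(char_and_byte_length("a" * 4096))  # (4096, 4096)
print(char_and_byte_length("پروانه"))     # (6, 12): each letter takes 2 bytes
```
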

Some Persian characters take more than one byte, so I converted the string to bytes with encode('utf-8') and cut it to 4096 bytes, but the problem remained. I kept cutting more and more, and only when I cut it down to 2774 bytes was the problem solved. What is the problem?
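
Slicing the encoded bytes at an arbitrary offset can cut a multi-byte character in half and leave invalid UTF-8. A minimal sketch of a safe byte-level cut in Python 3 follows; truncate_utf8 is a hypothetical helper, not part of telepot. As the reporter observed, capping one article at 4096 bytes did not by itself avoid the error, so this only addresses the encoding side of the problem.

```python
def truncate_utf8(text, max_bytes):
    """Cut text so that its UTF-8 encoding is at most max_bytes long.

    The byte slice may end in the middle of a multi-byte character;
    decoding with errors="ignore" drops that incomplete tail.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


print(truncate_utf8("پروانه", 5))  # پر  (4 bytes; the partial third letter is dropped)
```
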

When the articles list contains only one item, even an item whose text is longer than 4096 characters is delivered through the bot, and the extra characters are silently removed. But when the articles list has more than two items and one of them is longer than 4096 bytes, the exception is raised.

das7pad commented 7 years ago

is there a limit on the total articles object?

There is a limit of 50 results per query: https://core.telegram.org/bots/api#answerinlinequery

I cut more and more - when i cut [the content down to] 2774 [char] the problem [is] solved. what is the problem.??

The content of each article is limited to 4096 characters: https://core.telegram.org/bots/api#inputtextmessagecontent
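
The two documented limits cited here (at most 50 results per answerInlineQuery call, and 1 to 4096 characters of message_text per InputTextMessageContent) can be checked before sending. A minimal sketch in Python 3, assuming plain strings stand in for each article's message_text; check_inline_results is a hypothetical helper. Passing these checks does not guarantee the request is accepted, since the 413 errors in this thread concern the size of the whole request.

```python
MAX_RESULTS = 50          # answerInlineQuery: at most 50 results per query
MAX_MESSAGE_CHARS = 4096  # InputTextMessageContent.message_text: 1-4096 characters


def check_inline_results(message_texts):
    """Return a list of problems found in the message texts of inline results.

    message_texts is a plain list of strings, a stand-in for the
    message_text of each InlineQueryResultArticle.
    """
    problems = []
    if len(message_texts) > MAX_RESULTS:
        problems.append("too many results: %d > %d" % (len(message_texts), MAX_RESULTS))
    for i, text in enumerate(message_texts):
        # len() counts code points; used here as an approximation of
        # Telegram's character count.
        if not 1 <= len(text) <= MAX_MESSAGE_CHARS:
            problems.append("result %d: message_text has %d characters" % (i, len(text)))
    return problems
```
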

but when the articles array has more than 2 items and one of them more than 4096 bytes, it raises the exception.

This is expected; see the second link.

The issue can be closed.

EmadHelmi commented 7 years ago

@das7pad You don't understand my question.

  1. I send only 10 results for each query the user types to the bot, never more than 10.
  2. I tried sending one result whose InputTextMessageContent has 7000 Persian characters (each Persian character may take two or three bytes). The bot delivered the result to the user without error (Telegram itself cut the message content to 4096 characters).
  3. I tried sending 10 results, one with 4096 characters and the others with fewer than 4096 characters, and got the error from my question.
  4. I removed the 4096-character result and sent the other 9 results; they reached the user without error. If s contains my Persian text, len(s) may be 3480 while len(s.encode('utf-8')) may be 6840.
  5. I cut the encoded s to fewer than 4096 bytes (4095), but when I sent it together with the other results (as in item 3), I got the same error, even though the string is now shorter than 4096.
  6. I kept cutting more from the encoded s. Only when its encoded length dropped to 2774 bytes did the error go away, and I could send it with the other results (as in item 3).

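
Items 4 to 6 compare character counts with UTF-8 byte counts, but what reaches Telegram is a serialized request, not raw UTF-8 text, and non-ASCII text can grow further during serialization. For example, Python's json.dumps escapes every non-ASCII character as a six-character escape sequence by default. Whether telepot serializes results this way is an assumption not confirmed in this thread; the Python 3 sketch below (json_sizes is a hypothetical helper) only shows how large the difference can be.

```python
import json


def json_sizes(text):
    """Return the UTF-8 size of text serialized as a JSON string, two ways:
    (default escaping of non-ASCII, non-ASCII kept as raw UTF-8)."""
    escaped = json.dumps(text)                  # each Persian letter -> \uXXXX (6 bytes)
    raw = json.dumps(text, ensure_ascii=False)  # Arabic-script letters -> 2 UTF-8 bytes each
    return len(escaped.encode("utf-8")), len(raw.encode("utf-8"))


sample = "پروانه" * 100  # 600 Persian characters
print(len(sample), len(sample.encode("utf-8")), json_sizes(sample))
# 600 1200 (3602, 1202)
```
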
EmadHelmi commented 7 years ago

This is very strange to me. Here is working code that sends the results for any query string back to the user:

# -*- coding: utf-8 -*-
import sys
import time
import telepot

from telepot.loop import MessageLoop
from telepot.namedtuple import *
from pprint import pprint
from elasticsearch import Elasticsearch
from mongoengine import connect

from model import *
from setting import *

reload(sys)  
sys.setdefaultencoding('utf8')

bot = telepot.Bot(TOKEN)
es = Elasticsearch()
connect('poem')

content_type, chat_type, chat_id = None, None, None

def handle(msg):
    global content_type, chat_type, chat_id
    content_type, chat_type, chat_id = telepot.glance(msg)
    pprint(msg)

def on_inline_query(msg):
    query_id, from_id, query_string = telepot.glance(msg, flavor='inline_query')
    response = es.search(
        index="poem",
        body={
            "query": {
                "match": {"beyt": query_string},
            }
        }
    )
    articles = []
    for index, hit in enumerate(response['hits']['hits']):
        header = u"%s\n%s" % (hit['_source']['poet'], hit['_source']['book'])
        # sub_book may be missing or None in some documents
        for sub in hit['_source'].get('sub_book') or []:
            header += u"\n%s" % sub
        header += u"\n----\n"
        # link = poem.link
        text = header + hit['_source']['full_text']
        # print len(text), len(text.encode('utf-8'))
        md = False
        if len(text.encode('utf-8')) > 4096:
            # CONTINUE IF LEN MORE THAN 4096
            continue
            md = True
            link = GanjoorPoemModel.objects(id=hit['_source']['poem_id']).first().link
            text = text.encode('utf-8', 'ignore')
            text = header + hit['_source']['beyt']
            text += u"\n" + u"به دلیل طولانی بودن شعر امکان ارسال آن وجود ندارد.\n"
            text += u"[مطالعه ی متن کامل شعر](%s)" % link
        iqra = InlineQueryResultArticle(
            id=str(index) + "_" + hit['_source']['poem_id'],
            title=u"%s-%s" %(hit['_source']['poet'], hit['_source']['book']),
            input_message_content=InputTextMessageContent(
                message_text=text,
                parse_mode="Markdown" if md else None,  # None (no parse mode), not the string "None"
                disable_web_page_preview=True
            ),
            description=hit['_source']['beyt'],
            thumb_url='https://appreview.ir/wp-content/uploads/com.example.mohammad.books_.png'
        )
        articles.append(iqra)
    bot.answerInlineQuery(query_id, articles, cache_time=5)

def on_chosen_inline_result(msg):
    result_id, from_id, query_string = telepot.glance(msg, flavor='chosen_inline_result')
    print ('Chosen Inline Result:', result_id, from_id, query_string)

MessageLoop(bot, {'inline_query': on_inline_query,
                  'chosen_inline_result': on_chosen_inline_result}).run_as_thread()
print ('Listening ...')

# Keep the program running.
while 1:
    time.sleep(10)

This code gets 10 results from Elasticsearch and skips any result whose encoded text is longer than 4096 bytes. As I said, it works correctly. But when I ask Elasticsearch for 15 or 20 results, I get this error:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/telepot/loop.py", line 37, in run_forever
    self._handle(msg)
  File "/usr/local/lib/python2.7/dist-packages/telepot/helper.py", line 1031, in route
    return fn(msg, *args, **kwargs)
  File "bot.py", line 77, in on_inline_query
    bot.answerInlineQuery(query_id, articles, cache_time=5)
  File "/usr/local/lib/python2.7/dist-packages/telepot/__init__.py", line 868, in answerInlineQuery
    return self._api_request('answerInlineQuery', _rectify(p))
  File "/usr/local/lib/python2.7/dist-packages/telepot/__init__.py", line 435, in _api_request
    return api.request((self._token, method, params, files), **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/telepot/api.py", line 132, in request
    return _parse(r)
  File "/usr/local/lib/python2.7/dist-packages/telepot/api.py", line 113, in _parse
    raise exception.BadHTTPResponse(response.status, text, response)
BadHTTPResponse: Status 413 - First 500 characters are shown below:

I checked, and I am sure that when the results exceed a Telegram limit, a TelegramException is raised. I think this exception is a bug, or something goes wrong with Persian text.

das7pad commented 7 years ago

You don't understand my question.

What is your question, then? The cause of the exception is still the API limit: if a single article does not meet the size limit, the whole request fails.

In detail: telepot raises a BadHTTPResponse when the Telegram servers reject the request with a non-JSON response.

The servers gave you a hint with status code 413, "Payload Too Large", which means the request payload is too big:

The server is refusing to process a request because the request payload is larger than the server is willing or able to process.

telepot is just an interface to the Telegram API; it does not alter your request. telepot could check whether each request meets the API limits, but that would slow every request down and could lead to mismatched limits, since Telegram may change them at any time.
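
Since telepot does not check limits, a bot can do it on its own side before calling answerInlineQuery. A Python 3 sketch that keeps results, in ranking order, only while the serialized list stays under a byte budget; fit_results and PAYLOAD_BUDGET_BYTES are hypothetical, the budget is a placeholder because this thread does not establish the real limit, and estimating size with json.dumps defaults is an assumption about how the request is encoded.

```python
import json

# Hypothetical budget: this thread does not establish Telegram's real limit,
# so treat the number as a placeholder to tune by testing.
PAYLOAD_BUDGET_BYTES = 60000


def fit_results(results, budget=PAYLOAD_BUDGET_BYTES):
    """Return the results that fit in budget, keeping their order.

    results: JSON-serializable dicts, one per inline result. A result whose
    addition would push the serialized list over budget is skipped, and
    later (possibly smaller) results are still tried.
    """
    kept = []
    for result in results:
        candidate = kept + [result]
        # Estimate with json.dumps defaults (non-ASCII escaped as \uXXXX).
        if len(json.dumps(candidate).encode("utf-8")) <= budget:
            kept = candidate
    return kept
```

Skipping rather than stopping at the first oversized result mirrors the reporter's own "skip if too long" approach while still filling the remaining budget.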

Regarding your list:

  1. Good, then you do not hit the limit of 50 results per request.

  2. Telegram itself cut the messagecontent to 4096 character

    This is questionable behavior by the Telegram servers; to be consistent with the API limits, they should reject the request instead. Do not rely on it.

  3. Expected.

  4. The Telegram servers may have their own mechanisms for limiting the exact size. [1]

  5. See 4.

  6. See 4.

Regarding your second comment:

TelegramException Raises.

A TelepotException indicates an error reported by the Telegram side, typically caused by a bad request from the user or by an API change. It shows that the exception is not raised because of a bug in telepot's code.

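try:
    bot.answerInlineQuery(query_id, articles, cache_time=5)
except telepot.exception.TelepotException:
    # base class of telepot's exceptions, including BadHTTPResponse
    print('invalid request')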
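
Rather than only printing an error, the bot could retry with fewer results when the server answers 413. A Python 3 sketch with the send call and the error test passed in as functions, so it runs without telepot; send_with_backoff is a hypothetical helper. The commented telepot wiring relies on the tracebacks in this thread, which show BadHTTPResponse being constructed with response.status as its first argument.

```python
def send_with_backoff(send, results, is_too_large):
    """Call send(results); while is_too_large(error) holds, retry with half as many.

    Returns how many results were sent in the end (0 if none fit).
    Any other exception is re-raised unchanged.
    """
    count = len(results)
    while count > 0:
        try:
            send(results[:count])
            return count
        except Exception as exc:
            if not is_too_large(exc):
                raise
            count //= 2  # drop the lower-ranked half and try again
    return 0


# Possible telepot wiring (not executed here; bot, query_id, articles as in the thread):
# import telepot
# send_with_backoff(
#     lambda rs: bot.answerInlineQuery(query_id, rs, cache_time=5),
#     articles,
#     lambda e: isinstance(e, telepot.exception.BadHTTPResponse) and e.args[0] == 413,
# )
```

Halving keeps the top-ranked results, which suits a search bot; dropping one result at a time would keep more results at the cost of more requests.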

[1] From a few trial-and-error requests with the string به دلیل طولانی بودن شعر امکان ارسال آن وجود ندارد. ("Because the poem is too long, it cannot be sent."): a single message supports fewer than 4096 unencoded Persian characters, which is about 7134 bytes when UTF-8 encoded. Inline articles support somewhere between 2550 and 2600 Persian characters. But again: telepot has no influence on this limit, and it is up to you to shrink the size.
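
Building on the trial-and-error figure in [1] and on the reporter's own truncation in the first snippet, a message can be cut to a character cap while keeping the header and appending the poem link. A Python 3 sketch; fit_message is hypothetical, and the 2500 cap is an assumed safety margin below the observed 2550 to 2600, not a documented Telegram limit. Because the 413 concerns the whole request, a cap that works for 10 results may still fail for 15 or 20.

```python
# Assumed safety margin below the ~2550-2600 characters observed in [1];
# not a documented Telegram limit.
INLINE_ARTICLE_CHAR_CAP = 2500


def fit_message(header, body, link, max_chars=INLINE_ARTICLE_CHAR_CAP):
    """Return header + body, cutting body and appending the link if too long."""
    text = header + body
    if len(text) <= max_chars:
        return text
    suffix = "\n" + link
    room = max_chars - len(header) - len(suffix)
    if room < 0:
        raise ValueError("header and link alone exceed max_chars")
    return header + body[:room] + suffix


msg = fit_message("Poet\nBook\n----\n", "x" * 5000, "https://example.org/poem")
print(len(msg))  # 2500
```
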