python273 / telegraph

Telegraph API wrapper | Telegra.ph
https://pypi.org/project/telegraph/
MIT License

TelegraphException: CONTENT_TOO_BIG #21

Closed mengyyy closed 4 years ago

mengyyy commented 5 years ago

I found that the maximum length is about 10917 * 6 characters. With Chinese characters, the limit is 10917. But I can put more than 20000 Chinese characters into a page in the telegra.ph web editor. Is there a reason for this difference?

from telegraph import Telegraph

telegraph = Telegraph()

telegraph.create_account(short_name="1337")

# these calls succeed

telegraph.create_page('page_title', html_content='<p>{}</p>'.format("从"*10917))
# or 
telegraph.create_page('page_title', html_content='<p>{}</p>'.format("a"*10917 * 6))

# but these calls fail with
# TelegraphException: CONTENT_TOO_BIG

telegraph.create_page('page_title', html_content='<p>{}</p>'.format("从"*10918))
# or 
telegraph.create_page('page_title', html_content='<p>{}</p>'.format("a"*10918 * 6))
python273 commented 5 years ago
>>> import json
>>> import telegraph
>>> print(json.dumps(telegraph.utils.html_to_nodes('<p>从从从</p>')))
[{"tag": "p", "children": ["\u4ece\u4ece\u4ece"]}]

I think the problem is in how UTF-8 characters are converted to JSON. You can try adding ensure_ascii=False to the json.dumps calls (https://stackoverflow.com/questions/18337407/saving-utf-8-texts-in-json-dumps-as-utf8-not-as-u-escape-sequence) and check whether that works.
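A rough sketch of the arithmetic behind this, using only the standard library. It assumes the page content is sent as the JSON node list shown above, serialized with default json.dumps separators, and that the server limit is 64 KB on that serialized string; both are assumptions drawn from the numbers in this thread, not from the library source. The helper name payload_size is made up for illustration.

```python
import json

ASSUMED_LIMIT = 64 * 1024  # 65536 bytes; an assumption, see the note above


def payload_size(text, ensure_ascii):
    """Byte size of the serialized node list for '<p>text</p>'."""
    # Same node shape as the html_to_nodes output printed above.
    nodes = [{"tag": "p", "children": [text]}]
    return len(json.dumps(nodes, ensure_ascii=ensure_ascii).encode("utf-8"))


# Escaped: each "从" becomes the 6-character sequence \u4ece.
print(payload_size("从" * 10917, ensure_ascii=True))   # 65534, under the limit
print(payload_size("从" * 10918, ensure_ascii=True))   # 65540, over the limit

# Unescaped: each "从" is 3 bytes in UTF-8, so about twice as many fit.
print(payload_size("从" * 21834, ensure_ascii=False))  # 65534
print(payload_size("从" * 21835, ensure_ascii=False))  # 65537
```

Under these assumptions the escaped form costs 6 bytes per Chinese character, which matches the observation that 10917 Chinese characters and 10917 * 6 ASCII characters hit the same ceiling.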

mengyyy commented 5 years ago

Yes, it works for Chinese:

# success 
telegraph.create_page('page_title', html_content='<p>{}</p>'.format("从"*21834))
# https://telegra.ph/page-title-05-06-16

# CONTENT_TOO_BIG
telegraph.create_page('page_title', html_content='<p>{}</p>'.format("从"*21835))

# P.S. a much longer string fails with a different error:
# ACCESS_TOKEN_INVALID
telegraph.create_page('page_title', html_content='<p>{}</p>'.format("从"*29107))
mengyyy commented 5 years ago

Maybe it would be better to add an ensure_ascii option to telegraph.create_page?

python273 commented 5 years ago

I think we can set ensure_ascii=False by default.
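For illustration only, a minimal sketch of what such a default could look like, together with a size check before sending. The names serialize_nodes and fits_limit are hypothetical and are not part of the telegraph package, and the 64 KB limit is an assumption inferred from the thresholds reported earlier in this thread.

```python
import json

ASSUMED_LIMIT_BYTES = 64 * 1024  # assumption, inferred from the thread


def serialize_nodes(nodes, ensure_ascii=False):
    """Hypothetical helper: dump nodes, leaving non-ASCII text unescaped by default."""
    return json.dumps(nodes, ensure_ascii=ensure_ascii)


def fits_limit(nodes, ensure_ascii=False):
    """Hypothetical pre-flight check against the assumed size limit."""
    size = len(serialize_nodes(nodes, ensure_ascii).encode("utf-8"))
    return size <= ASSUMED_LIMIT_BYTES


nodes = [{"tag": "p", "children": ["从" * 21834]}]
print(fits_limit(nodes))                     # True
print(fits_limit(nodes, ensure_ascii=True))  # False
```

Keeping ensure_ascii as a keyword argument with a False default would cover both suggestions: callers get the larger capacity without doing anything, and can still opt back into escaped output.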

mengyyy commented 5 years ago

Yeap~