Closed nmlorg closed 3 years ago
While parsing multipart/form-data, HttpReader uses special handling only for Content-Type "application/x-www-form-urlencoded". All other values are treated as strings and are used as exact values of the corresponding fields.
The entities
field works in the specified way, because it is expected to be a JSON-serialized array as any other Array/Object method parameter.
Thanks! The thing I'm having trouble with is that entities
seems to work both as a field within a JSON-encoded structure and as a JSON-encoded structure itself — I can send both:
Content-Type: application/json
{"chat_id": 1234, "text": "my text", "entities": [{"type": "bold", "offset": 0, "length": 2}]}
which decodes to:
{
'chat_id': 1234,
'entities': [
{
'length': 2,
'offset': 0,
'type': 'bold',
},
],
'text': 'my text',
}
and:
Content-Type: application/json
{"chat_id": 1234, "text": "my text", "entities": "[{\"type\": \"bold\", \"offset\": 0, \"length\": 2}]"}
which decodes to:
{
'chat_id': 1234,
'entities': '[{"type": "bold", "offset": 0, "length": 2}]',
'text': 'my text',
}
I don't see where that second form is being handled in HttpReader (or maybe HttpReader is passing the JSON-encoded string along, and it's being decoded somewhere else?), so I'm not sure if it's specific to entities
, or if maybe any string that begins with a '['
gets double decoded, or if I'm misunderstanding what's going on.
Just checkpointing where I'm at: As best as I can follow so far, HttpReader::next_read
sees Content-Type: application/json
and hands off to HttpReader::parse_json_parameters
, which decodes '"aaa"'
to:
{
'content': 'aaa',
}
and '{"aaa": bbb, "ccc": ddd}'
to:
{
'aaa': 'bbb',
'ccc': 'ddd',
}
(i.e. it doesn't actually decode the values in a top-level object, it stores them as their JSON representation). This leads me to believe the first form would be stored as:
{
'entities': '[{"type": "bold", "offset": 0, "length": 2}]',
}
and the second would be:
{
'entities': r'"[{\"type\": \"bold\", \"offset\": 0, \"length\": 2}]"',
}
Later, telegram-bot-api/telegram-bot-api/Client.cpp
's Client::get_input_message_text(query)
calls Client::get_input_entities(query, "entities")
, which pulls the still-JSON-encoded string out, and passes that into td/tdutils/td/utils/JsonBuilder.h
's json_decode
.
I thought there might be a place where json_decode
called into itself or one of its helpers in a way that double decoded, but it looks like json_decode
calls do_json_decode
, which sees the leading '"'
and passes control through to json_string_decode
.
So for the second form it should be passing a JsonValue
with a string value of '[{"type": "bold", "offset": 0, "length": 2}]'
to Client::get_formatted_text
(as input_entities
). Then, since input_entities.type() != JsonValue::Type::Array
, it should just be ignored. But it's not. So somewhere before that point there's an extra decode that I'm still not seeing.
This is done in HttpReader::parse_json_parameters. Namely, in
auto r_value = [&]() -> Result<MutableSlice> {
if (parser.peek_char() == '"') {
return json_string_decode(parser);
} else {
const int32 DEFAULT_MAX_DEPTH = 100;
auto begin = parser.ptr();
auto result = do_json_skip(parser, DEFAULT_MAX_DEPTH);
if (result.is_ok()) {
return MutableSlice(begin, parser.ptr());
} else {
return result.move_as_error();
}
}
}();
which decodes all JSON strings and keeps any other value as is.
Of course! I must have read through that a dozen times and just been blinded once I realized the second case was passing strings through undecoded.
So I think that means, for my request assembler, the rule is: any value in the top-level object that isn't itself a string (so, including bools, for example) can be JSON-encoded before being being [potentially JSON-encoded again when] added to the request.
That should be very simple. Thanks!
This is a doozy.
Right now, things like
bot.send_photo(chat_id=1234, photo=2345)
end up as effectively:which is sent as:
This alternatively could be:
which would be sent as:
or:
which would be sent as:
all of which work perfectly fine with Telegram.
However, to actually upload a file, only the third form (
files=
using multipart/form-data) works:which would be sent as:
This can be done relatively painlessly—except when you're mixing both complex data types and arbitrary data. For example:
My initial testing seems to show that telegram-bot-api just magically accepts JSON-encoded data for things like
entities
, even withoutContent-Type
set toapplication/json
; and similarly does not accept JSON-encoded data for things liketext
, even withContent-Type
set toapplication/json
. That is:is interpreted as the string
'Hello there'
while:is interpreted as the string
'"Hello there"'
(it is not decoded from JSON). Consistently, both:and:
are interpreted as the structure
[{'type': 'bold', 'offset': 0, 'length': 2}]
(not the string'[{"type": "bo...
).I've tried digging through TBA's code, which seems to just use TDLib's
HttpReader.cpp
, which does not seem to have any logic to magically decodeentities
, values that start with'['
, or anything else that would lead to this behavior.There's relevant discussion somewhat acknowledging this works in tdlib/telegram-bot-api#177, but no explanation of how (or what the rules for an encoder should be—that is, I can't just always JSON-encode values before putting them in
files=
because strings are not decoded, so I need to know if this is a hard-coded list of fields, or anything that starts with'['
, or what).