Open sefidpardazesh opened 7 years ago
Entities are there only for cases when updating messages (that are either HTML-formatted or use Markdown), so that they can be reformatted properly.
There is no such thing for captions; you will have to write a regex for this...
Thanks. What is the regex for mention and text_mention?
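A plain @username mention can be matched with a regex; a text_mention cannot, because it refers to a user without a username and the visible text is just a name. A minimal sketch, assuming usernames are 5 to 32 characters of Latin letters, digits and underscores (an assumption to check against Telegram's current rules); the function name find_mentions is hypothetical:

```php
<?php
// Sketch: find @username mentions in plain text (e.g. a photo caption).
// find_mentions() is a hypothetical helper, not part of any library.
function find_mentions(string $text): array
{
    // The lookbehind skips e-mail addresses such as me@example.com.
    // The 5-32 length range is an assumption about Telegram's username rules.
    $pattern = '/(?<![A-Za-z0-9_@])@([A-Za-z0-9_]{5,32})\b/';
    preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);

    $mentions = [];
    foreach ($matches[0] as $match) {
        // With PREG_OFFSET_CAPTURE each match is [matched string, byte offset].
        [$mention, $byte_offset] = $match;
        $mentions[] = ['mention' => $mention, 'byte_offset' => $byte_offset];
    }

    return $mentions;
}

print_r(find_mentions('Photo by @some_user, mail me@example.com'));
```

Note that the offsets returned by preg_match_all() are byte offsets, not the UTF-16 offsets Telegram uses in its entities.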
@jacklul I'm trying to reformat an edited message, but without success. How can I use the entities to properly reformat?
@KilluaFein proof of concept:
private function parseEntitiesString($text, $entities)
{
    // mb_substr_replace() is a custom multibyte helper (not a PHP built-in) and is not shown in this snippet.
    // $global_incr counts the characters inserted so far, which shifts every later entity offset.
    $global_incr = 0;
    foreach ($entities as $entity) {
        if ($entity->getType() === 'italic') {
            $start = $global_incr + $entity->getOffset();
            $end = 1 + $start + $entity->getLength();
            $text = $this->mb_substr_replace($text, '_', $start, 0);
            $text = $this->mb_substr_replace($text, '_', $end, 0);
            $global_incr = $global_incr + 2;
        } elseif ($entity->getType() === 'bold') {
            $start = $global_incr + $entity->getOffset();
            $end = 1 + $start + $entity->getLength();
            $text = $this->mb_substr_replace($text, '*', $start, 0);
            $text = $this->mb_substr_replace($text, '*', $end, 0);
            $global_incr = $global_incr + 2;
        } elseif ($entity->getType() === 'code') {
            $start = $global_incr + $entity->getOffset();
            $end = 1 + $start + $entity->getLength();
            $text = $this->mb_substr_replace($text, '`', $start, 0);
            $text = $this->mb_substr_replace($text, '`', $end, 0);
            $global_incr = $global_incr + 2;
        } elseif ($entity->getType() === 'pre') {
            $start = $global_incr + $entity->getOffset();
            $end = 3 + $start + $entity->getLength();
            $text = $this->mb_substr_replace($text, '```', $start, 0);
            $text = $this->mb_substr_replace($text, '```', $end, 0);
            $global_incr = $global_incr + 6;
        } elseif ($entity->getType() === 'text_link') {
            $start = $global_incr + $entity->getOffset();
            $end = 1 + $start + $entity->getLength();
            $url = '(' . $entity->getUrl() . ')';
            $text = $this->mb_substr_replace($text, '[', $start, 0);
            $text = $this->mb_substr_replace($text, ']' . $url, $end, 0);
            $global_incr = $global_incr + 2 + mb_strlen($url);
        }
        // A second, unreachable 'code' branch from the original snippet has been dropped.
    }
    return $text;
}
I never managed to make it work in 100% of cases. Multibyte characters break offsets.
Like emoji, right?
and what is mb_substr_replace()?
offset and length are measured in UTF-16 code units; maybe converting them to UTF-8 positions would solve this?
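One way to act on that idea is to convert the text itself rather than the numbers: in UTF-16LE every code unit is exactly two bytes, so a byte-based substr() with doubled offsets lands on the right spot. The following is only a sketch; the function name utf16_slice is hypothetical, and it assumes the mbstring extension and PHP 7.1 or later.

```php
<?php
// Sketch: slice a UTF-8 string using an offset and length given in UTF-16 code units.
// utf16_slice() is a hypothetical helper, not part of any library.
function utf16_slice(string $text, int $offset, ?int $length = null): string
{
    // In UTF-16LE every code unit is exactly 2 bytes, so byte-based substr() is safe here.
    $utf16 = mb_convert_encoding($text, 'UTF-16LE', 'UTF-8');
    $slice = $length === null
        ? substr($utf16, $offset * 2)
        : substr($utf16, $offset * 2, $length * 2);

    // The cast guards against substr() returning false on older PHP versions.
    return mb_convert_encoding((string) $slice, 'UTF-8', 'UTF-16LE');
}

$text = "\u{1F600} bold text";
// The emoji is one character but two UTF-16 code units, so "bold" starts at offset 3.
echo utf16_slice($text, 3, 4), PHP_EOL;          // bold
echo mb_substr($text, 3, 4, 'UTF-8'), PHP_EOL;   // "old " (character-based, off by one)
```

The second echo shows why mb_substr() alone drifts after an emoji: it counts characters, while the entity offsets count UTF-16 code units.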
mb_XXX functions are for multi-byte strings (mb, I guess).
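As for mb_substr_replace(): PHP has no built-in function with that name, so the proof of concept above relies on a custom helper. A minimal sketch of what such a helper might look like follows; the name mb_substr_replace_sketch and its signature are assumptions, not the original author's code, and only non-negative offsets are handled.

```php
<?php
// Hypothetical helper: a multibyte-aware counterpart to substr_replace().
// Offsets and lengths are counted in characters (code points), not bytes.
function mb_substr_replace_sketch(
    string $string,
    string $replacement,
    int $start,
    ?int $length = null,
    string $encoding = 'UTF-8'
): string {
    $total = mb_strlen($string, $encoding);

    // Clamp the start position into the string (negative offsets are not supported in this sketch).
    $start = max(0, min($start, $total));

    // A null length means "replace everything up to the end of the string".
    $length = $length === null
        ? $total - $start
        : max(0, min($length, $total - $start));

    $before = mb_substr($string, 0, $start, $encoding);
    $after  = mb_substr($string, $start + $length, null, $encoding);

    return $before . $replacement . $after;
}

// A length of 0 inserts the replacement without removing anything.
echo mb_substr_replace_sketch('héllo wörld', '_', 6, 0), PHP_EOL; // héllo _wörld
```

This still counts characters rather than UTF-16 code units, so on its own it does not solve the emoji offset problem.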
I spent a lot of time thinking about this and NEVER found a solution that gets it to work properly.
public static function processEntities (string $_text, array $_message_raw): string
{
    // HTML pattern for each supported entity type.
    $preset = [
        'bold'      => '<b>%text</b>',
        'italic'    => '<i>%text</i>',
        'text_link' => '<a href="%url">%text</a>',
        'code'      => '<code>%text</code>',
        'pre'       => '<pre>%text</pre>',
    ];

    if (!isset ($_message_raw['entities']))
    {
        return $_text;
    }

    $iterationText = $_text;
    // Number of characters added by earlier replacements; shifts later offsets.
    $globalDiff = 0;

    foreach ($_message_raw['entities'] as $entity)
    {
        $type   = $entity['type'];
        $offset = $entity['offset'] + $globalDiff;
        $length = $entity['length'];

        $pBefore = \mb_substr ($iterationText, 0, $offset);
        $pText   = \mb_substr ($iterationText, $offset, $length);
        $pAfter  = \mb_substr ($iterationText, ($offset + $length));

        // Note: str_replace() works well with UTF-8 in recent PHP versions.
        if (isset ($preset[$type]))
        {
            // Get pattern from the preset.
            $replacedContent = $preset[$type];

            // Replace the URL first, so a literal macro such as %url inside the message text is not substituted afterwards.
            if (!empty ($entity['url']))
            {
                $replacedContent = \str_replace ('%url', $entity['url'], $replacedContent);
            }

            // Replace main text.
            $replacedContent = \str_replace ('%text', $pText, $replacedContent);

            $newText = $pBefore . $replacedContent . $pAfter;
            $globalDiff += (\mb_strlen ($newText) - \mb_strlen ($iterationText));
            $iterationText = $newText;
        }
    }

    return $iterationText;
}
@jacklul what exactly is the problem, and how can it be reproduced?
I believe the point of this issue is to have a way to edit and reformat messages using the entities field. Because the message text itself does not contain formatting, we have to use the 'entities' field for that. I never managed to create a function that could parse the entities and apply them to the message string correctly, because of multibyte strings...
One of the simplest examples would be a button under a message that removes or adds text to the message while keeping the message contents (and that content cannot be obtained or generated in any other way than grabbing it from the Message object).
Any news on this issue? Emojis + text formatting using entities info (offset, length)
I have a working version (I think), needs some further testing and then I'll release it :+1:
My latest experiment, which I'll pack into a small package when it works 100%.
Try the class below, and use it like:
$entity_decoder = new EntityDecoder($message, 'markdown'); // or 'html'
$decoded_text = $entity_decoder->decode();
<?php

use Longman\TelegramBot\Entities\Message;
use Longman\TelegramBot\Entities\MessageEntity;

class EntityDecoder
{
    private $entities;
    private $text;
    private $style;
    private $without_cmd;
    private $offset_correction = 0;
    // UTF-16 code unit count of each character in the text, built lazily per instance.
    private $text_unit_counts;

    /**
     * @param Message $message     Message object to reconstruct Entities from.
     * @param string  $style       Either 'html' or 'markdown'.
     * @param bool    $without_cmd If the bot command should be included or not.
     */
    public function __construct(Message $message, string $style = 'html', bool $without_cmd = false)
    {
        $this->entities    = $message->getEntities();
        $this->text        = $message->getText($without_cmd);
        $this->style       = $style;
        $this->without_cmd = $without_cmd;
    }

    public function decode(): string
    {
        if (empty($this->entities)) {
            return $this->text;
        }

        $this->fixBotCommandEntity();

        // Reverse entities and start replacing bits from the back, to preserve offset positions.
        foreach (array_reverse($this->entities) as $entity) {
            $this->text = $this->decodeEntity($entity, $this->text);
        }

        return $this->text;
    }

    protected function fixBotCommandEntity(): void
    {
        // First entity would be the bot command, remove if necessary.
        $first_entity = reset($this->entities);
        if ($this->without_cmd && $first_entity->getType() === 'bot_command') {
            $this->offset_correction = ($first_entity->getLength() + 1);
            array_shift($this->entities);
        }
    }

    /**
     * @param MessageEntity $entity
     *
     * @return array
     */
    protected function getOffsetAndLength(MessageEntity $entity): array
    {
        if ($this->text_unit_counts === null) {
            // https://www.php.net/manual/en/function.str-split.php#115703
            $str_split_unicode = preg_split('/(.)/us', $this->text, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

            // Generate an array of UTF-16 code unit counts per character, which is necessary
            // to correct the offset and length values of special characters, like Emojis.
            $this->text_unit_counts = array_map(function ($char) {
                return strlen(mb_convert_encoding($char, 'UTF-16', 'UTF-8')) / 2;
            }, $str_split_unicode);
        }

        $utf16_offset = $entity->getOffset() - $this->offset_correction;
        $utf16_end    = $utf16_offset + $entity->getLength();

        // Translate both ends of the entity from UTF-16 code units to character positions.
        $offset = $this->toCharacterIndex($utf16_offset);
        $length = $this->toCharacterIndex($utf16_end) - $offset;

        return [$offset, $length];
    }

    /**
     * Convert a position counted in UTF-16 code units into a character index for the mb_* functions.
     *
     * @param int $utf16_position
     *
     * @return int
     */
    private function toCharacterIndex(int $utf16_position): int
    {
        $units = 0;
        foreach ($this->text_unit_counts as $index => $count) {
            if ($units >= $utf16_position) {
                return $index;
            }
            $units += $count;
        }

        return count($this->text_unit_counts);
    }

    /**
     * @param string $style
     * @param string $type
     *
     * @return string
     */
    protected function getFiller(string $style, string $type): string
    {
        $fillers = [
            'html'     => [
                'text_mention' => '<a href="tg://user?id=%2$s">%1$s</a>',
                'text_link'    => '<a href="%2$s">%1$s</a>',
                'bold'         => '<b>%s</b>',
                'italic'       => '<i>%s</i>',
                'code'         => '<code>%s</code>',
                'pre'          => '<pre>%s</pre>',
            ],
            'markdown' => [
                'text_mention' => '[%1$s](tg://user?id=%2$s)',
                'text_link'    => '[%1$s](%2$s)',
                'bold'         => '*%s*',
                'italic'       => '_%s_',
                'code'         => '`%s`',
                'pre'          => '```%s```',
            ],
        ];

        return $fillers[$style][$type] ?? '';
    }

    /**
     * Decode an entity into the passed string.
     *
     * @param MessageEntity $entity
     * @param string        $text
     *
     * @return string
     */
    private function decodeEntity(MessageEntity $entity, string $text): string
    {
        [$offset, $length] = $this->getOffsetAndLength($entity);
        $text_bit          = $this->getTextBit($entity, $offset, $length);

        // Replace text bit.
        return mb_substr($text, 0, $offset) . $text_bit . mb_substr($text, $offset + $length);
    }

    /**
     * @param MessageEntity $entity
     * @param int           $offset
     * @param int           $length
     *
     * @return false|string
     */
    private function getTextBit(MessageEntity $entity, $offset, $length)
    {
        $type     = $entity->getType();
        $filler   = $this->getFiller($this->style, $type);
        $text_bit = mb_substr($this->text, $offset, $length);

        switch ($type) {
            case 'text_mention':
                $text_bit = sprintf($filler, $text_bit, $entity->getUser()->getId());
                break;
            case 'text_link':
                $text_bit = sprintf($filler, $text_bit, $entity->getUrl());
                break;
            case 'bold':
            case 'italic':
            case 'code':
            case 'pre':
                $text_bit = sprintf($filler, $text_bit);
                break;
            default:
                // Unsupported entity types are left as plain text.
                break;
        }

        return $text_bit;
    }
}
Tested, and I do not see any problems. Lots of emojis and different formatting work OK at first glance.
All code snippets in this thread utterly fail on underline text inside spoilers. (HTML mode)
UPD: Use https://packagist.org/packages/lucadevelop/telegram-entities-decoder
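The nesting case mentioned above (underlined text inside a spoiler) can be approached by emitting opening and closing tags at boundary positions, instead of replacing each entity's span in turn. The following is a rough sketch, not the linked package's implementation. It assumes that entities arrive as plain arrays with type, offset and length keys, that offsets are UTF-16 code units, and that entities are properly nested (no partial overlaps). The function name entities_to_html is hypothetical, and it needs PHP 7.4 or later.

```php
<?php
// Sketch: rebuild Telegram-style HTML from entities, supporting nested entities.
// entities_to_html() is a hypothetical helper; only a few entity types are covered.
function entities_to_html(string $text, array $entities): string
{
    $tags = [
        'bold'      => ['<b>', '</b>'],
        'italic'    => ['<i>', '</i>'],
        'underline' => ['<u>', '</u>'],
        'spoiler'   => ['<tg-spoiler>', '</tg-spoiler>'],
        'code'      => ['<code>', '</code>'],
    ];

    // Work in UTF-16LE so that entity offsets map to byte positions (2 bytes per code unit).
    $utf16 = mb_convert_encoding($text, 'UTF-16LE', 'UTF-8');
    $total = intdiv(strlen($utf16), 2);

    $opens  = [];
    $closes = [];
    foreach (array_values($entities) as $index => $entity) {
        if (!isset($tags[$entity['type']])) {
            continue; // Unknown types are left as plain text.
        }
        [$open, $close] = $tags[$entity['type']];
        $start = $entity['offset'];
        $end   = $start + $entity['length'];
        $opens[$start][] = [$entity['length'], $index, $open];
        $closes[$end][]  = [$entity['length'], $index, $close];
    }

    $positions = array_unique(array_merge([0, $total], array_keys($opens), array_keys($closes)));
    sort($positions);

    $html     = '';
    $previous = 0;
    foreach ($positions as $position) {
        // Emit the escaped text between the previous boundary and this one.
        $piece = (string) substr($utf16, $previous * 2, ($position - $previous) * 2);
        $html .= htmlspecialchars(mb_convert_encoding($piece, 'UTF-8', 'UTF-16LE'), ENT_NOQUOTES, 'UTF-8');

        // Close inner entities first: shorter spans, then later-declared ones.
        $closing = $closes[$position] ?? [];
        usort($closing, fn($a, $b) => [$a[0], -$a[1]] <=> [$b[0], -$b[1]]);
        foreach ($closing as $item) {
            $html .= $item[2];
        }

        // Open outer entities first: longer spans, then earlier-declared ones.
        $opening = $opens[$position] ?? [];
        usort($opening, fn($a, $b) => [-$a[0], $a[1]] <=> [-$b[0], $b[1]]);
        foreach ($opening as $item) {
            $html .= $item[2];
        }

        $previous = $position;
    }

    return $html;
}

echo entities_to_html("a \u{1F600} secret note", [
    ['type' => 'spoiler', 'offset' => 5, 'length' => 11],
    ['type' => 'underline', 'offset' => 12, 'length' => 4],
]), PHP_EOL;
```

Because the text between boundaries is escaped separately from the tags, characters such as < and & in the message do not break the resulting HTML.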
In the Telegram Bot API, text messages have entity types for detecting url, mention, and text_mention. But for a photo or video with a caption, how do we detect url and mention? In other words, how can we use entity types in the caption of a photo or video?