VityaSchel closed 2 years ago
Closing because I found a solution: before parsing, replace each emoji with any regular three-character placeholder, then send the text to Telegram with the placeholder swapped back to the emoji:
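// Assumed imports, not shown in the original comment:
import { parse } from 'node-html-parser'
import { stripHtml } from 'string-strip-html'

// entitiesMapping is defined elsewhere; presumably it maps a tag name ('b', 'a', ...) to an MTProto entity type.

// The emoji has already been replaced with the three-character placeholder '&!M'
const originalText = 'Hello&!M<b>world</b>'

function convertHTMLToEntities(root, element = root) {
  const entities = []
  for (const child of element.childNodes) {
    if (child.constructor.name === 'HTMLElement') {
      // How many characters of the HTML up to `start` disappear once markup is stripped
      const difference = start => {
        const htmlBeforeStart = root.outerHTML.substring(0, start + 1)
        return htmlBeforeStart.length - stripHtml(htmlBeforeStart).result.length
      }
      entities.push({
        _: entitiesMapping[child.rawTagName],
        offset: child.range[0] - difference(child.range[0]),
        length: child.innerText.length,
        ...(child.rawTagName === 'a' && { url: child.getAttribute('href') })
      })
      // Recurse into nested tags
      if (child.childNodes) entities.push(...convertHTMLToEntities(root, child))
    }
  }
  return entities
}

const parsedText = parse(originalText)
const entities = convertHTMLToEntities(parsedText)
// Swap the placeholder back to the emoji before sending
const textToSend = parsedText.innerText.replaceAll('&!M', '🌚')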
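The snippet above starts from text where the emoji is already a placeholder and restores one hard-coded emoji at the end. A minimal sketch of the full round trip might look like the following. The helper names `hideEmojis` and `restoreEmojis` are hypothetical, and the sketch assumes the emojis are single-code-point pictographs (no flags, skin-tone modifiers, or ZWJ sequences) and that the input never contains the literal placeholder `&!M`.

```javascript
// Hypothetical helpers for illustration; not part of any library.
const PLACEHOLDER = '&!M' // the three-character stand-in used in the comment above

// Replace every single-code-point emoji with the placeholder
// and remember the emojis in the order they appeared.
function hideEmojis(text) {
  const emojis = []
  const masked = text.replace(/\p{Extended_Pictographic}/gu, match => {
    emojis.push(match)
    return PLACEHOLDER
  })
  return { masked, emojis }
}

// Put the remembered emojis back, one per placeholder, in order.
function restoreEmojis(text, emojis) {
  let i = 0
  return text.replaceAll(PLACEHOLDER, () => emojis[i++])
}

const { masked, emojis } = hideEmojis('Hello🌚<b>world</b>')
// `masked` is what would be handed to the HTML parser;
// the parser's plain-text output is then passed to restoreEmojis.
const restored = restoreEmojis('Hello&!Mworld', emojis)
```

This only covers the substitution itself. Whether the placeholder's length needs to match the emoji's length depends on how the offsets are computed downstream.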
Hello, I'm using this parser to build Telegram MTProto message entities. Telegram requires an offset and a length for each entity (such as bold or underlined text). However, emojis are counted as two chars in Telegram, I suppose because it counts code units instead of actual characters. How can I count emojis as two chars?
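To illustrate the difference the question describes, here is a small sketch comparing the two ways of counting. It assumes, as the question supposes, that the offsets are measured in UTF-16 code units, and the sample string is made up for illustration.

```javascript
const sample = 'Hello🌚world'

// String.prototype.length counts UTF-16 code units, so the emoji contributes 2.
const codeUnits = sample.length

// Spreading a string iterates by code point, so the emoji contributes 1.
const codePoints = [...sample].length

// indexOf also works in UTF-16 code units, so this offset already
// counts the emoji as two.
const offsetOfWorld = sample.indexOf('world')

console.log(codeUnits, codePoints, offsetOfWorld) // prints "12 11 7"
```

Under that assumption, offsets taken directly from JavaScript string indices already count such an emoji as two, while anything that counts by code point or by visible character will come out one short per emoji.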