cburatto closed this issue 4 years ago.
The issue occurs with any text, not just Thai, and I may be missing some configuration. Here is an example:
// Log every open-tag event the parser emits
let parser = require('htmljs-parser').createParser({
  onOpenTag: function(event) {
    console.log(event);
  }
});
parser.parse('This is a test');
In this case, the result will be:
{
  type: 'openTag',
  tagName: 'This',
  tagNameExpression: undefined,
  emptyTagName: undefined,
  argument: undefined,
  params: undefined,
  pos: 0,
  endPos: 14,
  tagNameEndPos: 4,
  openTagOnly: false,
  selfClosed: false,
  concise: true,
  attributes: [
    {
      name: 'is',
      value: undefined,
      pos: 4,
      endPos: 7,
      argument: undefined
    },
    {
      name: 'a',
      value: undefined,
      pos: 7,
      endPos: 9,
      argument: undefined
    },
    {
      name: 'test',
      value: undefined,
      pos: 9,
      endPos: 14,
      argument: undefined
    }
  ],
  setParseOptions: [Function]
}
Is there any way to prevent plain text from being parsed this way?
This is because parsing starts in concise mode by default (see https://markojs.com/docs/concise/#root-level-text). I believe you can pass { concise: false } as a parse option to opt out of this.
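A minimal sketch of passing that option follows. It assumes, without having verified it, that htmljs-parser's createParser accepts an options object as its second argument and that concise: false makes parsing start in HTML mode instead of concise mode. collectOpenTagNames is a hypothetical helper, and createParser is passed in as a parameter only so the helper can be exercised without the package installed.

```javascript
// Hypothetical helper: parse `text` and return the tag names of every
// open-tag event. ASSUMPTION: createParser(listeners, options) honours
// { concise: false } (per the suggestion in this thread; not verified).
function collectOpenTagNames(createParser, text) {
  const tagNames = [];
  const parser = createParser(
    {
      // Called once for each opening tag the parser recognises.
      onOpenTag(event) {
        tagNames.push(event.tagName);
      },
    },
    // Assumed option: start in HTML mode so root-level text stays text.
    { concise: false }
  );
  parser.parse(text);
  return tagNames;
}

// Usage with the real package (requires `npm install htmljs-parser`):
// const { createParser } = require('htmljs-parser');
// console.log(collectOpenTagNames(createParser, 'This is a test'));
```

Injecting createParser is only a convenience for testing; in application code you would call require('htmljs-parser').createParser(listeners, { concise: false }) directly and check whether plain words such as "This" still arrive as open tags.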
I have the following string in English:
Before performing this bulk merge operation, you must have a recipient group and template in place.<BR>All bulk email communication sent through {1:product name} must meet the requirements defined in the Mass Email Messaging <a href='xxxxx' target='_blank'>Terms of Service</a>.<BR>
and its corresponding Thai translation:
ก่อนที่จะดำเนินการในการส่งจดหมายถึงผู้รับหลายคนนี้ คุณต้องมีกลุ่มผู้รับและแม่แบบอยู่แล้ว<BR>การรับส่งอีเมลทั้งหมดในปริมาณมากซึ่งส่งผ่าน {1:product name} ต้องเป็นไปตามข้อกำหนดที่กำหนดไว้ใน<a href='xxxxx' target='_blank'>เงื่อนไขการให้บริการ</a>ด้านการส่งข้อความอีเมลจำนวนมาก<BR>
The English string parses correctly, but the parser incorrectly identifies parts of the Thai text as tags. For example:
Is there a known configuration for Thai or other Unicode text, or a workaround I could use to eliminate these false positives?
Thanks