I tried to parse the same URL, a UTF-8 encoded page, in three ways:
1. Only the URL is provided (the response comes back entity-encoded).
2. URL and contentType=text (works well).
3. [Major] URL and valid prefetched HTML provided (invalid response).
As a result, I received three completely different response formats.
How can I fix the encoding and the other strange symbols so that I receive a valid response, as in the 2nd case?
Browser Version (if a browser bug): Headless Chrome 75
Expected Behavior
The same output in all three cases described below: with only the URL, with prefetched HTML, and with contentType defined.
Current Behavior
Exactly the same URL produces output in three different formats.
Steps to Reproduce
Run Mercury.parse('https://www.unian.ua/politics/10634892-zelenskiy-pidpisav-klyuchoviy-dlya-borotbi-iz-korupciyeyu-ukaz-shchodo-elektronnih-poslug.html', {contentType: 'text'});
Receive valid text. Russian/Ukrainian symbols are displayed correctly.
First problem:
Run with valid prefetched HTML: Mercury.parse('https://www.unian.ua/politics/10634892-zelenskiy-pidpisav-klyuchoviy-dlya-borotbi-iz-korupciyeyu-ukaz-shchodo-elektronnih-poslug.html', {html: prefetchedHtml, contentType: 'text'});
Receive an invalid (encoded?) response. Please see the sample below.
Second problem:
Run Mercury.parse('https://www.unian.ua/politics/10634892-zelenskiy-pidpisav-klyuchoviy-dlya-borotbi-iz-korupciyeyu-ukaz-shchodo-elektronnih-poslug.html');
Receive invalid text. Russian/Ukrainian characters come back as numeric character references instead of readable text.
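The three calls above can be compared side by side with a small harness. This is only a minimal sketch: compareOutputs is a hypothetical helper, and it assumes the parser is passed in as a function that takes (url, options) like Mercury.parse and resolves to an object with a content field.

```javascript
// Hypothetical helper: runs the three variants from the steps above and
// collects the `content` field of each result so they can be compared.
// `parse` is assumed to behave like Mercury.parse(url, options).
async function compareOutputs(parse, url, prefetchedHtml) {
  const variants = {
    urlOnly: undefined, // no options at all
    textOnly: { contentType: 'text' },
    prefetched: { html: prefetchedHtml, contentType: 'text' },
  };
  const outputs = {};
  for (const [name, options] of Object.entries(variants)) {
    // Call without a second argument when no options are given.
    const result = options ? await parse(url, options) : await parse(url);
    outputs[name] = result.content;
  }
  return outputs;
}
```

With the real library this could be called as compareOutputs((u, o) => Mercury.parse(u, o), url, prefetchedHtml), where prefetchedHtml is the HTML string fetched beforehand.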
Detailed Description
Received response (contentType is not defined). I am copying only a few characters, because the text is automatically decoded and displayed correctly when the issue is published.
& #x414;& #x43E;& #x43A;& #x443;& #x43C;& #x435;& #x43D;& #x442;
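To show what the sample above stands for, here is a minimal sketch that decodes numeric character references back to text. decodeNumericEntities is a hypothetical helper, not part of Mercury, and it assumes the only encoding problem is numeric references of the form &#x...; or &#...;.

```javascript
// Hypothetical helper: converts numeric character references
// (hexadecimal like &#x414; or decimal like &#1044;) back to characters.
// Named entities such as &amp; are intentionally left untouched.
function decodeNumericEntities(text) {
  return text.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match, ref) => {
    const isHex = ref[0] === 'x' || ref[0] === 'X';
    const codePoint = parseInt(isHex ? ref.slice(1) : ref, isHex ? 16 : 10);
    // Leave out-of-range references as-is instead of throwing.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

// The sample above, with the spaces after "&" removed:
const sample = '&#x414;&#x43E;&#x43A;&#x443;&#x43C;&#x435;&#x43D;&#x442;';
console.log(decodeNumericEntities(sample)); // prints "Документ"
```

This only works around the symptom on the caller's side; it does not explain why the three calls return different formats.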
Received response (prefetched HTML):