mathiasbynens / utf8.js

A robust JavaScript implementation of a UTF-8 encoder/decoder, as defined by the Encoding Standard.
https://git.io/utf8js
MIT License
556 stars 115 forks source link

Invalid continuation byte #30

Open romafederico opened 7 years ago

romafederico commented 7 years ago

macOS, Webstorm 2017.1, Reactjs

I'm receiving UTF-8 encoded JSON, converting it to a string, and then calling utf8.decode(str).

At some point I'm getting the error Invalid continuation byte. Is there a way in which I can find the byte that is causing this error? This error appears with some of the users of my DB, not all, and I need to compare them.

Thanks
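
One way to locate the offending position is to scan the string yourself before calling utf8.decode. The helper below is a minimal sketch, not part of utf8.js: `findFirstInvalidUtf8` is a hypothetical name, and it assumes the input is meant to be a byte string (one character per byte). It only checks the lead/continuation byte structure, not overlong forms or surrogates.

```javascript
// Hypothetical helper (not part of utf8.js): returns the position of the first
// character that breaks UTF-8 structure, or null if none is found.
// Assumes `byteString` is meant to hold one byte (0x00-0xFF) per character.
function findFirstInvalidUtf8(byteString) {
  let i = 0;
  while (i < byteString.length) {
    const lead = byteString.charCodeAt(i);
    let continuationCount;
    if (lead > 0xFF) {
      return { index: i, code: lead, reason: 'not a byte (above 0xFF)' };
    } else if (lead < 0x80) {
      continuationCount = 0; // ASCII
    } else if (lead >= 0xC2 && lead <= 0xDF) {
      continuationCount = 1;
    } else if (lead >= 0xE0 && lead <= 0xEF) {
      continuationCount = 2;
    } else if (lead >= 0xF0 && lead <= 0xF4) {
      continuationCount = 3;
    } else {
      return { index: i, code: lead, reason: 'invalid start byte' };
    }
    for (let k = 1; k <= continuationCount; k++) {
      if (i + k >= byteString.length) {
        return { index: i + k, code: null, reason: 'unexpected end of input' };
      }
      const next = byteString.charCodeAt(i + k);
      if (next < 0x80 || next > 0xBF) {
        return { index: i + k, code: next, reason: 'invalid continuation byte' };
      }
    }
    i += continuationCount + 1;
  }
  return null;
}

// U+00C3 announces a two-byte sequence, but U+0153 is not a continuation byte.
const problem = findFirstInvalidUtf8('\u00C3\u0153');
console.log(problem);
```

Running this over the records that fail and the ones that succeed should show which character differs between them.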

PitPanda1 commented 7 years ago

I am having the same issue. I'm trying to convert strings like these:

let test1 = 'Östliche'; // must be "Östliche"
let test2 = 'Neuwiesenstraße'; // must be "Neuwiesenstraße"

console.log(utf8.decode(test1));
console.log(utf8.decode(test2));
Error: Invalid continuation byte
    at Error (native)
    at readContinuationByte (I:\dev\importer\node_modules\utf8\utf8.js:131:9)
    at decodeSymbol (I:\dev\importer\node_modules\utf8\utf8.js:160:12)
    at Object.utf8decode [as decode] (I:\dev\importer\node_modules\utf8\utf8.js:206:33)
    at Object.<anonymous> (I:\dev\importer\import.js:18:18)
    at Module._compile (module.js:556:32)
    at Object.Module._extensions..js (module.js:565:10)
    at Module.load (module.js:473:32)
    at tryModuleLoad (module.js:432:12)
    at Function.Module._load (module.js:424:3)
    at Module.runMain (module.js:590:10)
    at run (bootstrap_node.js:394:7)
    at startup (bootstrap_node.js:149:9)
    at bootstrap_node.js:509:3
// german special characters
let test1 = "Ä"; // Ä fails
let test2 = "ä"; // ä passes
let test3 = "Ãœ"; // Ü fails
let test4 = "ü"; // ü passes
let test5 = "Ö"; // Ö fails
let test6 = "ö"; // ö passes
let test7 = "ß"; // ß fails

// other special characters
let test8 = "Á"; // Á passes
let test9 = "á"; // á passes

All the lowercase letters pass the test and all the uppercase ones fail; the exception is "ß", which fails even though it has no upper/lowercase distinction in German. I tested some other special characters, but they passed the test.
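
A possible explanation for this pattern, assuming the test strings were produced by reading UTF-8 bytes as Windows-1252 (the thread doesn't confirm this): Windows-1252 turns most bytes in the range 0x80-0x9F into characters above U+00FF, which can no longer be treated as bytes. The sketch below checks the second UTF-8 byte of each letter; `secondByteIsRemapped` is a hypothetical name.

```javascript
// Bytes in 0x80-0x9F that Windows-1252 leaves undefined; they stay below U+0100.
const CP1252_UNDEFINED = [0x81, 0x8D, 0x8F, 0x90, 0x9D];

// Hypothetical helper: true if Windows-1252 would turn the second UTF-8 byte
// of `letter` into a character above U+00FF.
function secondByteIsRemapped(letter) {
  const second = new TextEncoder().encode(letter)[1];
  return second >= 0x80 && second <= 0x9F && !CP1252_UNDEFINED.includes(second);
}

for (const letter of ['Ä', 'ä', 'Ü', 'ü', 'Ö', 'ö', 'ß', 'Á', 'á']) {
  console.log(letter, secondByteIsRemapped(letter) ? 'fails' : 'passes');
}
```

Under these assumptions the result matches the list above: Ä, Ü, Ö, and ß fail, while ä, ü, ö, Á, and á pass. Á passes because its second byte, 0x81, is one of the five bytes Windows-1252 leaves undefined.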

davide-scalzo commented 6 years ago

Similar issue with emojis. Does anybody have an idea how to fix it (other than a try/catch cop-out)?

wouterdialogic commented 6 years ago

Similar issue; I'm working around it with a try/catch block.

error:


Error: Invalid continuation byte
    at readContinuationByte (C:\Ampps\www\b5_revisited\node_modules\utf8\utf8.js:115:9)
    at decodeSymbol (C:\Ampps\www\b5_revisited\node_modules\utf8\utf8.js:156:12)
    at Object.utf8decode [as decode] (C:\Ampps\www\b5_revisited\node_modules\utf8\utf8.js:190:17)
    at try_to_utf8_decode (C:\Ampps\www\b5_revisited\b5_file_parser.js:104:16)
    at process_file (C:\Ampps\www\b5_revisited\b5_file_parser.js:146:13)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)

This is an example of the input:

Wij de werkgroep “KREKEROCK “ organiseren al een paar jaar tijdens de kerstperiode, omdat deze periode zich ui tstekend leent om eens stil te staan bij al het leed in de wereld, het muziekfestival KREKEROCK.
De opbrengst is steeds integraal voor CADAATAN KORTEMARK.
CADAATAN KORTEMARK houdt zich vooral bezig met het verbeteren van de omstandigheden waarin kinderen in bepaalde schooltjes op de Filip ijnen de lessen volgen. De vereniging is vooral actief in het noorden van het eiland CEBU, meer bepaald in enkele barangay’s van SAN REMIGIO.

AlejaRo commented 4 years ago

Similar issue trying to convert the word "Información". Has anyone fixed this issue? I've been all day trying to solve this but I haven't found the solution :(

mboughaba commented 4 years ago

@AlejaRo

console.log(utf8.encode('Información')); // => InformaciÃ³n
console.log(utf8.decode(utf8.encode('Información'))); // => Información

Please show us a snippet. I've surrounded it with a try/catch and it seems to work so far.

According to the tests, this error is thrown when an invalid sequence is encountered:

https://github.com/mathiasbynens/utf8.js/blob/2ce09544b62f2a274dbcd249473c0986e3660849/tests/tests.js#L245

balwinder4264 commented 4 years ago

This code is throwing the same error:

utf8.decode( 'Simplified Chinese: æˆ‘ä»¬ä¸ºæˆ‘ä»¬åˆ›é€ çš„æ¯æ°ä½œçš„å¥‰çŒ®ç²¾ç¥žå’Œå†³å¿ƒåŠ å‰§æ¯ä¸ªGWT代表的激情。但更比任何其他特质,在我们的机会心脏的决定性特征是GWTç»é”€å•†è¡¥å¿è®¡åˆ’ã€‚æˆ‘ä»¬åˆ›å»ºäº†ä¸€ä¸ªæ¶ˆé™¤äº†ä»»ä½•é™åˆ¶ï¼Œé€Ÿåº¦é¢ ç°¸çš„æˆå‘˜è®¿é—®ä»–ä»¬èµšå–ä½£é‡‘å’Œå¥–é‡‘ä¸–ç•Œä¸Šç¬¬ä¸€ä¸ªè‡ªç”±æµåŠ¨çš„å¯å˜è–ªé…¬è®¡åˆ’ã€‚æˆ‘ä»¬æ¸…æ¥šçš„ç»é”€å•†å‹å¥½çš„è–ªé…¬è®¡åˆ’ï¼Œä½¿GWTä¸šåŠ¡çš„äººæ¥è¯´ï¼Œé‚£é‡Œçš„å¹³å‡å…¼èŒåˆ›ä¸šè€…çœŸæ­£æ‹¥æœ‰ä¸ºè‡ªå·±åˆ›é€ è´¢å¯Œï¼Œå¹¶ä¸Žä»–äººåˆ†äº«çš„æœºä¼šçš„æœºä¼šã€‚æˆ‘ä»¬æ„Ÿåˆ°è‡ªè±ªçš„æ˜¯æˆ‘ä»¬çš„é©å‘½è‡ªç”±æµåŠ¨çš„è–ªé…¬è®¡åˆ’æ¶ˆé™¤äº†ç›´é”€å…¶ä¸­åªæœ‰é¡¶çº§ç»é”€å•†çš„ç²¾è‹±èƒ½å¤Ÿå®žçŽ°è´¢åŠ¡ä¼Ÿå¤§çš„çŽ°çŠ¶ã€‚å…¬å¹³å’Œæ„å›¾æ˜¯æˆ‘ä»¬åšç”Ÿæ„çš„æ–¹å¼èƒŒåŽçš„é©±åŠ¨åŠ›å’ŒåŒºåˆ«ä½¿å¾—GWT公司之间在历史上最好的家庭为基础的和基于互联网的机会。', ),

MattChilders92 commented 4 years ago

+1 Having this issue with the letter "ß" in the string

sebastianDejoy commented 4 years ago

Same here: utf8.decode('账单信息') throws Error: Invalid continuation byte. It should decode to 账单信息. Is the library having issues with code points represented in 3 bytes or more (like Chinese and Korean)?
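
For what it's worth, three-byte code points are not the problem in themselves. utf8.decode expects a byte string, and '账单信息' contains characters above U+00FF, so it cannot be one. Here is a minimal sketch using the standard TextEncoder and TextDecoder instead of utf8.js; `isByteString`, `toByteString`, and `fromByteString` are hypothetical helper names.

```javascript
// Hypothetical helpers built on the standard TextEncoder/TextDecoder APIs.

// A byte string holds one byte (0x00-0xFF) per character.
function isByteString(str) {
  for (let i = 0; i < str.length; i++) {
    if (str.charCodeAt(i) > 0xFF) return false;
  }
  return true;
}

// Text -> UTF-8 byte string.
function toByteString(text) {
  let out = '';
  for (const byte of new TextEncoder().encode(text)) {
    out += String.fromCharCode(byte);
  }
  return out;
}

// UTF-8 byte string -> text; throws a TypeError on malformed input.
function fromByteString(byteString) {
  const bytes = Uint8Array.from(byteString, (ch) => ch.charCodeAt(0));
  return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
}

const text = '账单信息';
const byteString = toByteString(text);
console.log(isByteString(text));                  // false: already decoded text
console.log(byteString.length);                   // 12: four characters, three bytes each
console.log(fromByteString(byteString) === text); // true
```

A check like `isByteString` before decoding tells you whether the input is already text, in which case no decoding step is needed.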

paolobertani commented 2 years ago

@romafederico

I'm receiving UTF-8 encoded JSON, converting it to a string, and then calling utf8.decode(str).

When you receive the UTF-8 encoded JSON and store it in a string, the string is re-encoded as UCS-2.

Decoding it as if it were still UTF-8 then raises an error, as expected.
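
Put differently, the bytes should be decoded exactly once, at the point where they become a string. Here is a minimal sketch, assuming the payload arrives as raw bytes (simulated here with TextEncoder); `parseUtf8Json` is a hypothetical name.

```javascript
// Hypothetical helper: decode the raw bytes once, then parse.
function parseUtf8Json(bytes) {
  const text = new TextDecoder('utf-8').decode(bytes);
  return JSON.parse(text);
}

// Simulated network payload: the UTF-8 bytes of a JSON document.
const payload = new TextEncoder().encode(JSON.stringify({ street: 'Neuwiesenstraße' }));

const data = parseUtf8Json(payload);
console.log(data.street); // already a normal string; no further utf8.decode call is needed
```

Most HTTP clients already do this decoding step for you when they hand back a string, which is why a second utf8.decode on that string fails.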

paolobertani commented 2 years ago

@PitPanda1

I am having the same issue. I'm trying to convert strings like these:


let test1 = 'Östliche'; // must be "Östliche"
let test2 = 'Neuwiesenstraße'; // must be "Neuwiesenstraße"

'Östliche' is not UTF-8
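
If the strings in that report were originally UTF-8 text that had been misread as Windows-1252 (an assumption; the thread doesn't say how they were produced), they can sometimes be recovered by mapping each character back to its Windows-1252 byte and decoding the result as UTF-8. Below is a rough sketch with a hypothetical `repairMojibake` helper; it returns null when the input doesn't fit that assumption.

```javascript
// Windows-1252 characters for bytes 0x80-0x9F. Empty strings mark the five
// bytes that Windows-1252 leaves undefined.
const CP1252_HIGH = [
  '\u20AC', '', '\u201A', '\u0192', '\u201E', '\u2026', '\u2020', '\u2021',
  '\u02C6', '\u2030', '\u0160', '\u2039', '\u0152', '', '\u017D', '',
  '', '\u2018', '\u2019', '\u201C', '\u201D', '\u2022', '\u2013', '\u2014',
  '\u02DC', '\u2122', '\u0161', '\u203A', '\u0153', '', '\u017E', '\u0178',
];

// Hypothetical helper: undo a "UTF-8 read as Windows-1252" mix-up.
function repairMojibake(text) {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    const high = CP1252_HIGH.indexOf(text[i]);
    const code = text.charCodeAt(i);
    if (high !== -1) {
      bytes[i] = 0x80 + high;  // character came from a remapped byte
    } else if (code <= 0xFF) {
      bytes[i] = code;         // character is its own byte value
    } else {
      return null;             // cannot have come from Windows-1252
    }
  }
  try {
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch (err) {
    return null;               // the bytes are not valid UTF-8
  }
}

console.log(repairMojibake('Ãœ'));       // 'Ü'
console.log(repairMojibake('Östliche')); // null: already correct text, nothing to repair
```

This only covers the Windows-1252 case; other mix-ups would need a different table.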