randy3k / radian

A 21 century R console
MIT License

Error when pasting long code containing multibyte characters into the radian console and executing it #377

Closed eitsupi closed 2 years ago

eitsupi commented 2 years ago

When I paste long code containing multibyte characters into the radian console and try to run it, I get the following errors and the code does not run. This does not occur with the default R terminal.

Error: EOF whilst reading MBCS char at line 5
Error: invalid multibyte character in parser at line 1

The following code reproduces this in my environment. (The following strings are randomly generated.)

a <- c("おゎがごなめかぅさんんぴしもあみぺえぅぞかおもづどいたぼぺぁりうもぞさせめにげさほぅぅもあぶまゆぜゎまやつりゅびけぁむれぴりゃぎぴせぐゑきぢぷぅるおみらへつにすぅょよへょこぺぁひおべいけとみぺととびばぶゐぺぴのるぃゅぽぐやじぶみあいさざゔぅまだおゎがごなめかぅさんんぴしもあみぺえぅぞかおもづどいたぼぺぁりうもぞさせめにげさほぅぅもあぶまゆぜゎまやつりゅびけぁむれぴりゃぎぴせぐゑきぢぷぅるおみらへつにすぅょよへょこぺぁひおべいけとゎべぢでみぷゃぜてこごねぼぶづなぺれぶゑげらぷぼだぱんかこべとこるぐゔなへさばろすてへわたなさゐぜぺてぉでてずけゆぼれきゅはむぇだぴづにをぉわえじごふちりっぅぷあぜぷやべろうまぞてすやなげぅずでょぱけんぬくやわもぱぼゑどぱげをらひゕぜこ
おゎがごなめかぅさんんぴしもあみぺえぅぞかおもづどいたぼぺぁりうもぞさせめにげさほぅぅもあぶまゆぜゎまやつりゅびけぁむれぴりゃぎぴせぐゑきぢぷぅるおみらへつにすぅょよへょこぺぁひおべいけとみぺととびばぶゐぺぴのるぃゅぽぐやじぶみあいさざゔぅまだおゎがごなめかぅさんんぴしもあみぺえぅぞかおもづどいたぼぺぁりうもぞさせめにげさほぅぅもあぶまゆぜゎまやつりゅびけぁむれぴりゃぎぴせぐゑきぢぷぅるおみらへつにすぅょよへょこぺぁひおべいけとゎべぢでみぷゃぜてこごねぼぶづなぺれぶゑげらぷぼだぱんかこべとこるぐゔなへさばろすてへわたなさゐぜぺてぉでてずけゆぼれきゅはむぇだぴづにをぉわえじごふちりっぅぷあぜぷやべろうまぞてすやなげぅずでょぱけんぬくやわもぱぼゑどぱげをらひゕぜこ
おゎがごなめかぅさんんぴしもあみぺえぅぞかおもづどいたぼぺぁりうもぞさせめにげさほぅぅもあぶまゆぜゎまやつりゅびけぁむれぴりゃぎぴせぐゑきぢぷぅるおみらへつにすぅょよへょこぺぁひおべいけとみぺととびばぶゐぺぴのるぃゅぽぐやじぶみあいさざゔぅまだおゎがごなめかぅさんんぴしもあみぺえぅぞかおもづどいたぼぺぁりうもぞさせめにげさほぅぅもあぶまゆぜゎまやつりゅびけぁむれぴりゃぎぴせぐゑきぢぷぅるおみらへつにすぅょよへょこぺぁひおべいけとゎべぢでみぷゃぜてこごねぼぶづなぺれぶゑげらぷぼだぱんかこべとこるぐゔなへさばろすてへわたなさゐぜぺてぉでてずけゆぼれきゅはむぇだぴづにをぉわえじごふちりっぅぷあぜぷやべろうまぞてすやなげぅずでょぱけんぬくやわもぱぼゑどぱげをらひゕぜこ
おゎがごなめかぅさんんぴしもあみぺえぅぞかおもづどいたぼぺぁりうもぞさせめにげさほぅぅもあぶまゆぜゎまやつりゅびけぁむれぴりゃぎぴせぐゑきぢぷぅるおみらへつにすぅょよへょこぺぁひおべいけとみぺととびばぶゐぺぴのるぃゅぽぐやじぶみあいさざゔぅまだおゎがごなめかぅさんんぴしもあみぺえぅぞかおもづどいたぼぺぁりうもぞさせめにげさほぅぅもあぶまゆぜゎまやつりゅびけぁむれぴりゃぎぴせぐゑきぢぷぅるおみらへつにすぅょよへょこぺぁひおべいけとゎべぢでみぷゃぜてこごねぼぶづなぺれぶゑげらぷぼだぱんかこべとこるぐゔなへさばろすてへわたなさゐぜぺてぉでてずけゆぼれきゅはむぇだぴづにをぉわえじごふちりっぅぷあぜぷやべろうまぞてすやなげぅずでょぱけんぬくやわもぱぼゑどぱげをらひゕぜこ
おゎがごなめかぅさんんぴしもあみぺえぅぞかおもづどいたぼぺぁりうもぞさせめにげさほぅぅもあぶまゆぜゎまやつりゅびけぁむれぴりゃぎぴせぐゑきぢぷぅるおみらへつにすぅょよへょこぺぁひおべいけとみぺととびばぶゐぺぴのるぃゅぽぐやじぶみあいさざゔぅまだおゎがごなめかぅさんんぴしもあみぺえぅぞかおもづどいたぼぺぁりうもぞさせめにげさほぅぅもあぶまゆぜゎまやつりゅびけぁむれぴりゃぎぴせぐゑきぢぷぅるおみらへつにすぅょよへょこぺぁひおべいけとゎべぢでみぷゃぜてこごねぼぶづなぺれぶゑげらぷぼだぱんかこべとこるぐゔなへさばろすてへわたなさゐぜぺてぉでてずけゆぼれきゅはむぇだぴづにをぉわえじごふちりっぅぷあぜぷやべろうまぞてすやなげぅずでょぱけんぬくやわもぱぼゑどぱげをらひゕぜこ")
randy3k commented 2 years ago

I was able to reproduce the issue.

In fact, base R has a similar bug when the string contains no newline characters, for example:

a <- c("おゎがごなめかぅさんんぴしもあみぺえぅぞかおもづどいたぼぺぁりうもぞさせめにげさほぅぅもあぶまゆぜゎまやつりゅびけぁむれぴりゃぎぴせぐゑきぢぷぅるおみらへつにすぅょよへょこぺぁひおべいけとみぺととびばぶゐぺぴのるぃゅぽぐやじぶみあいさざゔぅまだおゎがごなめかぅさんんぴしもあみぺえぅぞかおもづどいたぼぺぁりうもぞさせめにげさほぅぅもあぶまゆぜゎまやつりゅびけぁむれぴりゃぎぴせぐゑきぢぷぅるおみらへつにすぅょよへょこぺぁひおべいけとゎべぢでみぷゃぜてこごねぼぶづなぺれぶゑげらぷぼだぱんかこべとこるぐゔなへさばろすてへわたなさゐぜぺてぉでてずけゆぼれきゅはむぇだぴづにをぉわえじごふちりっぅぷあぜぷやべろうまぞてすやなげぅずでょぱけんぬくやわもぱぼゑどぱげをらひゕぜこおゎがごなめかぅさんんぴしもあみぺえぅぞかおもづどいたぼぺぁりうもぞさせめにげさほぅぅもあぶまゆぜゎまやつりゅびけぁむれぴりゃぎぴせぐゑきぢぷぅるおみらへつにすぅょよへょこぺぁひおべいけとみぺととびばぶゐぺぴのるぃゅぽぐやじぶみあいさざゔぅまだおゎがごなめかぅさんんぴしもあみぺえぅぞかおもづどいたぼぺぁりうもぞさせめにげさほぅぅもあぶまゆぜゎまやつりゅびけぁむれぴりゃぎぴせぐゑきぢぷぅるおみらへつにすぅょよへょこぺぁひおべいけとゎべぢでみぷゃぜてこごねぼぶづなぺれぶゑげらぷぼだぱんかこべとこるぐゔなへさばろすてへわたなさゐぜぺてぉでてずけゆぼれきゅはむぇだぴづにをぉわえじごふちりっぅぷあぜぷやべろうまぞてすやなげぅずでょぱけんぬくやわもぱぼゑどぱげをらひゕぜこおゎがごなめかぅさんんぴしもあみぺえぅぞかおもづどいたぼぺぁりうもぞさせめにげさほぅぅもあぶまゆぜゎまやつりゅびけぁむれぴりゃぎぴせぐゑきぢぷぅるおみらへつにすぅょよへょこぺぁひおべいけとみぺととびばぶゐぺぴのるぃゅぽぐやじぶみあいさざゔぅまだおゎがごなめかぅさんんぴしもあみぺえぅぞかおもづどいたぼぺぁりうもぞさせめにげさほぅぅもあぶまゆぜゎまやつりゅびけぁむれぴりゃぎぴせぐゑきぢぷぅるおみらへつにすぅょよへょこぺぁひおべいけとゎべぢでみぷゃぜてこごねぼぶづなぺれぶゑげらぷぼだぱんかこべとこるぐゔなへさばろすてへわたなさゐぜぺてぉでてずけゆぼれきゅはむぇだぴづにをぉわえじごふちりっぅぷあぜぷやべろうまぞてすやなげぅずでょぱけんぬくやわもぱぼゑどぱげをらひゕぜこおゎがごなめかぅさんんぴしもあみぺえぅぞかおもづどいたぼぺぁりうもぞさせめにげさほぅぅもあぶまゆぜゎまやつりゅびけぁむれぴりゃぎぴせぐゑきぢぷぅるおみらへつにすぅょよへょこぺぁひおべいけとみぺととびばぶゐぺぴのるぃゅぽぐやじぶみあいさざゔぅまだおゎがごなめかぅさんんぴしもあみぺえぅぞかおもづどいたぼぺぁりうもぞさせめにげさほぅぅもあぶまゆぜゎまやつりゅびけぁむれぴりゃぎぴせぐゑきぢぷぅるおみらへつにすぅょよへょこぺぁひおべいけとゎべぢでみぷゃぜてこごねぼぶづなぺれぶゑげらぷぼだぱんかこべとこるぐゔなへさばろすてへわたなさゐぜぺてぉでてずけゆぼれきゅはむぇだぴづにをぉわえじごふちりっぅぷあぜぷやべろうまぞてすやなげぅずでょぱけんぬくやわもぱぼゑどぱげをらひゕぜこおゎがごなめかぅさんんぴしもあみぺえぅぞかおもづどいたぼぺぁりうもぞさせめにげさほぅぅもあぶまゆぜゎまやつりゅびけぁむれぴりゃぎぴせぐゑきぢぷぅるおみらへつにすぅょよへょこぺぁひおべいけとみぺととびばぶゐぺぴのるぃゅぽぐやじぶみあいさざゔぅまだおゎがごなめかぅさんんぴしもあみぺえぅぞかおもづどいたぼぺぁりうもぞさせめにげさほぅぅもあぶまゆぜゎまやつりゅびけぁむれぴりゃぎぴせぐゑきぢぷぅるおみらへつにすぅょよへょこぺぁひおべいけとゎべぢでみぷゃぜてこごねぼぶづなぺれぶゑげらぷぼだぱんかこべとこるぐゔなへさばろすてへわたなさゐぜぺてぉでてずけゆぼれきゅはむぇだぴづにをぉわえじごふちりっぅぷあぜぷやべろうまぞてすやなげぅずでょぱけんぬくやわもぱぼゑどぱげをらひゕぜこ")

radian, however, handles this case correctly. That said, radian also fails if we change the variable name from a to ab; whether it fails depends on where a character gets cut.

I will need to investigate how the newlines break radian.

randy3k commented 2 years ago

I think I understand why it fails. Essentially, a long text has to be broken into chunks of exactly 4094 bytes. With multibyte characters this is a problem, because a chunk boundary can fall in the middle of a character and cut it in half. Base R works around this by breaking the text at line breaks.
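To illustrate the cutting problem described above, here is a minimal Python sketch. It is not radian's or rchitect's actual code: the helper names `naive_chunks` and `is_valid_utf8` are hypothetical, and a run of the single character "あ" (3 bytes in UTF-8) stands in for the random strings. It only shows that a fixed 4094-byte cut can land inside a character, and that a one-byte shift in the prefix (a versus ab) is enough to change whether it does.

```python
CHUNK_SIZE = 4094  # chunk size in bytes, taken from the comment above


def naive_chunks(text: str, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the UTF-8 encoding of `text` into fixed-size byte chunks,
    ignoring character boundaries (hypothetical helper for illustration)."""
    data = text.encode("utf-8")
    return [data[i:i + size] for i in range(0, len(data), size)]


def is_valid_utf8(chunk: bytes) -> bool:
    """Return True if `chunk` decodes as UTF-8 on its own."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Every hiragana character takes 3 bytes in UTF-8.
payload = "あ" * 2000
code_a = 'a <- c("' + payload + '")'    # 8-byte ASCII prefix
code_ab = 'ab <- c("' + payload + '")'  # 9-byte ASCII prefix

# With the 8-byte prefix, 4094 - 8 = 4086 is a multiple of 3, so the cut
# happens to fall between two characters.
print([is_valid_utf8(c) for c in naive_chunks(code_a)])

# With the 9-byte prefix, 4094 - 9 = 4085 is not a multiple of 3, so the
# cut falls inside a character and neither chunk decodes on its own.
print([is_valid_utf8(c) for c in naive_chunks(code_ab)])
```

Whether a given input fails therefore depends on the byte offsets of its characters relative to the chunk boundaries, which is consistent with the a versus ab observation earlier in the thread.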

However, radian cannot simply break the long text at newlines, because it parses the whole text at once. I will need to rework the parsing logic.
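For illustration only, one generic way to avoid cutting a character is to move each cut back to the nearest UTF-8 character boundary. The sketch below is an assumption about how such a splitter could look; it is not taken from radian or rchitect and is not necessarily what the eventual fix does. The function name `utf8_safe_chunks` is hypothetical.

```python
def utf8_safe_chunks(text: str, size: int = 4094) -> list[bytes]:
    """Split the UTF-8 encoding of `text` into chunks of at most `size`
    bytes without cutting a character (hypothetical helper)."""
    data = text.encode("utf-8")
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + size, len(data))
        # UTF-8 continuation bytes have the bit pattern 10xxxxxx. If the
        # byte at `end` is one, the cut would split a character, so move
        # the cut back until it sits on the first byte of a character.
        while start < end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        if end == start:
            raise ValueError("size is too small to hold a single character")
        chunks.append(data[start:end])
        start = end
    return chunks


code = 'ab <- c("' + "あ" * 2000 + '")'
chunks = utf8_safe_chunks(code)

# Each chunk decodes on its own, and joining them restores the input.
decoded = [c.decode("utf-8") for c in chunks]
print([len(c) for c in chunks], "".join(decoded) == code)
```

The chunks are no longer all exactly 4094 bytes long; each is shortened by up to 3 bytes so that it ends on a character boundary.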

Here is a workaround that works in both base R and radian: save the string in a .R file and source that file.

eitsupi commented 2 years ago

Thank you for digging into this! At the moment I get around this by using the default R console for projects that need to deal with long strings, but not being able to use radian is stressful.

randy3k commented 2 years ago

This is partly due to a bug in rchitect. More work is still needed to make it work in all cases.

randy3k commented 2 years ago

I have created a PR that tries to fix this: #379

It would be great if you could help test it.

eitsupi commented 2 years ago

Thank you for the quick fix. I tried the latest version (python3 -m pip install git+https://github.com/randy3k/radian.git@refs/pull/379/head) and it seems to be working fine for me! (on Linux)