roc-lang / roc

A fast, friendly, functional language.
https://roc-lang.org
Universal Permissive License v1.0

segmentation fault when decoding large Json strings #6874

Open adomurad opened 1 week ago

adomurad commented 1 week ago

When decoding a large Str from JSON, roc ./main.roc silently dies, and roc build ./main.roc && ./main outputs:

[1]    565240 segmentation fault  ./main

Here is a minimal repro (the string was replaced):

app [main] {
    pf: platform "https://github.com/roc-lang/basic-cli/releases/download/0.11.0/SY4WWMhWQ9NvQgvIthcv15AUeA7rAIJHAHgiaSHGhdY.tar.br",
    json: "https://github.com/lukewilliamboswell/roc-json/releases/download/0.10.0/KbIfTNbxShRX1A1FgXei1SpO5Jn8sgP6HP6PXbi-xyA.tar.br",
}

import pf.Task
import pf.Stdout
import json.Json

main =
    Stdout.line! "start"
    bytes = str |> Str.toUtf8
    Stdout.line! "list"
    { result } = Decode.fromBytesPartial bytes Json.utf8
    base64Str : Result Str _
    base64Str = result
    Stdout.line "end"

str = "\"a really long string - around 175000 characters\""

This example works with ~170 000 characters, but not with 175 000 or more.

This is a real-world use case: WebDriver returns PNG screenshots as base64-encoded strings of around 260 000 characters for a medium-sized browser window.

I'm attaching a ready to run example: main.txt
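
To vary the payload size without hand-editing the file, a small Python sketch like the one below can generate a JSON string of a chosen length. The helper name make_json_string and the filler character are assumptions for illustration and are not part of the attached main.txt. The sizes in the loop are the ones mentioned above.

```python
import json


def make_json_string(n_chars: int, filler: str = "a") -> str:
    """Return a JSON document consisting of one string of n_chars characters."""
    if len(filler) != 1:
        raise ValueError("filler must be a single character")
    # json.dumps adds the surrounding double quotes and escapes if needed.
    return json.dumps(filler * n_chars)


if __name__ == "__main__":
    for n in (170_000, 175_000, 260_000):
        doc = make_json_string(n)
        # The document is the payload plus two quote characters.
        assert len(doc) == n + 2
        # A reference decoder round-trips the payload unchanged.
        assert len(json.loads(doc)) == n
        print(n, len(doc.encode("utf-8")))
```

The printed document could then be pasted into str in the repro, with the inner quotes escaped as in the original.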

Anton-4 commented 1 week ago

Thanks for the minimal reproduction @adomurad :heart:

I was able to reproduce the issue and have some useful valgrind info for whoever has time to investigate this:

❯ valgrind ./examples/m175 
==37759== Memcheck, a memory error detector
==37759== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==37759== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==37759== Command: ./examples/m175
==37759== 
start
list
==37759== Stack overflow in thread #1: can't grow stack to 0x1ffe801000
==37759== 
==37759== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==37759==  Access not within mapped region at address 0x1FFE801FF8
==37759== Stack overflow in thread #1: can't grow stack to 0x1ffe801000
==37759==    at 0x13F650: List_sublistLowlevel_d1a1e4356bd9fe6c31754def4c60a14042ade1c6c101618179cfd5d1c73189 (roc_app:0)
==37759==  If you believe this happened as a result of a stack
==37759==  overflow in your program's main thread (unlikely but
==37759==  possible), you can try to increase the size of the
==37759==  main thread stack using the --main-stacksize= flag.
==37759==  The main thread stack size used in this run was 8388608.
==37759== 
==37759== HEAP SUMMARY:
==37759==     in use at exit: 315,984 bytes in 3 blocks
==37759==   total heap usage: 19 allocs, 16 frees, 784,344 bytes allocated
==37759== 
==37759== LEAK SUMMARY:
==37759==    definitely lost: 0 bytes in 0 blocks
==37759==    indirectly lost: 0 bytes in 0 blocks
==37759==      possibly lost: 314,936 bytes in 1 blocks
==37759==    still reachable: 1,048 bytes in 2 blocks
==37759==         suppressed: 0 bytes in 0 blocks
==37759== Rerun with --leak-check=full to see details of leaked memory
==37759== 
==37759== For lists of detected and suppressed errors, rerun with: -s
==37759== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)
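
A rough back-of-the-envelope check, assuming (not confirmed by the trace, which only shows List_sublistLowlevel at the top of the stack) that the decoder's stack use grows linearly with the string length: dividing the 8388608-byte main thread stack reported by valgrind by the character counts on either side of the failure threshold gives the stack budget per character. The function name bytes_per_char is hypothetical.

```python
STACK_BYTES = 8_388_608  # main thread stack size reported by valgrind


def bytes_per_char(stack_bytes: int, n_chars: int) -> float:
    """Stack bytes available per input character if usage is linear."""
    return stack_bytes / n_chars


# 170 000 characters decoded fine; 175 000 overflowed the stack.
for n in (170_000, 175_000):
    print(f"{n}: {bytes_per_char(STACK_BYTES, n):.1f} bytes per character")
```

Under that assumption the budget works out to roughly 48 to 49 bytes per character, which would be consistent with a small stack frame being pushed per decoded character rather than with a single oversized allocation.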