Closed Janiczek closed 2 years ago
Since this is in a proprietary repository, I can't just share the project globally. I'm trying to get to the bottom of it though, chopping parts off...
I've got it down to src/Main.elm:
module Main exposing (main)
import Html
main = Html.text ""
and elm.json:
{
"type": "application",
"source-directories": [
"src"
],
"elm-version": "0.19.1",
"dependencies": {
"direct": {
"GlobalWebIndex/cmd-extra": "1.4.0",
"elm/browser": "1.0.2",
"elm/core": "1.0.5",
"elm/http": "2.0.0",
"elm/json": "1.1.3",
"elm/svg": "1.0.1",
"elm/time": "1.0.0",
"elm/url": "1.0.0",
"elm-community/html-extra": "3.4.0",
"elm-community/json-extra": "4.3.0",
"elm-community/maybe-extra": "5.3.0",
"krisajenkins/remotedata": "6.0.1",
"rtfeldman/elm-css": "17.0.5",
"rtfeldman/elm-iso8601-date-strings": "1.1.4",
"turboMaCk/any-set": "1.5.0",
"waratuman/time-extra": "1.1.0"
},
"indirect": {
"elm/bytes": "1.0.8",
"elm/file": "1.0.5",
"elm/html": "1.0.0",
"elm/parser": "1.1.0",
"elm/virtual-dom": "1.0.2",
"justinmimbs/timezone-data": "2.1.4",
"robinheghan/murmur3": "1.0.0",
"rtfeldman/elm-hex": "1.0.0",
"turboMaCk/any-dict": "2.6.0"
}
},
"test-dependencies": {
"direct": {},
"indirect": {}
}
}
I'll try to minimize the elm.json now.
The error seems to be linked to justinmimbs/timezone-data, which is quite big:
https://github.com/justinmimbs/timezone-data/blob/5.1.2/src/TimeZone.elm
which would agree with the error - vector capacity overflow.
My final SSCCE'd elm.json:
{
"type": "application",
"source-directories": [
"src"
],
"elm-version": "0.19.1",
"dependencies": {
"direct": {
"elm/browser": "1.0.2",
"elm/core": "1.0.5",
"elm/html": "1.0.0",
"justinmimbs/timezone-data": "5.1.2"
},
"indirect": {
"elm/json": "1.1.3",
"elm/time": "1.0.0",
"elm/url": "1.0.0",
"elm/virtual-dom": "1.0.2"
}
},
"test-dependencies": {
"direct": {},
"indirect": {}
}
}
Beautiful bug report, thank you! I'm looking into it now.
@Janiczek on an unrelated point: are you manually compiling to get a native elm-pair binary, and so avoid needing to use Rosetta on your Arm64 Macbook? I'm curious what the install-experience on those newer Macbooks is like since I don't have one myself to test.
I mainly wanted to try out nix+rust together (although I'm unconvinced - debug built fine, but --release didn't build due to errors I couldn't figure out how to fix).
I'm curious what the install-experience on those newer Macbooks is like
I'll go clone the current version of your repo from scratch and will report here 🙂
It went pretty smooth!
:messages
Download Elm compiler 0.19.1...
Download Elm-pair release-4...
Elm-pair installation complete!
I'm not able to reproduce this one against your example, either on my Linux box or on my Macbook :cry:. In both cases elm-pair runs and does things, even using values from the justinmimbs/timezone-data package. Randomly inspecting some potentially guilty vectors didn't unearth anything suspicious either. Memory usage is almost 0 while it is running, and my Macbook Pro is 5 years old, so in hardware terms I imagine your Arm64-based one is miles ahead of mine anyway :thinking:.
The <unknown> entries in your stacktrace are weird. I'd expect those in the distributed binaries because those get built with the --release flag, which I believe omits debug symbols. But you were saying that you were running your own compiled binary built without the --release flag, right? Does using the distributed binaries result in the same error?
I was actually able to get a backtrace from your official build! This is on 0.4.0:
thread 'main' panicked at 'capacity overflow', library/alloc/src/raw_vec.rs:509:5
stack backtrace:
0: 0x1006d31f1 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h90c059b532d34bdd
1: 0x1006f03ab - core::fmt::write::hefb311138778afb5
2: 0x1006ce5fa - std::io::Write::write_fmt::h0e0ed87b18ae870c
3: 0x1006d4a15 - std::panicking::default_hook::{{closure}}::h43f0e51e5825bd5b
4: 0x1006d45ff - std::panicking::default_hook::ha1c3473cbcf391f7
5: 0x1006d5100 - std::panicking::rust_panic_with_hook::h0db3c4ba5fb4dc12
6: 0x1006d4b74 - std::panicking::begin_panic_handler::{{closure}}::hc6d8c92efeef5a19
7: 0x1006d3667 - std::sys_common::backtrace::__rust_end_short_backtrace::h7ffb59e898e02a83
8: 0x1006d4b0a - _rust_begin_unwind
9: 0x1007131df - core::panicking::panic_fmt::h7d73f67464d78916
10: 0x100713137 - core::panicking::panic::h9721d5a8c5d33ad7
11: 0x1006ea73c - alloc::raw_vec::capacity_overflow::hfb61678c6d58a3b6
12: 0x1005b9515 - elm_pair::elm::io::parse_elm_stuff_idat::parse_elm_stuff_idat::h84d18c990a9301b1
13: 0x1001feafc - <differential_dataflow::operators::reduce::history_replay::HistoryReplayer<V1,V2,T,R1,R2> as differential_dataflow::operators::reduce::PerKeyCompute<V1,V2,T,R1,R2>>::compute::hc7ec5cec657ac7c0
14: 0x100337afe - timely::dataflow::operators::generic::builder_rc::OperatorBuilder<G>::build_reschedule::{{closure}}::hb00178903fcbc135
15: 0x1002a4392 - <timely::dataflow::operators::generic::builder_raw::OperatorCore<T,L> as timely::scheduling::Schedule>::schedule::h4c8c49a19ce22758
16: 0x1003c5f81 - <timely::progress::subgraph::Subgraph<TOuter,TInner> as timely::scheduling::Schedule>::schedule::hb7138226abacd90e
17: 0x1005eb41d - timely::worker::Wrapper::step::hc8b6e6a256e02aca
18: 0x1004db1d9 - timely::worker::Worker<A>::step_while::h37e0230d38fed230
19: 0x1005bfe61 - elm_pair::lib::dataflow::Advancable::advance::h3397c155eb1f3f4d
20: 0x1004678b4 - elm_pair::elm::dependencies::DataflowComputation::advance::ha7c0c3c61e6a49ba
21: 0x100403cf8 - elm_pair::analysis_thread::run::h31ab316fd7f0bd10
22: 0x1003a9e69 - elm_pair::main::h2cff6cf0af74ee3e
23: 0x10026d726 - std::sys_common::backtrace::__rust_begin_short_backtrace::h1fb6c74b1824be1d
24: 0x100204e4c - std::rt::lang_start::{{closure}}::hb73fad24fb9872e0
25: 0x1006d2e9f - std::rt::lang_start_internal::hb74872162e3d56c9
26: 0x1003aa9b9 - _main
Looks like this is the culprit?
12: 0x1005b9515 - elm_pair::elm::io::parse_elm_stuff_idat::parse_elm_stuff_idat::h84d18c990a9301b1
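For anyone reading along, some context on frame 11: Rust's Vec panics with 'capacity overflow' when it is asked for a capacity whose size in bytes cannot be represented as an allocation. Below is a minimal sketch of the non-panicking alternative, `Vec::try_reserve`, which reports the same condition as an error value. The function name `reserve_items` is made up for illustration and is not taken from elm-pair.

```rust
use std::collections::TryReserveError;

/// Hypothetical helper: reserve room for `count` 8-byte items.
/// `Vec::with_capacity(count)` would panic for an absurd `count`;
/// `try_reserve` returns an error the caller can handle instead.
fn reserve_items(count: usize) -> Result<Vec<u64>, TryReserveError> {
    let mut items: Vec<u64> = Vec::new();
    items.try_reserve(count)?;
    Ok(items)
}
```

Calling `reserve_items(usize::MAX)` yields an `Err` rather than aborting the thread, while a small count such as 16 succeeds.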
Incidentally my elm-stuff/0.19.1/i.dat is >500kB in size. Perhaps that's hitting some buffer size sweet spot.
In case that's helpful, here's my ulimit -a:
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 256
pipe size (512 bytes, -p) 1
stack size (kbytes, -s) 8176
cpu time (seconds, -t) unlimited
max user processes (-u) 2666
virtual memory (kbytes, -v) unlimited
Super useful, thank you! This gave me an idea for what to try next. I'll report back!
Good news: I can reproduce this error now :tada:. I haven't found the problem yet, though. Just briefly popping in to say the previous commit can be ignored; it does not fix the problem.
This is fixed! I'm going to do another release later today that will include the fix. I'll close this issue once the release is out.
The problem was a bug in the i.dat parser: it would not parse triple-tuple types correctly. If one was included in the i.dat file, any data parsed after the triple tuple would be garbage. The way that manifested in your particular case was that Elm-pair would start parsing an Int it thought might describe the number of constructors of a type or some such, and would get back a garbage number that happened to be very large. Then it would try to allocate a vector to contain that many fake constructors and crash.
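To illustrate the failure mode described above, here is a minimal sketch of a length-prefixed parser that rejects an implausible count before allocating. This is not Elm-pair's parser and not the real i.dat layout: the 8-byte big-endian length prefix, the one-byte items, and the names `ParseError`, `read_len`, and `parse_items` are all assumptions made up for the example.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ParseError {
    UnexpectedEof,
    ImplausibleLength { claimed: u64, remaining: usize },
}

/// Read an 8-byte big-endian length prefix and advance the cursor.
/// (The prefix width and byte order are assumptions of this sketch.)
fn read_len(input: &mut &[u8]) -> Result<u64, ParseError> {
    let data: &[u8] = *input;
    if data.len() < 8 {
        return Err(ParseError::UnexpectedEof);
    }
    let (head, rest) = data.split_at(8);
    let mut buf = [0u8; 8];
    buf.copy_from_slice(head);
    *input = rest;
    Ok(u64::from_be_bytes(buf))
}

/// Parse a length-prefixed list of one-byte items.
fn parse_items(input: &mut &[u8]) -> Result<Vec<u8>, ParseError> {
    let claimed = read_len(input)?;
    let data: &[u8] = *input;
    // Every item occupies at least one byte, so a count larger than the
    // remaining input cannot be genuine. Checking this before allocating
    // turns a garbage count into an error instead of a huge allocation.
    if claimed > data.len() as u64 {
        return Err(ParseError::ImplausibleLength {
            claimed,
            remaining: data.len(),
        });
    }
    let n = claimed as usize;
    let mut items = Vec::with_capacity(n);
    items.extend_from_slice(&data[..n]);
    *input = &data[n..];
    Ok(items)
}
```

If the parser has drifted out of sync with the file, as it would after misreading an earlier value, the next "length" it reads is effectively arbitrary bytes. With the bounds check, that shows up as an `ImplausibleLength` error rather than a capacity-overflow panic.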
The library exporting the tuple was waratuman/time-extra, for example here. I think it might have been the only one in your dependencies (triple tuples being pretty rare), as I can only reproduce the problem when I install exactly that lib. Elm-pair doesn't yet call elm-make to regenerate the i.dat file when the elm.json changes, so I think maybe in your SSCCE the i.dat file might still have included time-extra.
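One way to spot a stale i.dat like the one suspected above is to compare file modification times. The following is only a sketch of that idea, not something Elm-pair is known to do: the function names `is_stale` and `idat_is_stale` are hypothetical, and using mtimes as the staleness signal is an assumption. The elm-stuff/0.19.1/i.dat path is the one mentioned earlier in this thread.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// True if the cache was last written before the manifest last changed.
fn is_stale(manifest_modified: SystemTime, cache_modified: SystemTime) -> bool {
    cache_modified < manifest_modified
}

/// Hypothetical check: is elm-stuff/0.19.1/i.dat older than elm.json?
/// Returns an I/O error if either file is missing or unreadable.
fn idat_is_stale(project_root: &Path) -> io::Result<bool> {
    let manifest = fs::metadata(project_root.join("elm.json"))?.modified()?;
    let cache = fs::metadata(project_root.join("elm-stuff/0.19.1/i.dat"))?.modified()?;
    Ok(is_stale(manifest, cache))
}
```

A `true` result would suggest that dependencies listed in the cached file may no longer match elm.json, which fits the hit-and-miss reproduction when dependencies were removed one at a time.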
Thank you again for your super helpful debug output, it was indispensable!
Wow @jwoudenberg, nice! The explanation totally makes sense.
The error was a bit hit-and-miss for me, not always reproducible - this was probably what you're describing here:
Elm-pair doesn't yet call elm-make to regenerate the i.dat file when the elm.json changes, so I think maybe in your SSCCE the i.dat file might still have included time-extra.
And I couldn't decide whether it was waratuman/time-extra or justinmimbs/timezone-data that caused the error. Sorry for leading you down the "it must surely be the huge library" path.
Glad this is sorted out and you found the triple-tuple parsing bug!
I just want to confirm that I've built the current main revision of the repo and it's not crashing anymore - it's running quite nicely on our huge production codebase 🙂 I consider this fixed! Thanks again ❤️
Hooray, thank you so much! I'm also super happy to hear it's doing well on a huge production codebase. That's kind of hard for me to test; the biggest project I currently have access to is elm-spa-example. If you notice anything like slowness or large resource usage, please let me know!
I just pushed release 6, which is probably the exact same thing you're running now. In addition to this fix Elm-pair should also be less eager to rename now, per your tweet.
elm-pair version: manually compiled from the commit 091db2e626f9a7b042577917fdffa6e1546b8ae7
After opening vim src/Main.elm there is no elm-pair functionality in neovim, pgrep elm-pair returns nothing, and ~/Caches/elm-pair/0.4.0/log says:
If I do RUST_BACKTRACE=1 it's mostly the same but tells me to do RUST_BACKTRACE=full. Doing that leads to
continuing for 109 lines.