stensonowen closed this issue 5 years ago.
Can I ask why you need to keep all 8GB of wiki data inside the executable instead of reading it dynamically from a separate file?
I'm trying to do as much work as possible at compile time. I'm also using phf to make these lookups fast; it produces a static hashmap, which I can't safely modify. I know it's a niche case.
With 8GB of data, you may have better luck using some kind of database. SQLite and PostgreSQL both offer forms of indexed full-text search.
I'm using phf to avoid unnecessary cache misses in the hash table, and I'd rather not keep the data on disk. I was trying to use a hash table without collisions to reduce the worst-case lookup time, and I don't think a database can offer comparable speed. But if this isn't going to be fixed, I'll have to rethink things.
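To illustrate what "a hash table without collisions" means, here is a toy perfect hash: for a fixed key set it searches for a seed under which every key lands in its own slot, so a lookup is one hash, one probe and one key comparison. This is a simplified seed search, not phf's actual algorithm (phf uses CHD), and `PerfectMap`, `build` and `get` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash a key together with a seed. DefaultHasher::new() is deterministic
/// within one build, but its algorithm may change between Rust releases, so a
/// real compile-time generator would pin a specific hash function instead.
fn seeded_hash(key: &str, seed: u64) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    key.hash(&mut h);
    h.finish()
}

/// Toy collision-free map over a fixed set of (key, value) pairs.
pub struct PerfectMap<'a> {
    seed: u64,
    slots: Vec<Option<(&'a str, &'a str)>>,
}

impl<'a> PerfectMap<'a> {
    /// Try seeds 0..max_tries until every key gets its own slot.
    /// Uses 2x as many slots as keys to keep the search short.
    /// Returns None if no seed works (e.g. the input has duplicate keys).
    pub fn build(entries: &[(&'a str, &'a str)], max_tries: u64) -> Option<Self> {
        let size = (entries.len() * 2).max(1);
        'seeds: for seed in 0..max_tries {
            let mut slots: Vec<Option<(&'a str, &'a str)>> = vec![None; size];
            for &(key, value) in entries {
                let i = (seeded_hash(key, seed) % size as u64) as usize;
                if slots[i].is_some() {
                    continue 'seeds; // collision: discard and try the next seed
                }
                slots[i] = Some((key, value));
            }
            return Some(PerfectMap { seed, slots });
        }
        None
    }

    /// One hash, one probe, one key comparison: there are no collision chains.
    pub fn get(&self, key: &str) -> Option<&'a str> {
        let i = (seeded_hash(key, self.seed) % self.slots.len() as u64) as usize;
        match self.slots[i] {
            Some((k, v)) if k == key => Some(v),
            _ => None,
        }
    }
}
```

The final key comparison is still needed so that keys outside the original set are rejected rather than returning whatever occupies their slot.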
I made an 8GB source file containing one long string and tried compiling it with a 64-bit rustc. I got the same crash and backtrace, so this issue isn't related to phf.
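Since the crash reproduces with nothing but one long string literal, a small generator is enough to build such a file. The sketch below streams a file of the form `pub static BIG: &str = "aaa...";` to any `Write` sink in 64 KiB chunks; the function name, the constant name `BIG` and the filler byte are arbitrary choices, not what the commenter actually used.

```rust
use std::io::{self, Write};

/// Writes a Rust source file consisting of one huge string literal of
/// `len` bytes. Writing in fixed-size chunks means the literal is never
/// held in memory all at once.
pub fn write_big_literal<W: Write>(out: &mut W, len: u64) -> io::Result<()> {
    out.write_all(b"pub static BIG: &str = \"")?;
    let chunk = [b'a'; 64 * 1024];
    let mut remaining = len;
    while remaining > 0 {
        let n = remaining.min(chunk.len() as u64) as usize;
        out.write_all(&chunk[..n])?;
        remaining -= n as u64;
    }
    out.write_all(b"\";\n")?;
    Ok(())
}
```

For example, passing a `std::io::BufWriter` around `std::fs::File::create("big.rs")` with `len = 8_000_000_000` would produce a file of roughly 8 GB, well past the 2^32 - 1 byte range of a u32 offset.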
I took a glance through the stack trace and grepped the source, and found this: https://github.com/rust-lang/rust/blob/master/src/libsyntax_pos/lib.rs#L533-L536
From the comment, it looks like BytePos is a newtype of u32 by design, to save space.
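To make the size trade-off concrete: a u32 newtype occupies 4 bytes but can only address offsets up to 2^32 - 1 (about 4.29 GB), so an 8 GB input cannot be described at all. The struct below is a simplified stand-in, not rustc's actual definition; `from_offset`, `MAX_OFFSET` and `POS_SIZE` are hypothetical helpers.

```rust
use std::convert::TryFrom;
use std::mem::size_of;

/// Simplified stand-in for a u32-based byte position (illustration only).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct BytePos(pub u32);

impl BytePos {
    /// Largest offset a u32 position can express: 2^32 - 1 bytes.
    pub const MAX_OFFSET: u64 = u32::MAX as u64;

    /// Returns None instead of silently truncating when the offset is too large.
    pub fn from_offset(offset: u64) -> Option<BytePos> {
        u32::try_from(offset).ok().map(BytePos)
    }
}

/// 4 bytes per position, versus 8 for a u64-based one.
pub const POS_SIZE: usize = size_of::<BytePos>();
```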
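An easy next thing to try would be changing this to u64 and:
1. checking whether that fixes this crash, and
2. measuring how much that slows the compiler down.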
If it works for step 1, and step 2 shows that simply switching to u64 is an unacceptable perf hit, is there a (safe/sane) way to do fancy bitpacking? (e.g. use u32s below 2^31, and use the high bit to signal something fancier)
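One way the bit-packing idea could look, assuming a side table is acceptable: offsets below 2^31 are stored inline, and for larger offsets the high bit is set and the low 31 bits become an index into a table of full u64 offsets. `PackedPos`, `OverflowTable` and `TAG` are hypothetical names; this is not how rustc encodes positions.

```rust
/// High bit marks "the full offset lives in the side table".
const TAG: u32 = 1 << 31;

/// A 4-byte position: either the offset itself (high bit clear) or a tagged
/// index into an OverflowTable (high bit set).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PackedPos(u32);

/// Side table holding full u64 offsets for positions at or above 2^31.
#[derive(Default)]
pub struct OverflowTable {
    offsets: Vec<u64>,
}

impl OverflowTable {
    /// Packs an offset; returns None only if the side table itself is full.
    pub fn encode(&mut self, offset: u64) -> Option<PackedPos> {
        if offset < u64::from(TAG) {
            return Some(PackedPos(offset as u32)); // fast path: stored inline
        }
        let index = self.offsets.len();
        if index >= TAG as usize {
            return None;
        }
        self.offsets.push(offset);
        Some(PackedPos(TAG | index as u32))
    }

    /// Recovers the full offset; large ones cost an extra table lookup.
    pub fn decode(&self, pos: PackedPos) -> u64 {
        if (pos.0 & TAG) == 0 {
            u64::from(pos.0)
        } else {
            self.offsets[(pos.0 & !TAG) as usize]
        }
    }
}
```

The catch is that large positions need an extra memory access to decode, and the raw u32 values no longer sort (or compare equal) in offset order, so any comparison has to go through `decode` first.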
That makes sense. I might play around with a Rust fork for this project, but this seems to be expected, deliberate behavior, so I won't expect a fix. Thanks.
The ICE definitely is not intentional. At the very least we should emit a proper error here instead of failing hard.
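A minimal sketch of "a proper error instead of failing hard", assuming positions are handed out by bumping a u32 cursor for each new file: use checked arithmetic and return a descriptive error when a file would run past u32::MAX. `PosAllocator`, `add_file` and `SourceTooLarge` are hypothetical; this is not rustc's actual source map code.

```rust
use std::convert::TryFrom;
use std::fmt;

/// Error reported when a file no longer fits in the u32 position space.
#[derive(Debug, PartialEq, Eq)]
pub struct SourceTooLarge {
    pub file_len: u64,
    pub used: u32,
}

impl fmt::Display for SourceTooLarge {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "source file of {} bytes does not fit: {} of {} addressable bytes already used",
            self.file_len,
            self.used,
            u32::MAX
        )
    }
}

impl std::error::Error for SourceTooLarge {}

/// Hands out consecutive, non-overlapping start offsets for each file.
#[derive(Default)]
pub struct PosAllocator {
    next_start: u32,
}

impl PosAllocator {
    /// Returns the start offset for a file of `file_len` bytes, or an error
    /// (rather than a wrapped value or a panic) if it would exceed u32::MAX.
    pub fn add_file(&mut self, file_len: u64) -> Result<u32, SourceTooLarge> {
        let end = u32::try_from(file_len)
            .ok()
            .and_then(|len| self.next_start.checked_add(len));
        match end {
            Some(end) => {
                let start = self.next_start;
                self.next_start = end;
                Ok(start)
            }
            None => Err(SourceTooLarge { file_len, used: self.next_start }),
        }
    }
}
```

A failed call leaves the cursor untouched, so a caller could report the error and still process smaller files afterwards.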
Triage: BytePos is still a u32: https://github.com/rust-lang/rust/blob/master/src/libsyntax_pos/lib.rs#L1210
I wonder what the easiest way to create a test case to reproduce this is.
This got fixed in ccb2dfbfec812d1502626992a8856df27c4fa950
This is such an edge case that a test probably isn't worth it, and it would be hard to run within CI's space and time constraints.
When compiling a project with some large files (that were codegen'd), I run into a compiler panic that looks like the result of a 32-bit underflow.
This is the result of
$ RUST_BACKTRACE=1 time cargo build --verbose
4294967293 is three less than 2^32, which leads me to believe a 32-bit integer underflowed. The file sizes total about 8GB and the memory usage when compiling reaches around 60GB.
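The arithmetic behind that guess can be checked directly: 2^32 - 3 = 4294967293, which is exactly what a u32 subtraction that goes three below zero wraps to. The helper names below are hypothetical, and the sketch says nothing about which subtraction inside rustc actually underflowed.

```rust
/// What `a - b` produces on u32 when overflow checks are off (the default
/// for release builds): the result wraps modulo 2^32. With overflow checks
/// on, plain `a - b` panics instead.
pub fn wrapped_sub(a: u32, b: u32) -> u32 {
    a.wrapping_sub(b)
}

/// The checked alternative: None instead of a wrapped value.
pub fn checked_sub_u32(a: u32, b: u32) -> Option<u32> {
    a.checked_sub(b)
}
```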
Here are the Cargo.toml, Cargo.lock, src/lib.rs, and the first 100 lines of src/codegen_entries (generated by phf) and src/codegen_links.
Version info: I first hit this on a recent nightly, then updated and hit it again.
Trying to build with beta gives the same error. Trying with stable gives
and
The files are large enough that I didn't host them anywhere, so you probably won't be able to reproduce this easily (it also requires a ~60GB swap file and ~40 minutes). If that's important, I can try to find somewhere to host the files.
Is there something I can do to address this without scrapping the project? Do I have any alternatives?