vlang / v

Simple, fast, safe, compiled language for developing maintainable software. Compiles itself in <1s with zero library dependencies. Supports automatic C => V translation. https://vlang.io
MIT License
35.8k stars 2.17k forks source link

32 bit array len limit #15377

Open jrfondren opened 2 years ago

jrfondren commented 2 years ago

V full version: V 0.3.0 b01f71d.8c33a40 Processor: 8 cpus, 64bit, little endian, Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz

What did you do?

Try to serve a 1.9G file in #15376

What did you expect to see?

A successfully downloaded file, or, at least, an error.

What did you see instead?

Each request caused a thread to spin at 100% CPU forever, never returning the file.

This was traced to a string interpolation of the response body, but that of course shouldn't fail. In investigating that, with help from #help, I came up with this test case:

import math

fn main() {
    unsafe {
        mut p := malloc(math.max_u32)
        vmemset(p, 1, math.max_u32 - 1)
        p[math.max_u32 - 1] = 0
        println('before crash:')
        s := cstring_to_vstring(&p[0])
        println('after crash')
        println(s.len)
    }
}

Which crashes at the cstring_to_vstring, with this output:

before crash:
V panic: malloc_noscan(-1 <= 0)
v hash: 8c33a40
/tmp/v_1000/strint.9715218836963737045.tmp.c:6967: at _v_panic: Backtrace
/tmp/v_1000/strint.9715218836963737045.tmp.c:7203: by malloc_noscan
/tmp/v_1000/strint.9715218836963737045.tmp.c:9892: by string_clone
/tmp/v_1000/strint.9715218836963737045.tmp.c:9782: by cstring_to_vstring
/tmp/v_1000/strint.9715218836963737045.tmp.c:14882: by main__main
/tmp/v_1000/strint.9715218836963737045.tmp.c:15353: by main

The problem seems to be that vstrlen(), tos(), the string type itself, and many other places use 32-bit int as the length of a string, and math.max_u32 is the same number as int(-1). 2GB also is 0x8000_0000, which is equal to math.min_i32. I imagine that the actual bug in the string interpolation case was due to the 1.9GB file's size being close enough to the limit it flipped signs in an expected place.

32-bit platforms don't have the address space for a 2GB string, but lots of libc and POSIX functions deal in C's size_t type, which is 64-bits on a 64-bit platform.

JalonSolov commented 2 years ago

I'd say usize if a variable sized type is needed. Unless someone has a very good reason why you should be able to allocate negative amounts of memory...

jrfondren commented 2 years ago

Unsigned seems to be the most popular, but in go the type is int (where it's variable-sized, with u32 et al. for specific sizes), which is signed. I'm not aware of a justification but I imagine it's

  1. surely nobody will ever need the 64th bit of an unsigned number. It's pretty big, at 18446744073709551615, or just under 16 exobytes
  2. that it's signed saves users from easy unsigned math errors, like this infinite loop:
import math
fn main() {
    mut i := u16(math.max_u16)
    for i >= 0 {
        println(i)
        i--
    }
}

Some discussion: https://stackoverflow.com/questions/39088945/why-does-len-returned-a-signed-value

markus-oberhumer-forks commented 2 years ago

JFYI, Go changed its default integer size on 64-bit platforms from 32 to 64 bits back in 2013 one year after its official 1.0 release...

https://go.dev/doc/go1.1

medvednikov commented 2 years ago

https://stackoverflow.com/questions/39088945/why-does-len-returned-a-signed-value

jrfondren commented 2 years ago

@medvednikov the problem isn't that it returns a signed value. That's what I prefer. The problem is that it's 32-bit so V cannot support slices of 2G or larger.

medvednikov commented 2 years ago

I see, and in Go it's solved by int being an i64 on 64 bit systems.

emily33901 commented 2 years ago

@medvednikov the problem isn't that it returns a signed value. That's what I prefer. The problem is that it's 32-bit so V cannot support slices of 2G or larger.

I'm not sure you should have the 1.9G file in memory?

jrfondren commented 2 years ago

@emily33901 the particular example is obviously avoidable.

JalonSolov commented 2 years ago

Until V gets a way to stream files to disk, there's no real way to avoid that 1.9G file in memory.

Besides, 1.9G file in memory is no big deal for those of us with plenty of RAM.

spytheman commented 2 years ago
struct KB { bytes [1024]u8 }
big := []KB{len: 5 * 1024 * 1024} // big should be now 5GB
dump(u64(big.len) * u64(big.element_size))

produces:

[s.v:6] u64(big.len) * u64(big.element_size): 5368709120

i.e. imho the problem is more of convenience of operation, and that many APIs in vlib expect []u8, and that has .element_size of 1 .

jrfondren commented 1 year ago

int => i64 on 64 bit, i32 on 32 bit

Looks like there's movement towards fixing this with 0.4.3, but the example still crashes as the string type is still using C's int which is 32-bit on x86_64. Also, println(sizeof(int)) still prints 4 for me.

medvednikov commented 1 year ago

Yeah I did the first part. I still need to finish the second part.

Such big transitions have to be done gradually.