modularml / mojo

The Mojo Programming Language
https://docs.modular.com/mojo

[BUG] question about writing to file #3074

Open f-saez opened 2 weeks ago

f-saez commented 2 weeks ago

Bug description

Hi,

Once again, maybe a bug, maybe I've misunderstood how files work in Mojo.

from pathlib import Path

def main():
    var t1 = List[UInt8](capacity=8)
    for i in range(t1.capacity):
        t1.append(i)

    var filename = Path("file.tmp")
    with open(filename, "wb") as f:
        f.write(t1)

    var stat = filename.stat()
    print("t1.size: ",t1.size)
    print("file size: ",stat.st_size)

    var t2 = List[UInt8]()
    with open(filename, "rb") as f:
        t2 = f.read_bytes()

    print("t2.size: ",t2.size)

Very basic stuff: I insert 8 bytes into a list, write the list to a file, and then check the size of the file (it should equal the size of the list). Then I read the file back, and it should have the same content as the list. The trouble is that the last byte is missing:

t1.size:  8
file size:  7
t2.size:  7

The first thing that comes to mind is that "write" expects a null-terminated string and eats the last byte each time I call it. But t1 is not a string, and the compiler doesn't raise a single error. "read_bytes" exists, but I haven't found a "write_bytes".

Steps to reproduce

Execute the code provided above.

System information

OS Fedora 40 x86_64
mojo 24.4.0 (2cb57382)
modular 0.8.0 (39a426b5)

ematejska commented 2 weeks ago

Reproduces with latest nightly build: mojo 2024.6.2005 (6f19fd07).

Dan13llljws commented 2 weeks ago

It is because List[UInt8] gets converted to a String when passed into f.write(). That constructor requires your list, t1, to have a null terminator as its last element (t1[-1] == 0). If you run with mojo -D MOJO_ENABLE_ASSERTIONS, your code will crash with: Assert Error: expected last element of String buffer to be null terminator.

f-saez commented 1 week ago

Thanks.

I already add a 0 when I need to write a bunch of bytes.
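A minimal sketch of that workaround, adapted from the reproduction above. It assumes the Mojo 24.4 behavior described in this thread: f.write() receives the List[UInt8] as a String, and that conversion treats the last element as the null terminator. Appending an explicit 0 makes the terminator the byte that gets left out, so the 8 data bytes should all reach the file. The names data and back are illustrative only.

```mojo
from pathlib import Path


def main():
    # 8 data bytes, plus one slot for the terminator.
    var data = List[UInt8](capacity=9)
    for i in range(8):
        data.append(i)

    # Workaround (assumes Mojo 24.4 semantics): f.write() converts the list
    # to a String, which treats the last element as a null terminator and
    # excludes it from the length. Appending an explicit 0 means that
    # terminator is the byte left out, not the last data byte.
    data.append(0)

    var filename = Path("file.tmp")
    with open(filename, "wb") as f:
        f.write(data)

    # Read the file back and compare against the 8 data bytes.
    var back = List[UInt8]()
    with open(filename, "rb") as f:
        back = f.read_bytes()

    print("data bytes: ", data.size - 1)
    print("file size: ", filename.stat().st_size)
    print("read back: ", back.size)
```

The bytes still pass through a String here, so this only makes the implicit conversion's terminator explicit; it does not add any UTF-8 validation.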

It also means that one can construct a String containing invalid UTF-8 code points (or maybe I was just lucky all this time :-) ), but I guess that's another debate.

Do I need to close this bug?

Dan13llljws commented 3 days ago

> It also means that one can construct a String containing invalid UTF-8 code points (or maybe I was just lucky all this time :-) ), but I guess that's another debate.

That seems to be the case to me. We treat the last character as the null terminator: String.__len__() simply returns the buffer size - 1, so whatever byte sits last in the buffer is not counted.