saviorand / lightbug_http

Simple and fast HTTP framework for Mojo! 🔥
MIT License
587 stars 37 forks source link

Switch to gojo string builder for string concatenation #14

Closed saviorand closed 8 months ago

thatstoasty commented 8 months ago

@saviorand Let me know if there's anything I can do to help with the string builder or bytes changes!

saviorand commented 8 months ago

@thatstoasty I'm actually running in some memory management issues:


fn encode(res: HTTPResponse) raises -> Bytes:
    var res_str = String()
    var protocol = strHttp11
    var current_time = String()
    try:
        current_time = Morrow.utcnow().__str__()
    except e:
        print("Error getting current time: " + e.__str__())
        current_time = now().__str__()
    var builder = StringBuilder()
    _ = builder.write(String(protocol).as_bytes())
    _ = builder.write(String(" ").as_bytes())
    _ = builder.write(String(res.header.status_code().__str__()).as_bytes())
    _ = builder.write(String(" ").as_bytes())
    _ = builder.write(String("OK").as_bytes())  # res.header.status_message()
    _ = builder.write(String("\r\n").as_bytes())
    _ = builder.write(String("Server: lightbug_http").as_bytes())
    _ = builder.write(String("\r\n").as_bytes())
    _ = builder.write(String("Content-Type: ").as_bytes())
    _ = builder.write(String("text/html").as_bytes())  # res.header.content_type()
    # TODO: propagate charset
    # res_str += String("; charset=utf-8")
    _ = builder.write(String("\r\n").as_bytes())
    _ = builder.write(String("Content-Length: ").as_bytes())
    # TODO: fix this
    _ = builder.write(
        String(355871).as_bytes()
    )  # (res.body_raw.__len__() - 1).__str__()
    _ = builder.write(String("\r\n").as_bytes())
    _ = builder.write(String("Connection: ").as_bytes())
    if res.connection_close():
        _ = builder.write(String("close").as_bytes())
    else:
        _ = builder.write(String("keep-alive").as_bytes())
    _ = builder.write(String("\r\n").as_bytes())
    _ = builder.write(String("Date: ").as_bytes())
    _ = builder.write(String(current_time).as_bytes())
    _ = builder.write(String("\r\n").as_bytes())
    _ = builder.write(String("\r\n").as_bytes())
    _ = builder.write(String("<div>hello frend</div>").as_bytes())
    _ = builder.write(String("\r\n").as_bytes())
    # _ = builder.write(res.body())
    return builder._vector

This is in http.mojo in the encode method. If I uncomment the commented lines, e.g. # _ = builder.write(res.body()), or try printing the string from the string builder after Server: lightbug_http", I either get a segfault in the first case, or an incomplete string in the second case.

Will investigate this more but if I remove some symbols, e.g replace Server: lightbug_http" with just hello I get a more complete string. I suspect it has to do with certain symbols breaking the conversion, but that's just my assumption for now.

Already tried assembling this string with write, write_string, a little with write_byte (this one could be interesting to try again) but no luck so far. Let me know if you have any ideas!

saviorand commented 8 months ago

@thatstoasty also I'm not sure if all the string/byte conversions in the code below are actually needed, this was just me experimenting to try and debug this. Initially I was just writing strings to the builder with write_string following the encode logic in main. The result should roughly be equivalent to the following code:


fn encode(res: HTTPResponse) raises -> Bytes:
    var res_str = String()
    var protocol = strHttp11
    var current_time = String()
    try:
        current_time = Morrow.utcnow().__str__()
    except e:
        print("Error getting current time: " + e.__str__())
        current_time = now().__str__()
    res_str += protocol
    res_str += String(" ")
    res_str += res.header.status_code().__str__()
    res_str += String(" ")
    res_str += String(res.header.status_message())
    res_str += String("\r\n")
    res_str += String("Server: lightbug_http")
    res_str += String("\r\n")
    res_str += String("Content-Type: ")
    res_str += String(res.header.content_type())
    # res_str += String("; charset=utf-8")
    res_str += String("\r\n")
    res_str += String("Content-Length: ")
    res_str += (res.body_raw.__len__() - 1).__str__()
    res_str += String("\r\n")
    res_str += String("Connection: ")
    if res.connection_close():
        res_str += String("close")
    else:
        res_str += String("keep-alive")
    res_str += String("\r\n")
    res_str += String("Date: ")
    res_str += current_time
    res_str += String("\r\n")
    res_str += String("\r\n")
    res_str += res.body_raw
    return res_str._buffer
thatstoasty commented 8 months ago

@saviorand Ah I see, my initial guess is that there's an issue with the __str__ method of the StringBuilder. Nothing in gojo currently supports unicode characters ATM, so that could be causing the print to blow up at the end when constructing the StringRef from the builder.

https://github.com/thatstoasty/gojo/blob/main/gojo/strings/builder.mojo#L48

return StringRef(self._vector._vector.data.value, len(self._vector))

I'm thinking a character with length greater than one byte is sneaking in there and messing up the expected total length. There are other ways to construct the string without needing the exact length though, like appending a null terminator to the end of the internal vector and using String() instead of StringRef().

If you can share an example of a response struct being passed to the function, I can give it a look!

saviorand commented 8 months ago

@thatstoasty this was my assumption as well, but I think it could also be related to how I'm accessing struct Bytes fields. Tried to come up with a small reproducible example. If you use a service like this , e.g. in service.mojo

@value
struct Welcome(HTTPService):
    fn func(self, req: HTTPRequest) raises -> HTTPResponse:
        return OK(String("test").as_bytes())

and change the encode function to be just

fn encode(res: HTTPResponse) raises -> Bytes:
    var res_str = String()
    var protocol = strHttp11
    var current_time = String()
    try:
        current_time = Morrow.utcnow().__str__()
    except e:
        print("Error getting current time: " + e.__str__())
        current_time = now().__str__()
    var builder = StringBuilder()
    _ = builder.write(String(protocol).as_bytes())
    _ = builder.write(String(" ").as_bytes())
    _ = builder.write(String(res.header.status_code().__str__()).as_bytes())
    _ = builder.write(String(" ").as_bytes())
    _ = builder.write(res.header.status_message())
    _ = builder.write(String("\r\n").as_bytes())
    _ = builder.write(String("Server: lightbug_http").as_bytes())
    _ = builder.write(String("\r\n").as_bytes())
    _ = builder.write(String("Content-Type: ").as_bytes())
    _ = builder.write(res.header.content_type())
    print(str(builder))

And then run lightbug.🔥, this prints out HTTP/1.1 200 OK, although the string should at least include Server: lightbug_http and Content-Type:. Basically everything after the first access of res.header (which is a Bytes field in the struct below) does not seem to make it into the string assembled by the builder. Maybe I'm missing something?


@value
struct HTTPResponse(Response):
    var header: ResponseHeader
    var stream_immediate_header_flush: Bool
    var stream_body: Bool
    var body_raw: Bytes
    var skip_reading_writing_body: Bool
    var raddr: TCPAddr
    var laddr: TCPAddr

    fn __init__(inout self, body_bytes: Bytes):
        # TODO: infer content type from the body
        self.header = ResponseHeader(
            200,
            String("OK")._buffer,
            String("Content-Type: application/octet-stream\r\n")._buffer,
        )
        self.stream_immediate_header_flush = False
        self.stream_body = False
        self.body_raw = body_bytes
        self.skip_reading_writing_body = False
        self.raddr = TCPAddr()
        self.laddr = TCPAddr()

    fn __init__(inout self, header: ResponseHeader, body_bytes: Bytes):
        self.header = header
        self.stream_immediate_header_flush = False
        self.stream_body = False
        self.body_raw = body_bytes
        self.skip_reading_writing_body = False
        self.raddr = TCPAddr()
        self.laddr = TCPAddr()

    fn set_status_code(inout self, status_code: Int) -> Self:
        _ = self.header.set_status_code(status_code)
        return self

    fn status_code(self) -> Int:
        return self.header.status_code()

    fn set_connection_close(inout self, connection_close: Bool) -> Self:
        _ = self.header.set_connection_close()
        return self

    fn connection_close(self) -> Bool:
        return self.header.connection_close()

    fn set_body_raw(inout self, body: Bytes) -> Self:
        self.body_raw = body
        return self

    fn body(self) -> Bytes:
        return self.body_raw
saviorand commented 8 months ago

@thatstoasty the ResponseHeader struct itself is a bit bigger, you can find the code for it here https://github.com/saviorand/lightbug_http/blob/98dd7efc909b92f66a25dfa7ad5cf364deffce69/lightbug_http/header.mojo#L280

thatstoasty commented 8 months ago

@saviorand Thanks, I was able to reproduce the error and narrow it down a bit. I'm looking into some more, but the first issue I can see is with String's ._buffer attribute compared to String's as_bytes() method.

String("OK")._buffer -> [79, 75, 0]
String("OK").as_bytes() -> [79, 75]

._buffer includes the null terminator byte, which makes sense as to why the response was being terminated at HTTP/1.1 200 OK! The String constructor was not reading anything past that first terminator.

After a global replace of ._buffer with .as_bytes() this encode function worked and the server returned a response to my browser.

fn encode(res: HTTPResponse) raises -> Bytes:
    var res_str = String()
    var protocol = strHttp11
    var current_time = String()
    try:
        current_time = Morrow.utcnow().__str__()
    except e:
        print("Error getting current time: " + e.__str__())
        current_time = now().__str__()
    var builder = StringBuilder()
    _ = builder.write(protocol)
    _ = builder.write(String(" "))
    _ = builder.write(res.header.status_code())
    _ = builder.write(String(" "))
    _ = builder.write(String(res.header.status_message()))
    _ = builder.write(String("\r\n"))
    _ = builder.write(String("Server: lightbug_http"))
    _ = builder.write(String("\r\n"))
    _ = builder.write(String("Content-Type: "))
    _ = builder.write(res.header.content_type())
    # TODO: propagate charset
    # res_str += String("; charset=utf-8")
    _ = builder.write(String("\r\n"))
    _ = builder.write(String("Content-Length: "))
    # TODO: fix this
    _ = builder.write(
        len(res.body_raw) - 1
    )
    _ = builder.write(String("\r\n"))
    _ = builder.write(String("Connection: "))
    if res.connection_close():
        _ = builder.write(String("close"))
    else:
        _ = builder.write(String("keep-alive"))
    _ = builder.write(String("\r\n"))
    _ = builder.write(String("Date: "))
    _ = builder.write(String(current_time))
    _ = builder.write(String("\r\n"))
    _ = builder.write(String("\r\n"))
    _ = builder.write(String("<div>hello frend</div>"))
    _ = builder.write(String("\r\n"))
    _ = builder.write(res.body())
    print(str(builder))

I can open a PR to your feature branch if you are open to contributions

saviorand commented 8 months ago

Yes, would definitely love that if you can contribute! Thanks for this! gojo has been great and easy to use

saviorand commented 8 months ago

Going to close this for now as resolved in #16