stephenberry / glaze

Extremely fast, in memory, JSON and interface library for modern C++
MIT License
1.09k stars 110 forks source link

std::array<char> partial read handling #1245

Open pwqbot opened 1 month ago

pwqbot commented 1 month ago

since glaze remove reflection support for c array, we migrate to std::array, but we encounter this problem

struct TestArray {
    std::array<char, 8> a;
};

TEST(glz_json, array) {
    std::string_view payload = R"(
        {
            "a": "abcdef"
        }
    )";
    TestArray result;
    auto ec = glz::read_json(result, payload);
    ASSERT_FALSE(ec) << glz::format_error(ec, payload);
    EXPECT_STREQ(result.a.data(), "abcdefg");
}

output

../../../../test/unit/glz_json_test.cpp:179: Failure
Value of: ec
  Actual: true
Expected: false
3:18: expected_bracket
               "a": "abcdef"
stephenberry commented 1 month ago

Currently Glaze expects std::array<char, 8> to be an array of characters: ["a", "b", "c"]

We could make std::array<char...> behave as a string type, but I'm not sure this is a good idea. I have some questions and ideas.

Why can't you use std::string? This way you can read in any length without errors and when working with short strings like this the small string optimization will keep it on the stack.

Otherwise, why can't you use std::string_view? The way Glaze treats parsing into a std::string_view is that it will be a string view to a portion of the input buffer. So, this only works if your input buffer is going to exist when you look at this value.

We could add a glz::stack_string, which would alias std::basic_string with a custom allocator that uses stack memory. This would allow you to still use functions like find, append, push_back, etc. and conform better to C++ string usage. For example: glz::stack_string<8> str = "abcdef";

Thanks for bringing up this limitation and let me know your thoughts.

stephenberry commented 1 month ago

I'll also note that you can still use c-style arrays, you just need to write a glz::meta for your struct.

stephenberry commented 1 month ago

After thinking about it some more, I do like the idea of std::array<char, N> behaving as a string. I've added support here: #1247. However, I'd still like to hear your thoughts.

pwqbot commented 1 month ago

After thinking about it some more, I do like the idea of std::array<char, N> behaving as a string. I've added support here: #1247. However, I'd still like to hear your thoughts.

Because we want a POD structure that can be easily passed between different processes. The stack_string idea is quite good, similar to the inplace_vector in c++26.

stephenberry commented 1 month ago

@pwqbot, I made a stack_string implementation using a custom allocator for std::basic_string. However, I'm trying to figure out error handling. I would like to use std::errc not_enough_memory to handle out of memory issues rather than throw exceptions, but I can't quite get it to work.

Here is a basic exception version, but I want to make sure that we reuse the buffer:

#include <string>
#include <array>
#include <memory>

template <class T, size_t N>
struct stack_allocator {
    using value_type = T;
    std::array<T, N> buffer;
    size_t used = 0;

    T* allocate(size_t n) {
        if (used + n > N) throw std::bad_alloc();
        T* result = &buffer[used];
        used += n;
        return result;
    }

    void deallocate(T* p, size_t n) noexcept {
        if (p == &buffer[used - n]) used -= n;
    }

    template <class U>
    struct rebind {
        using other = stack_allocator<U, N>;
    };

    stack_allocator() = default;
    template <class U>
    stack_allocator(const stack_allocator<U, N>&) {}
};

template <class T, class U, size_t N>
bool operator==(const stack_allocator<T, N>&, const stack_allocator<U, N>&) { return true; }

template <class T, class U, size_t N>
bool operator!=(const stack_allocator<T, N>&, const stack_allocator<U, N>&) { return false; }

template <size_t N>
using stack_string = std::basic_string<char, std::char_traits<char>, stack_allocator<char, N>>;

#include <print>

int main()
{
    stack_string<16> str = "Hello World";
    std::print("{}", str);
}
pwqbot commented 4 weeks ago

@pwqbot, I made a stack_string implementation using a custom allocator for std::basic_string. However, I'm trying to figure out error handling. I would like to use std::errc not_enough_memory to handle out of memory issues rather than throw exceptions, but I can't quite get it to work.

Here is a basic exception version, but I want to make sure that we reuse the buffer:

#include <string>
#include <array>
#include <memory>

template <class T, size_t N>
struct stack_allocator {
    using value_type = T;
    std::array<T, N> buffer;
    size_t used = 0;

    T* allocate(size_t n) {
        if (used + n > N) throw std::bad_alloc();
        T* result = &buffer[used];
        used += n;
        return result;
    }

    void deallocate(T* p, size_t n) noexcept {
        if (p == &buffer[used - n]) used -= n;
    }

    template <class U>
    struct rebind {
        using other = stack_allocator<U, N>;
    };

    stack_allocator() = default;
    template <class U>
    stack_allocator(const stack_allocator<U, N>&) {}
};

template <class T, class U, size_t N>
bool operator==(const stack_allocator<T, N>&, const stack_allocator<U, N>&) { return true; }

template <class T, class U, size_t N>
bool operator!=(const stack_allocator<T, N>&, const stack_allocator<U, N>&) { return false; }

template <size_t N>
using stack_string = std::basic_string<char, std::char_traits<char>, stack_allocator<char, N>>;

#include <print>

int main()
{
    stack_string<16> str = "Hello World";
    std::print("{}", str);
}

@stephenberry found a paper here https://github.com/cplusplus/papers/issues/1762, It looks like it will be accepted into C++26

stephenberry commented 4 weeks ago

@pwqbot, neat! std::basic_fixed_string looks like the appropriate solution for the future, because it will allow us to still use std::array<char, N> for JSON arrays of characters and std::basic_fixed_string for actual stack allocated strings.

Would you be willing to make an implementation that matches the paper for glaze? A glz::basic_fixed_string that we can use until we have std::basic_fixed_string available? If not, I can make one, it will just take me some time before I can get to it.

pwqbot commented 4 weeks ago

@stephenberry I think mp-units has already implemented it(also inplace_vector), we just need to Import the code into glaze. https://github.com/mpusz/mp-units/blob/master/src/core/include/mp-units/ext/fixed_string.h

stephenberry commented 3 weeks ago

Sweet, I'll use that implementation for reference

pwqbot commented 19 hours ago

@stephenberry Sorry, after looking into fixed_string, I found that it is actually just a compile-time string class, not the kind of inplace_vector-like string class that we need. Boost has a related implementation: https://github.com/boostorg/static_string.