microsoft / glTF-SDK

glTF-SDK is a C++ Software Development Kit for glTF (GL Transmission Format, https://github.com/KhronosGroup/glTF).
MIT License

Buffer error when reaching 2GB (2,147,869,560) byteLength #116

Open jimver04 opened 1 year ago

jimver04 commented 1 year ago

Hi,

my data to export are very big, and GLTFSDK raises an exception.

totalByteLength does not seem to go beyond 2,147,869,560 bytes. Do you know the maximum offset that ResourceWriter can write? How can I extend it?

L86 @ ResourceWriter.cpp

[screenshot]

L34 @ StreamUtils.cpp

[screenshot]

I noticed that Buffer and BufferView always have 1 item when exporting to GLB. Wouldn't it be more reasonable to have multiple BufferViews and Buffers so as not to hit these exceptions?

Best, Dimitrios

bghgary commented 1 year ago

See here: https://github.com/KhronosGroup/glTF/issues/2114

jimver04 commented 1 year ago

I don't think it is related to #2114.

It is related to the 2GB limit of std::ostream, because we flush the data at the end of all processing:

[screenshot]

I think we should flush more frequently to the file in order to avoid the 2GB limit of std::ostream.

It also has to do with the fact that we have only 1 Buffer and 1 BufferView.

bghgary commented 1 year ago

I don't think it is related to #2114.

The code issue is not related, but the GLB file format itself cannot be bigger than 2GB because the header uses a uint32 for the length. Even if we fix the code issue, it still won't work.

EDIT: Or are you writing to a glTF and not a GLB?

jimver04 commented 11 months ago

Hi,

I am trying to export a 2.8GB CAE model in a single GLB file. So far, I have managed to export models with a maximum size of int32, namely 2GB.

The total file length limit and the chunk 1 (binary part) length limit are both uint32 fields. See below: [screenshot]

The uint32 maximum is 4GB: 0xFFFFFFFF (4294967295 in decimal). [screenshot]

So I assume that writing a 3.8 GB binary buffer with a 0.2 GB JSON is possible as far as the GLB standard is concerned.

The crash in GLTF-SDK happens during addAccessor, more specifically in StreamUtils::WriteBinary(std::ostream& stream, const T* data, size_t size). It has the following call stack:

[screenshots of the call stack]

I have made a function that calculates the number of bytes (N) needed for my model a priori, without any BufferBuilder, by iterating over all my model's parts and summing up the bytes for indices, vertices, morph targets, and vertex colors for the last simulation frame.

Somehow, I should find a way to pre-allocate N bytes in the ostream, so that StreamUtils::WriteBinary(std::ostream& stream, const T* data, size_t size) does not crash when the size of the stream exceeds 2GB.

Best, Dimitrios

jimver04 commented 11 months ago

On my PC, std::streamsize does not seem to suffer from the 2GB limit: [screenshot]

bghgary commented 11 months ago

Ahh, sorry, I'm dumb and I'm not thinking about unsigned. Well, that seems like a bug then. Do you have an easy repro I can use to test the code?

jimver04 commented 11 months ago

Unfortunately, I cannot share the CAE models, and it is difficult to reproduce without a very big model. I am copying some screenshots with information below. They show that if the 2GB limit is reached while adding accessors through the bufferBuilder, the size of the ostream (in this case an in-memory stream) becomes -1, and failbit and badbit become true. See information on failbit and badbit here: https://cplusplus.com/reference/ios/ios/rdstate/
[screenshot]

jimver04 commented 11 months ago

Hi, I have found the option below, but I have not tested it yet, as my project is compiled with make and I do not know how to pass this option.

In order to use objects larger than 2GB in C++, you should set /LARGEADDRESSAWARE in the linker properties, under Configuration Properties > Linker > System. See these resources for more details: [1] https://stackoverflow.com/questions/37413998/why-my-program-does-not-take-more-than-2-gb-ram-on-64-gb-ram-system [2] https://learn.microsoft.com/en-us/cpp/build/reference/largeaddressaware-handle-large-addresses?view=msvc-170 [3] https://stackoverflow.com/questions/3109543/what-to-do-to-make-application-large-address-aware

[screenshot]
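For a build that is not driven from Visual Studio, the same MSVC linker flag can be passed from CMake; a sketch (the target name is hypothetical):

```cmake
# Hypothetical CMake fragment: pass /LARGEADDRESSAWARE to the MSVC linker.
# Note that 64-bit MSVC builds enable this flag by default, which is
# consistent with the later observation that it changes nothing in x64 mode.
if(MSVC)
  target_link_options(MyExporter PRIVATE "/LARGEADDRESSAWARE")
endif()
```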

jimver04 commented 11 months ago

Hi, I have created the simple example below, which replicates the issue. When I run it in x86 mode the limit is 512MB, whereas in x64 mode the limit is 2GB. The LargeAddressAware option does not affect either mode; the result is the same.

[screenshot]

I am using stringstream because the GLB writer uses stringstream, as found here: [screenshot]

Here is my example

#include <iostream>
#include <sstream>
#include <vector>

int main() {

    // Create a chunk of data that is 128 MB
    const std::size_t szDataChunk = std::size_t(1) << 27;
    std::vector<char> data(szDataChunk);

    // The number of times to add this chunk to the stream:
    // 128 MB * 32 = 2^(27+5) = 2^32 = 4 GB
    const int ntimes = 32;

    std::stringstream stream;

    for (int i = 0; i < ntimes; i++)
    {
        // Write the chunk to the stream
        stream.write(data.data(), szDataChunk);

        // Print the current size (tellp() returns -1 once the stream has failed)
        std::cout << stream.tellp() << " bytes | "
                  << stream.tellp() / (1024 * 1024) << " MB \n";
    }

    return 0;
}

jimver04 commented 11 months ago

It seems to be OS/compiler related.

jimver04 commented 11 months ago

On the other hand, there is no such strict limit if we use std::string instead of std::stringstream. The example below, on Windows, demonstrates that a std::string can easily reach 16GB. Moreover, std::string seems faster, and we can also preallocate the memory needed by calculating a priori the number of bytes per index and vertex (stringstream does not allow preallocation). However, many things would have to change in the GLTF-SDK code ...

[screenshot]

bghgary commented 11 months ago

It looks like Microsoft's STL implementation limits string buffers to INT_MAX.

https://github.com/microsoft/STL/blob/cf1313c39169dc376761eddee23c5e408e01aaa9/stl/inc/sstream#L252-L261

bghgary commented 11 months ago

Looks like there were similar issues before: https://github.com/microsoft/STL/issues/578 https://github.com/microsoft/STL/issues/388

@jimver04 Do you mind filing an issue on Microsoft's STL for your test case?

bghgary commented 11 months ago

@jimver04 I don't know what your code looks like, but GLBResourceWriter has a constructor with a second argument, std::unique_ptr<std::iostream> tempBufferStream. If you pass in your own stream here (maybe backed by a local file instead), it will avoid the use of std::stringstream. Hopefully this is good enough until the STL is fixed.

jimver04 commented 11 months ago

@bghgary It seems that it is an old issue that never got priority in MSVC. In GCC, the limits seem to be based on __string_type::size_type (https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/sstream), which is not stated explicitly, so I guess it goes up to the available memory. I have started modifying GLTF-SDK to replace std::stringstream with std::string wherever necessary. Thanks for the information. My changes are pushed to this branch: https://github.com/jimver04/glTF-SDK

jimver04 commented 11 months ago

I have also mentioned the issue of 4GB limitation of GLTF to Khronos group: https://github.com/KhronosGroup/glTF/issues/1051#issuecomment-1847135671

bghgary commented 11 months ago

> @bghgary It seems that it is an old issue that never got priority in MSVC. In Gcc the limits seem to be based on __string_type::size_type https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/sstream which is not so explicitly provided. So I guess it goes up to memory available. I have started modifying GLTF-SDK to replace std:stringstream with std::string wherever necessary. Thanks for the information. My changes are send to branch: https://github.com/jimver04/glTF-SDK

I don't see this as an old issue. As I pointed to earlier, the code is checking against INT_MAX and fails to allocate bigger. The two old issues I pointed to are fixed as far as I can tell. I don't see a reason why std::stringstream should be limited to INT_MAX.

jimver04 commented 11 months ago

I am trying to use std::string instead of std::stringstream. Why do you have caches (StreamCache, IStreamCache, StreamCacheLRU)? What do you cache, the buffer stream or the file output path, and why?