vpenades / SharpGLTF

glTF reader and writer for .NET Standard

Use FileStream to write json gltf #232

Open · xPaw opened 2 months ago

xPaw commented 2 months ago

Currently, trying to save a file over 2 GiB results in this exception:

System.OutOfMemoryException: Cannot allocate a buffer of size 2147487611.
   at System.Text.Json.ThrowHelper.ThrowOutOfMemoryException(UInt32 capacity)
   at System.Text.Json.Utf8JsonWriter.Grow(Int32 requiredSize)
   at SharpGLTF.IO._JSonSerializationExtensions.TryWriteValue(Utf8JsonWriter writer, Object value)

I believe this is due to the use of MemoryStream, which is itself limited to 2 GiB because it is backed by a single byte array. Specifically here:

https://github.com/vpenades/SharpGLTF/blob/dc199769f174edd4f956a85d82f4a76d5f9842e0/src/SharpGLTF.Core/Schema2/Serialization.WriteContext.cs#L226-L233

I believe switching this to a direct FileStream would solve this and avoid keeping the entire JSON document in memory.

EDIT: This can also be done in WriteBinarySchema2.

This may also require calling Utf8JsonWriter.Flush() in strategic places.
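For illustration, here's a rough standalone sketch (not SharpGLTF code; the file name is a placeholder) of the pattern I mean. Utf8JsonWriter can target a FileStream directly, and periodic Flush() calls keep its internal buffer bounded:

using System.IO;
using System.Text.Json;

// Write JSON straight to disk; no intermediate MemoryStream is involved.
using var fs = File.Create("model.gltf");
using var writer = new Utf8JsonWriter(fs);

writer.WriteStartObject();
writer.WriteStartObject("asset");
writer.WriteString("version", "2.0");
writer.WriteEndObject();

// Utf8JsonWriter buffers in memory until flushed; without these calls its
// internal buffer keeps growing, which is where the OOM above came from.
writer.Flush();

writer.WriteEndObject();
writer.Flush();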

vpenades commented 2 months ago

Unfortunately it's not that simple. WriteContext is an abstraction layer that can write to a file, yes, but it can also be used to write to a memory buffer, or to a dictionary key, which is used internally.

Maybe it would be possible to check when the write target is a file and use a specific code path, but it's a non-trivial solution.

In any case, I don't think writing a 2 GB JSON file is a good idea, and I would consider splitting the model into smaller parts.

xPaw commented 2 months ago

I saw the callback stuff, but when called from SaveToFile it could take a faster path and write directly to a (file) stream. Even aside from splitting big glTFs, it would still be a good memory optimization.

xPaw commented 2 months ago

Here's one way it could be done. My primary point is that SaveGLTF is given a full file path to save to, so there's no reason it can't be optimized to stream directly into that file.

For this example I added a stream bool, since as far as I can tell WriteContext doesn't know on its own whether it can stream into a file. I also think Utf8JsonWriter will keep buffering data, so it will require putting Flush() calls in some places.

Basically, streaming into a file directly would avoid this specific 2 GiB limit when writing glTF files, and it would reduce memory usage since the entire buffer no longer needs to be kept in memory.

diff --git a/src/SharpGLTF.Core/Schema2/Serialization.WriteContext.cs b/src/SharpGLTF.Core/Schema2/Serialization.WriteContext.cs
index e01a02d..ead4d52 100644
--- a/src/SharpGLTF.Core/Schema2/Serialization.WriteContext.cs
+++ b/src/SharpGLTF.Core/Schema2/Serialization.WriteContext.cs
@@ -201,7 +201,7 @@ namespace SharpGLTF.Schema2
        /// "<paramref name="name"/>.{Number}.bin|png|jpg|dds"<br/>
        /// where <paramref name="name"/> is used without extension.
        /// </remarks>
-       public void WriteTextSchema2(string name, MODEL model)
+       public void WriteTextSchema2(string name, MODEL model, bool stream = false)
         {
             Guard.NotNullOrEmpty(name, nameof(name));
             Guard.FilePathMustBeValid(name, nameof(name));
@@ -223,12 +223,19 @@ namespace SharpGLTF.Schema2

             _ValidateBeforeWriting(model);

-            using (var m = new MemoryStream())
+            string finalName = Path.HasExtension(name) ? name : $"{name}.gltf";
+
+            if (stream)
             {
+                var path = Path.Combine(CurrentDirectory.FullName, finalName);
+                using var fs = File.Create(path);
+                model._WriteJSON(fs, this.JsonOptions, this.JsonPostprocessor);
+            }
+            else
+            {
+                using var m = new MemoryStream();
                 model._WriteJSON(m, this.JsonOptions, this.JsonPostprocessor);

-                string finalName = Path.HasExtension(name) ? name : $"{name}.gltf";
-
                 WriteAllBytesToEnd(finalName, m.ToArraySegment());
             }

diff --git a/src/SharpGLTF.Core/Schema2/Serialization.WriteSettings.cs b/src/SharpGLTF.Core/Schema2/Serialization.WriteSettings.cs
index 31a1c1e..79b8cec 100644
--- a/src/SharpGLTF.Core/Schema2/Serialization.WriteSettings.cs
+++ b/src/SharpGLTF.Core/Schema2/Serialization.WriteSettings.cs
@@ -220,7 +220,7 @@ namespace SharpGLTF.Schema2

             context.WithTextSettings();

-            context.WriteTextSchema2(name, this);
+            context.WriteTextSchema2(name, this, true);
         }

         [Obsolete("Use GetJsonPreview", true)]
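If something like this landed, the public API wouldn't change. Hypothetical usage, assuming the patch above is applied:

// SaveGLTF now passes stream: true internally, so the JSON goes
// straight to disk instead of through a MemoryStream.
var model = SharpGLTF.Schema2.ModelRoot.CreateModel();
model.SaveGLTF("huge_scene.gltf"); // placeholder path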
vpenades commented 2 months ago

Depending on what kind of operations you do, you might hit some issues with that solution.

For example, if you want to write the model to a .GLB file, I need to modify the internal structures of the DOM, which is a destructive change. So, to avoid breaking the current DOM, I create an entire copy of the model in memory, and that copy uses the WriteContext to write to memory.

So if you write the model to plain json+bin files, it's probably fine.

There's a recurrent problem with glTFs being pushed to the limits... the general agreement on the Khronos board is to avoid creating models of more than 2 GB, because even if the writer is able to create the model, there's no guarantee that any other reader in the wild will be able to read it. In other words, a +2 GB model is probably less compatible than a smaller one.

Anyway, I'll look into it when I have some spare time.

vpenades commented 2 months ago

I've been looking into this, and I think it's possible to implement something similar to your proposal.

Looking further, I noticed the reader also has a limit. But unlike the writer, Utf8JsonReader only accepts reading from an in-memory buffer.

So any glTF written by this proposal that produces a +2 GB JSON will not be readable back by the library.

Apparently Utf8JsonReader also accepts a memory sequence (ReadOnlySequence<byte>), which chains multiple buffers together rather than using one contiguous block, but I have no clue how it works or how to implement such a solution.
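For reference, a sketch of how that overload can be consumed. The segment plumbing below is hand-rolled purely for illustration; in practice something like System.IO.Pipelines would produce these sequences:

using System;
using System.Buffers;
using System.Text;
using System.Text.Json;

// ReadOnlySequence<byte> is a linked chain of buffers, so a huge JSON
// document could be read through many small segments instead of one
// contiguous 2 GiB array.
sealed class Segment : ReadOnlySequenceSegment<byte>
{
    public Segment(ReadOnlyMemory<byte> memory) => Memory = memory;

    public Segment Append(ReadOnlyMemory<byte> memory)
    {
        var next = new Segment(memory) { RunningIndex = RunningIndex + Memory.Length };
        Next = next;
        return next;
    }
}

static class Demo
{
    static void Main()
    {
        // One JSON document split across two non-contiguous buffers.
        var first = new Segment(Encoding.UTF8.GetBytes("{\"asset\":{\"ver"));
        var last = first.Append(Encoding.UTF8.GetBytes("sion\":\"2.0\"}}"));

        var sequence = new ReadOnlySequence<byte>(first, 0, last, last.Memory.Length);

        // Utf8JsonReader walks the segment chain transparently.
        var reader = new Utf8JsonReader(sequence);
        while (reader.Read())
            Console.WriteLine(reader.TokenType);
    }
}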