ironfede / openmcdf

Microsoft Compound File .net component - pure C# - netstandard 2.0
Mozilla Public License 2.0

Support reading OLE stream as Stream #62

Open phuclv90 opened 4 years ago

phuclv90 commented 4 years ago

I need to save a stream (or parts of it) to a file, but the only CFStream methods that return data are Read(byte[] buffer, long position, int count) and byte[] GetData(), so I have to get a byte[] buffer every time and write it to the file. Because the buffer is larger than 85000 bytes, it's put on the large object heap, and the GC won't collect it right away even when nothing else references it. As a result, my app becomes a memory hog when saving big streams, and I have to call GC.Collect() manually.

I've written a custom class that wraps CFStream, extends System.IO.Stream, and calls CFStream.Read() inside its Read override, but the result is almost 10 times slower. I debugged it and found that the stream receives a lot of small reads, and that a new StreamView is created even when reading just a single byte. After reading, GC.Collect() is called, so there are a lot of GC wake-ups every second.

I ended up working around the issue by adding another layer, a System.IO.BufferedStream, around my wrapper. But it looks like the issue could be solved much more easily and efficiently by exposing StreamView, which is currently an internal class. We just need to make it public, or perhaps make a few other small changes, for this to work.
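For illustration, the BufferedStream workaround described above might look roughly like the sketch below. It is not OpenMcdf code: `PositionalReadStream` and `StreamCopyExample` are hypothetical names, and the positional read is passed in as a delegate with the same shape as `Read(byte[] buffer, long position, int count)` so that the sketch compiles without the library.

```csharp
using System;
using System.IO;

// Hypothetical adapter (not part of OpenMcdf): exposes a positional read
// function of the shape (buffer, position, count) -> bytesRead as a Stream.
public sealed class PositionalReadStream : Stream
{
    private readonly Func<byte[], long, int, int> _readAt;
    private readonly long _length;
    private long _position;

    public PositionalReadStream(Func<byte[], long, int, int> readAt, long length)
    {
        _readAt = readAt ?? throw new ArgumentNullException(nameof(readAt));
        _length = length;
    }

    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => false;
    public override long Length => _length;

    public override long Position
    {
        get => _position;
        set
        {
            if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
            _position = value;
        }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));
        if (offset < 0 || count < 0 || count > buffer.Length - offset)
            throw new ArgumentException("Invalid offset/count.");

        long remaining = _length - _position;
        if (remaining <= 0 || count == 0) return 0;
        int toRead = (int)Math.Min(count, remaining);

        // The positional read always fills from index 0, so a temporary
        // buffer is needed when the caller asks for a non-zero offset.
        byte[] target = offset == 0 ? buffer : new byte[toRead];
        int read = _readAt(target, _position, toRead);
        if (offset != 0) Buffer.BlockCopy(target, 0, buffer, offset, read);

        _position += read;
        return read;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        long target = origin == SeekOrigin.Begin ? offset
                    : origin == SeekOrigin.Current ? _position + offset
                    : _length + offset;
        Position = target;
        return _position;
    }

    public override void Flush() { }
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}

public static class StreamCopyExample
{
    // Wraps the adapter in a BufferedStream so that many small reads are
    // served from one 64 KiB buffer (below the 85000-byte LOH threshold).
    public static Stream OpenBuffered(Func<byte[], long, int, int> readAt, long length)
    {
        return new BufferedStream(new PositionalReadStream(readAt, length), 64 * 1024);
    }

    // Copies the whole source to the destination in chunks.
    public static void CopyTo(Func<byte[], long, int, int> readAt, long length, Stream destination)
    {
        using (Stream source = OpenBuffered(readAt, length))
        {
            source.CopyTo(destination);
        }
    }
}
```

With a CFStream, the delegate would presumably be `(buf, pos, n) => cfStream.Read(buf, pos, n)`, with the length taken from the stream's size; both are assumptions about how the wrapper would be wired up. The buffering only reduces the number of calls into CFStream.Read, it does not change what each call costs.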

poizan42 commented 4 years ago

Is OpenMcdf.Extensions.CFStreamExtension.AsIOStream not good enough?

phuclv90 commented 4 years ago

@poizan42 I didn't know about that. There's only a single OpenMcdf.dll in my current project, and I looked through it to no avail. Searching the repo doesn't help either, because the results contain so many hits for CFStream.

poizan42 commented 4 years ago

It's in OpenMcdf.Extensions: https://www.nuget.org/packages/OpenMcdf.Extensions

phuclv90 commented 4 years ago

I've checked the source code, and it uses cfStream.Read, which still results in terrible performance for small or random reads and writes.

IS4Code commented 3 years ago

StreamView itself doesn't have to be public, but CFStream should have an Open/GetStream method that constructs it. The only thing missing is to disable writing when the file is opened with CFSUpdateMode.ReadOnly.
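One way the "disable writing" part could be handled, sketched generically rather than against OpenMcdf's internals: a decorator that forwards reads and seeks to an inner stream but reports `CanWrite` as false and rejects mutations. `ReadOnlyStreamWrapper` is a hypothetical name, and how a GetStream method would decide to apply it is an assumption.

```csharp
using System;
using System.IO;

// Hypothetical decorator (not part of OpenMcdf): forwards reads and seeks
// to the inner stream and rejects every operation that would modify it.
public sealed class ReadOnlyStreamWrapper : Stream
{
    private readonly Stream _inner;

    public ReadOnlyStreamWrapper(Stream inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => _inner.CanSeek;
    public override bool CanWrite => false; // callers can check this up front
    public override long Length => _inner.Length;

    public override long Position
    {
        get => _inner.Position;
        set => _inner.Position = value;
    }

    public override int Read(byte[] buffer, int offset, int count)
        => _inner.Read(buffer, offset, count);

    public override long Seek(long offset, SeekOrigin origin)
        => _inner.Seek(offset, origin);

    // Nothing is ever written through this wrapper, so there is nothing to flush.
    public override void Flush() { }

    public override void SetLength(long value)
        => throw new NotSupportedException("Stream is read-only.");

    public override void Write(byte[] buffer, int offset, int count)
        => throw new NotSupportedException("Stream is read-only.");

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}
```

A GetStream method could then return the underlying stream view directly in update mode and this wrapper when the file was opened read-only.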