Open teo-tsirpanis opened 2 years ago
The strings that get mutated in the non-LowTrust version of FParsec are newly allocated and either not externally shared or not mutated afterwards. While this isn't officially supported, this doesn't cause undefined behaviour, data corruption or crashes. If you look at the implementation of the dotnet String
and StringBuilder
classes, you'll see that they do the same. Is there any change int .NET runtime that makes you concerned about the future safety of this?
The only reason that I've been using strings for StringBuffer
is that this makes it possible to apply a Regex
without copying the string. Since .NET 7 introduces ReadOnlySpan<char>
overloads, we can finally get rid of this hack.
Your suggestion to use GC.AllocationUninitializedArray
to directly allocate pinned buffers of the chunk size and then replace the current chunking and pool logic with a simple chunk pool sounds great to me.
Is there any change int .NET runtime that makes you concerned about the future safety of this?
Not yet implemented but once https://github.com/dotnet/runtime/blob/main/docs/design/features/StringDeduplication.md lands we will be in trouble. See also dotnet/runtime#27515 and https://github.com/dotnet/runtime/issues/36989#issuecomment-636497330
If you look at the implementation of the dotnet
String
andStringBuilder
classes, you'll see that they do the same.
Because of its tight coupling with the runtime, System.Private.CoreLib
can do stuff other libraries are not allowed to do.
Thanks for the pointers!
I'm curious whether the deduplication will ever land in this form. But you're right, we should get rid of the string mutation in FParsec.
The
StringBuffer
class is built around allocating big strings, pinning them and mutating them. Mutating strings is undefined behavior and prone to cause crashes or data corruption now or in the future.What I recommend instead is to use
GC.AllocateUninitializedArray
to allocate character arrays in the Pinned Object Heap. With that, the chunking logic will go away (you don't have to allocate a big array and slice it, one allocation per slice is OK, and you don't have to bother with GC handles.I did not include it with #92; this will need to touch code in
CharStream
, something I am not comfortable with. But you can ping me once you have a PR and I can review it.