stephan-tolksdorf / fparsec

A parser combinator library for F#
527 stars 46 forks source link

Stop mutating strings in `StringBuffer`. #93

Open teo-tsirpanis opened 2 years ago

teo-tsirpanis commented 2 years ago

The StringBuffer class is built around allocating big strings, pinning them and mutating them. Mutating strings is undefined behavior and prone to cause crashes or data corruption now or in the future.

What I recommend instead is to use GC.AllocateUninitializedArray to allocate character arrays in the Pinned Object Heap. With that, the chunking logic will go away (you don't have to allocate a big array and slice it, one allocation per slice is OK, and you don't have to bother with GC handles.

I did not include it with #92; this will need to touch code in CharStream, something I am not comfortable with. But you can ping me once you have a PR and I can review it.

stephan-tolksdorf commented 2 years ago

The strings that get mutated in the non-LowTrust version of FParsec are newly allocated and either not externally shared or not mutated afterwards. While this isn't officially supported, this doesn't cause undefined behaviour, data corruption or crashes. If you look at the implementation of the dotnet String and StringBuilder classes, you'll see that they do the same. Is there any change int .NET runtime that makes you concerned about the future safety of this?

The only reason that I've been using strings for StringBuffer is that this makes it possible to apply a Regex without copying the string. Since .NET 7 introduces ReadOnlySpan<char> overloads, we can finally get rid of this hack.

Your suggestion to use GC.AllocationUninitializedArrayto directly allocate pinned buffers of the chunk size and then replace the current chunking and pool logic with a simple chunk pool sounds great to me.

teo-tsirpanis commented 2 years ago

Is there any change int .NET runtime that makes you concerned about the future safety of this?

Not yet implemented but once https://github.com/dotnet/runtime/blob/main/docs/design/features/StringDeduplication.md lands we will be in trouble. See also dotnet/runtime#27515 and https://github.com/dotnet/runtime/issues/36989#issuecomment-636497330

If you look at the implementation of the dotnet String and StringBuilder classes, you'll see that they do the same.

Because of its tight coupling with the runtime, System.Private.CoreLib can do stuff other libraries are not allowed to do.

stephan-tolksdorf commented 2 years ago

Thanks for the pointers!

I'm curious whether the deduplication will ever land in this form. But you're right, we should get rid of the string mutation in FParsec.