protocolbuffers / protobuf

Protocol Buffers - Google's data interchange format
http://protobuf.dev
Other
65.11k stars 15.43k forks source link

[C#] Customize recursion limit when using buffer serialization #15773

Closed JamesNK closed 3 weeks ago

JamesNK commented 7 months ago

What language does this apply to? C#

Describe the problem you are trying to solve.

Many gRPC users are running into the default recursion limit of 100 - https://github.com/grpc/grpc-dotnet/issues/1983

I'm investigating what options we can give them to customize this limit. Using the old serialization APIs you can do this:

var s = CodedInputStream.CreateWithLimits(new MemoryStream(data), sizeLimit: int.MaxValue, recursionLimit: int.MaxValue));
var m = RecursionRequest.Parser.ParseFrom(s);

However, I noticed that it's not possible to give limits to the new buffer serialization introduced a few years ago. No API is available that allows people to change the limit while still getting its performance benefits.

Describe the solution you'd like

The buffer serialization code respects the limits, the problem is there is just no API that lets you specify them together with ReadOnlySpan/ReadOnlySequence.

I'd like new APIs on MessageParser and MessageExtensions that allow for limits with ReadOnlySpan/ReadOnlySequence:

public class MessageParser
{
    // Existing
    public IMessage ParseFrom(ReadOnlySequence<byte> data);
    public IMessage ParseFrom(ReadOnlySpan<byte> data);

    // New
    public IMessage ParseFrom(ReadOnlySequence<byte> data, SerializationLimits limits);
    public IMessage ParseFrom(ReadOnlySpan<byte> data, SerializationLimits limits);
}
public class MessageExtensions
{
    // Existing
    public static void MergeFrom(this IMessage message, ReadOnlySequence<byte> sequence);
    public static void MergeFrom(this IMessage message, ReadOnlySpan<byte> span);

    // New
    public static void MergeFrom(this IMessage message, ReadOnlySequence<byte> sequence, SerializationLimits limits);
    public static void MergeFrom(this IMessage message, ReadOnlySpan<byte> span, SerializationLimits limits);
}

// New
public sealed class SerializationLimits
{
    public int SizeLimit { get; set; } = int.MaxValue; // default
    public int RecursionLimit { get; set; } = 100; // default
}

Passing in a class is more flexible as it gives the option for new limits to be added in the future without an overload explosion. However, I'm not against overloads with the two limits is that is preferred, e.g.

public class MessageParser
{
    public IMessage ParseFrom(ReadOnlySequence<byte> data, int sizeLimit, int recursionLimit);
}

I don't think this is difficult, and I'm happy to add a PR. But before that I want to get agreement on API changes.

JamesNK commented 7 months ago

cc @jskeet

jskeet commented 7 months ago

That approach sounds reasonable to me - I'll need to run things by the protobuf team, of course.

I do wonder whether SerializationOptions might be more flexible than SerializationLimits - I can imagine us wanting to add some options which aren't actually limits, maybe. Would you be okay with that?

JamesNK commented 7 months ago

I do wonder whether SerializationOptions might be more flexible than SerializationLimits - I can imagine us wanting to add some options which aren't actually limits, maybe. Would you be okay with that?

I considered a general name. However there are other options/settings such as DiscardUnknownFields and TypeRegistry. A SerializationOptions would probably also have those properties.

However, those properties are set on a MessageParser rather than on CodedInputStream:

https://github.com/protocolbuffers/protobuf/blob/959e5df20f1c6153b124a12bee2a5a349ce6d1c4/csharp/src/Google.Protobuf/MessageParser.cs#L372-L386

What would the behavior be if a MessageParser has a type registry and the serialization options had a type registry?

To be honest I'm not quite sure why some serialization settings are set one way, and others are set another. I guess the size and recursion limits are related to reading any protobuf, while DiscardUnknownFields and TypeRegistry are about mapping protobuf data to message types.

jskeet commented 7 months ago

To be honest I'm not quite sure why some serialization settings are set one way

It's probably a mixture of influences from original Java code and other decisions over time. A lot of this code goes back to 2008 :)

I personally wouldn't mind having serialization options which could be used to override parser options, but this definitely needs a bit more thought.

JamesNK commented 6 months ago

What should the next steps be here? I can write a PR if it would help create clarity.

jskeet commented 6 months ago

I think a PR would be helpful. Thinking about it a bit more over the weekend, I wonder whether:

Those are only thoughts - I'm definitely not trying to set them in stone.

There may be reasons why those options don't work, of course - stuff we'll only find out when we try to implement it. If you have the time to create a sample PR, that would be really useful.

mulingya commented 4 months ago

Is there any progress on this matter so far? I think MessageParser.Settings proposed by jskeet is good.

JamesNK commented 4 months ago

No. You're welcome to implement it.

mulingya commented 4 months ago

No. You're welcome to implement it.

I have read a part of the source code of protobuf, and the design of protobuf requires compatibility with previous versions. I am not 100% confident in modifying the source code, and maintaining compatibility with older versions makes me feel nervous.

mulingya commented 4 months ago

I have attempted to make changes that only include parameters that I can understand.


     // I think ParseSettings should be a separate file. The main reasons are as follows:
     //   1.MessageParser.Setting naming is good, but it sounds like an internal class or public
     //     property of MessageParser. Although it can maintain consistency with JsonParser, from 
     //     the perspective of the user's parameter passing path, these parameters need to go 
     //     through MessageParser, MessageExtensions, CodedInputStream, ParseContext in sequence, 
     //     and finally become the property value of ParserInternalState. If Setting becomes an 
     //     internal class of a certain class, it can be confusing. I think it would be better 
     //     for Setting(ParseSettings) to be a field or property of the MessageParser.
     //   2.When users use ParseSettings to set parameters, they may need to open the source 
     //     file to see which parameters can be set, and separate file will not interfere with
     //     the user's attention.
     //   3.In future version iterations, maintainers can easily extend the parameters open to users.
     //   4.If we determine to use a separate file, naming ParseSettings may be better than Setting.
     //     ParseSettings will not conflict with other file names that may be set separately in the
     //     future. ParseSettings.cs will be arranged together with ParseContext.cs and ParserInternalState.cs, 
     //     and maintainers can quickly infer the purpose of the ParseSettings.cs file by only seeing its name.
    public sealed class ParseSettings
    {
        public static ParseSettings Default { get; }

        static ParseSettings()
        {
            Default = new ParseSettings();
        }

        public bool DiscardUnknownFields { get; }

        public int SizeLimit { get; } = Int32.MaxValue;

        public int RecursionLimit { get; } = 100;

        private ParseSettings()
        {
        }

        public ParseSettings(bool discardUnknownFields, int sizeLimit, int recursionLimit)
        {
            DiscardUnknownFields = discardUnknownFields;
            SizeLimit = sizeLimit;
            RecursionLimit = recursionLimit;
        }

        public ParseSettings WithDiscardUnknownFields(bool discardUnknownFields) =>
            new ParseSettings(discardUnknownFields, SizeLimit, RecursionLimit);

        public ParseSettings WithSizeLimit(int sizeLimit) =>
            new ParseSettings(DiscardUnknownFields, sizeLimit, RecursionLimit);

        public ParseSettings WithRecursionLimit(int recursionLimit) =>
            new ParseSettings(DiscardUnknownFields, SizeLimit, recursionLimit);
    }

    public class MessageParser
    {
        private readonly Func<IMessage> factory;

        private protected ParseSettings parseSettings = ParseSettings.Default;

        internal MessageParser(Func<IMessage> factory, ParseSettings parseSettings)
        {
            this.factory = factory;
            this.parseSettings = parseSettings;
        }

        public MessageParser WithSettings(ParseSettings parseSettings) => new MessageParser(factory, parseSettings);
    }

    public sealed class MessageParser<T> : MessageParser where T : IMessage<T>
    {
        private readonly Func<T> factory;

        public MessageParser(Func<T> factory) : this(factory, ParseSettings.Default)
        {
        }

        internal MessageParser(Func<T> factory, ParseSettings parseSettings) : base(() => factory(), parseSettings)
        {
            this.factory = factory;
        }

        public new MessageParser<T> WithSettings(ParseSettings parseSettings) =>
            new MessageParser<T>(factory, parseSettings);
    }

    public static class MessageExtensions
    {
        public static void MergeFrom(this IMessage message, byte[] data) =>
            MergeFrom(message, data, ParseSettings.Default);

        [SecuritySafeCritical]
        public static void MergeFrom(this IMessage message, ReadOnlySequence<byte> sequence) =>
            MergeFrom(message, sequence, ParseSettings.Default);

        internal static void MergeFrom(this IMessage message, byte[] data, ParseSettings parseSettings)
        {
            ProtoPreconditions.CheckNotNull(message, nameof(message));
            ProtoPreconditions.CheckNotNull(data, nameof(data));
            CodedInputStream input = new CodedInputStream(data).WithSettings(parseSettings);
            message.MergeFrom(input);
            input.CheckReadEndOfStreamTag();
        }

        [SecuritySafeCritical]
        internal static void MergeFrom(this IMessage message, ReadOnlySequence<byte> data, ParseSettings parseSettings)
        {
            ParseContext.Initialize(data, out ParseContext ctx);
            ctx.DiscardUnknownFields = parseSettings.DiscardUnknownFields;
            ctx.SizeLimit = parseSettings.SizeLimit;
            ctx.RecursionLimit = parseSettings.RecursionLimit;
            ParsingPrimitivesMessages.ReadRawMessage(ref ctx, message);
            ParsingPrimitivesMessages.CheckReadEndOfStreamTag(ref ctx.state);
        }
    }

    [SecuritySafeCritical]
    public sealed class CodedInputStream : IDisposable
    {
        private readonly bool leaveOpen;

        private readonly byte[] buffer;

        private readonly Stream input;

        private ParserInternalState state;

        internal const int BufferSize = 4096;

        #region Construction

        public CodedInputStream(byte[] buffer) : this(null, ProtoPreconditions.CheckNotNull(buffer, "buffer"), 0, buffer.Length, true)
        {            
        }

        public CodedInputStream(byte[] buffer, int offset, int length)
            : this(null, ProtoPreconditions.CheckNotNull(buffer, "buffer"), offset, offset + length, true)
        {            
            if (offset < 0 || offset > buffer.Length)
            {
                throw new ArgumentOutOfRangeException("offset", "Offset must be within the buffer");
            }
            if (length < 0 || offset + length > buffer.Length)
            {
                throw new ArgumentOutOfRangeException("length", "Length must be non-negative and within the buffer");
            }
        }

        public CodedInputStream(Stream input) : this(input, false)
        {
        }

        public CodedInputStream(Stream input, bool leaveOpen)
            : this(ProtoPreconditions.CheckNotNull(input, "input"), new byte[BufferSize], 0, 0, leaveOpen)
        {
        }

        internal CodedInputStream(Stream input, byte[] buffer, int bufferPos, int bufferSize, bool leaveOpen)
        {
            this.input = input;
            this.buffer = buffer;
            this.state.bufferPos = bufferPos;
            this.state.bufferSize = bufferSize;
            SegmentedBufferHelper.Initialize(this, out this.state.segmentedBufferHelper);
            this.leaveOpen = leaveOpen;

            this.state.currentLimit = int.MaxValue;
        }

        #endregion

        // Here we separate the default parameters from the parameters open to users.
        //
        // The default parameters are set by the constructor, and the parameters open to users 
        // use this method, using new CodedInputStream(?).WithSettings(parseSettings) completes all parameter settings.
        // 
        // Setting parameters separately can ensure that the existing code structure does not undergo significant
        // changes and can enhance scalability.
        public CodedInputStream WithSettings(ParseSettings parseSettings)
        {
            state.DiscardUnknownFields = parseSettings.DiscardUnknownFields;

            if (parseSettings.SizeLimit <= 0)
            {
                throw new ArgumentOutOfRangeException("sizeLimit", "Size limit must be positive");
            }

            if (parseSettings.RecursionLimit <= 0)
            {
                throw new ArgumentOutOfRangeException("recursionLimit!", "Recursion limit must be positive"
            }

            this.state.sizeLimit = parseSettings.SizeLimit;
            this.state.recursionLimit = parseSettings.RecursionLimit;
            return this;
        }
    }

    [SecuritySafeCritical]
    public ref struct ParseContext
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        internal static void Initialize(ReadOnlySequence<byte> input, out ParseContext ctx)
        {
            ctx.buffer = default;
            ctx.state = default;
            ctx.state.lastTag = 0;
            ctx.state.recursionLimit = recursionLimit;
            ctx.state.currentLimit = int.MaxValue;
            SegmentedBufferHelper.Initialize(input, out ctx.state.segmentedBufferHelper, out ctx.buffer);
            ctx.state.bufferPos = 0;
            ctx.state.bufferSize = ctx.buffer.Length;
         }

        internal int SizeLimit
        {
            get => state.sizeLimit;
            set => state.sizeLimit = value;
        }

        internal int RecursionLimit
        {
            get => state.recursionLimit;
            set => state.recursionLimit = value;
        }

        internal bool DiscardUnknownFields
        {
            get => state.DiscardUnknownFields;
            set => state.DiscardUnknownFields = value;
        }
    }
jskeet commented 4 months ago

@mulingya: To propose a change, please create a PR rather than posting the code in an issue.

mulingya commented 4 months ago

@mulingya: To propose a change, please create a PR rather than posting the code in an issue.

I have provided an example and hope to participate in the discussion. My proposal is somewhat destructive as it is not compatible with older versions.

jskeet commented 4 months ago

My proposal is somewhat destructive as it is not compatible with older versions.

In that case, I'm afraid it's unlikely that we'd take it before a new major version, which we don't have any plans for at the moment. Backward compatibility is very important in this project.

github-actions[bot] commented 1 month ago

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago. This issue will be closed and archived after 14 additional days without activity.

github-actions[bot] commented 3 weeks ago

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please reopen it.

This issue was closed and archived because there has been no new activity in the 14 days since the inactive label was added.

EronWright commented 2 weeks ago

After all this, maybe increasing the default recursion limit would be the right thing to do. https://github.com/protocolbuffers/protobuf/pull/17881