Closed JamesNK closed 3 weeks ago
cc @jskeet
That approach sounds reasonable to me - I'll need to run things by the protobuf team, of course.
I do wonder whether SerializationOptions
might be more flexible than SerializationLimits
- I can imagine us wanting to add some options which aren't actually limits, maybe. Would you be okay with that?
I do wonder whether
SerializationOptions
might be more flexible thanSerializationLimits
- I can imagine us wanting to add some options which aren't actually limits, maybe. Would you be okay with that?
I considered a general name. However there are other options/settings such as DiscardUnknownFields and TypeRegistry. A SerializationOptions
would probably also have those properties.
However, those properties are set on a MessageParser rather than on CodedInputStream:
What would the behavior be if a MessageParser has a type registry and the serialization options had a type registry?
To be honest I'm not quite sure why some serialization settings are set one way, and others are set another. I guess the size and recursion limits are related to reading any protobuf, while DiscardUnknownFields and TypeRegistry are about mapping protobuf data to message types.
To be honest I'm not quite sure why some serialization settings are set one way
It's probably a mixture of influences from original Java code and other decisions over time. A lot of this code goes back to 2008 :)
I personally wouldn't mind having serialization options which could be used to override parser options, but this definitely needs a bit more thought.
What should the next steps be here? I can write a PR if it would help create clarity.
I think a PR would be helpful. Thinking about it a bit more over the weekend, I wonder whether:
Those are only thoughts - I'm definitely not trying to set them in stone.
There may be reasons why those options don't work, of course - stuff we'll only find out when we try to implement it. If you have the time to create a sample PR, that would be really useful.
Is there any progress on this matter so far? I think MessageParser.Settings proposed by jskeet is good.
No. You're welcome to implement it.
No. You're welcome to implement it.
I have read a part of the source code of protobuf, and the design of protobuf requires compatibility with previous versions. I am not 100% confident in modifying the source code, and maintaining compatibility with older versions makes me feel nervous.
I have attempted to make changes that only include parameters that I can understand.
// I think ParseSettings should be a separate file. The main reasons are as follows:
// 1.MessageParser.Setting naming is good, but it sounds like an internal class or public
// property of MessageParser. Although it can maintain consistency with JsonParser, from
// the perspective of the user's parameter passing path, these parameters need to go
// through MessageParser, MessageExtensions, CodedInputStream, ParseContext in sequence,
// and finally become the property value of ParserInternalState. If Setting becomes an
// internal class of a certain class, it can be confusing. I think it would be better
// for Setting(ParseSettings) to be a field or property of the MessageParser.
// 2.When users use ParseSettings to set parameters, they may need to open the source
// file to see which parameters can be set, and separate file will not interfere with
// the user's attention.
// 3.In future version iterations, maintainers can easily extend the parameters open to users.
// 4.If we determine to use a separate file, naming ParseSettings may be better than Setting.
// ParseSettings will not conflict with other file names that may be set separately in the
// future. ParseSettings.cs will be arranged together with ParseContext.cs and ParserInternalState.cs,
// and maintainers can quickly infer the purpose of the ParseSettings.cs file by only seeing its name.
public sealed class ParseSettings
{
public static ParseSettings Default { get; }
static ParseSettings()
{
Default = new ParseSettings();
}
public bool DiscardUnknownFields { get; }
public int SizeLimit { get; } = Int32.MaxValue;
public int RecursionLimit { get; } = 100;
private ParseSettings()
{
}
public ParseSettings(bool discardUnknownFields, int sizeLimit, int recursionLimit)
{
DiscardUnknownFields = discardUnknownFields;
SizeLimit = sizeLimit;
RecursionLimit = recursionLimit;
}
public ParseSettings WithDiscardUnknownFields(bool discardUnknownFields) =>
new ParseSettings(discardUnknownFields, SizeLimit, RecursionLimit);
public ParseSettings WithSizeLimit(int sizeLimit) =>
new ParseSettings(DiscardUnknownFields, sizeLimit, RecursionLimit);
public ParseSettings WithRecursionLimit(int recursionLimit) =>
new ParseSettings(DiscardUnknownFields, SizeLimit, recursionLimit);
}
public class MessageParser
{
private readonly Func<IMessage> factory;
private protected ParseSettings parseSettings = ParseSettings.Default;
internal MessageParser(Func<IMessage> factory, ParseSettings parseSettings)
{
this.factory = factory;
this.parseSettings = parseSettings;
}
public MessageParser WithSettings(ParseSettings parseSettings) => new MessageParser(factory, parseSettings);
}
public sealed class MessageParser<T> : MessageParser where T : IMessage<T>
{
private readonly Func<T> factory;
public MessageParser(Func<T> factory) : this(factory, ParseSettings.Default)
{
}
internal MessageParser(Func<T> factory, ParseSettings parseSettings) : base(() => factory(), parseSettings)
{
this.factory = factory;
}
public new MessageParser<T> WithSettings(ParseSettings parseSettings) =>
new MessageParser<T>(factory, parseSettings);
}
public static class MessageExtensions
{
public static void MergeFrom(this IMessage message, byte[] data) =>
MergeFrom(message, data, ParseSettings.Default);
[SecuritySafeCritical]
public static void MergeFrom(this IMessage message, ReadOnlySequence<byte> sequence) =>
MergeFrom(message, sequence, ParseSettings.Default);
internal static void MergeFrom(this IMessage message, byte[] data, ParseSettings parseSettings)
{
ProtoPreconditions.CheckNotNull(message, nameof(message));
ProtoPreconditions.CheckNotNull(data, nameof(data));
CodedInputStream input = new CodedInputStream(data).WithSettings(parseSettings);
message.MergeFrom(input);
input.CheckReadEndOfStreamTag();
}
[SecuritySafeCritical]
internal static void MergeFrom(this IMessage message, ReadOnlySequence<byte> data, ParseSettings parseSettings)
{
ParseContext.Initialize(data, out ParseContext ctx);
ctx.DiscardUnknownFields = parseSettings.DiscardUnknownFields;
ctx.SizeLimit = parseSettings.SizeLimit;
ctx.RecursionLimit = parseSettings.RecursionLimit;
ParsingPrimitivesMessages.ReadRawMessage(ref ctx, message);
ParsingPrimitivesMessages.CheckReadEndOfStreamTag(ref ctx.state);
}
}
[SecuritySafeCritical]
public sealed class CodedInputStream : IDisposable
{
private readonly bool leaveOpen;
private readonly byte[] buffer;
private readonly Stream input;
private ParserInternalState state;
internal const int BufferSize = 4096;
#region Construction
public CodedInputStream(byte[] buffer) : this(null, ProtoPreconditions.CheckNotNull(buffer, "buffer"), 0, buffer.Length, true)
{
}
public CodedInputStream(byte[] buffer, int offset, int length)
: this(null, ProtoPreconditions.CheckNotNull(buffer, "buffer"), offset, offset + length, true)
{
if (offset < 0 || offset > buffer.Length)
{
throw new ArgumentOutOfRangeException("offset", "Offset must be within the buffer");
}
if (length < 0 || offset + length > buffer.Length)
{
throw new ArgumentOutOfRangeException("length", "Length must be non-negative and within the buffer");
}
}
public CodedInputStream(Stream input) : this(input, false)
{
}
public CodedInputStream(Stream input, bool leaveOpen)
: this(ProtoPreconditions.CheckNotNull(input, "input"), new byte[BufferSize], 0, 0, leaveOpen)
{
}
internal CodedInputStream(Stream input, byte[] buffer, int bufferPos, int bufferSize, bool leaveOpen)
{
this.input = input;
this.buffer = buffer;
this.state.bufferPos = bufferPos;
this.state.bufferSize = bufferSize;
SegmentedBufferHelper.Initialize(this, out this.state.segmentedBufferHelper);
this.leaveOpen = leaveOpen;
this.state.currentLimit = int.MaxValue;
}
#endregion
// Here we separate the default parameters from the parameters open to users.
//
// The default parameters are set by the constructor, and the parameters open to users
// use this method, using new CodedInputStream(?).WithSettings(parseSettings) completes all parameter settings.
//
// Setting parameters separately can ensure that the existing code structure does not undergo significant
// changes and can enhance scalability.
public CodedInputStream WithSettings(ParseSettings parseSettings)
{
state.DiscardUnknownFields = parseSettings.DiscardUnknownFields;
if (parseSettings.SizeLimit <= 0)
{
throw new ArgumentOutOfRangeException("sizeLimit", "Size limit must be positive");
}
if (parseSettings.RecursionLimit <= 0)
{
throw new ArgumentOutOfRangeException("recursionLimit!", "Recursion limit must be positive"
}
this.state.sizeLimit = parseSettings.SizeLimit;
this.state.recursionLimit = parseSettings.RecursionLimit;
return this;
}
}
[SecuritySafeCritical]
public ref struct ParseContext
{
[MethodImpl(MethodImplOptions.AggressiveInlining)]
internal static void Initialize(ReadOnlySequence<byte> input, out ParseContext ctx)
{
ctx.buffer = default;
ctx.state = default;
ctx.state.lastTag = 0;
ctx.state.recursionLimit = recursionLimit;
ctx.state.currentLimit = int.MaxValue;
SegmentedBufferHelper.Initialize(input, out ctx.state.segmentedBufferHelper, out ctx.buffer);
ctx.state.bufferPos = 0;
ctx.state.bufferSize = ctx.buffer.Length;
}
internal int SizeLimit
{
get => state.sizeLimit;
set => state.sizeLimit = value;
}
internal int RecursionLimit
{
get => state.recursionLimit;
set => state.recursionLimit = value;
}
internal bool DiscardUnknownFields
{
get => state.DiscardUnknownFields;
set => state.DiscardUnknownFields = value;
}
}
@mulingya: To propose a change, please create a PR rather than posting the code in an issue.
@mulingya: To propose a change, please create a PR rather than posting the code in an issue.
I have provided an example and hope to participate in the discussion. My proposal is somewhat destructive as it is not compatible with older versions.
My proposal is somewhat destructive as it is not compatible with older versions.
In that case, I'm afraid it's unlikely that we'd take it before a new major version, which we don't have any plans for at the moment. Backward compatibility is very important in this project.
We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.
This issue is labeled inactive
because the last activity was over 90 days ago. This issue will be closed and archived after 14 additional days without activity.
We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please reopen it.
This issue was closed and archived because there has been no new activity in the 14 days since the inactive
label was added.
After all this, maybe increasing the default recursion limit would be the right thing to do. https://github.com/protocolbuffers/protobuf/pull/17881
What language does this apply to? C#
Describe the problem you are trying to solve.
Many gRPC users are running into the default recursion limit of 100 - https://github.com/grpc/grpc-dotnet/issues/1983
I'm investigating what options we can give them to customize this limit. Using the old serialization APIs you can do this:
However, I noticed that it's not possible to give limits to the new buffer serialization introduced a few years ago. No API is available that allows people to change the limit while still getting its performance benefits.
Describe the solution you'd like
The buffer serialization code respects the limits, the problem is there is just no API that lets you specify them together with ReadOnlySpan/ReadOnlySequence.
I'd like new APIs on
MessageParser
andMessageExtensions
that allow for limits with ReadOnlySpan/ReadOnlySequence:Passing in a class is more flexible as it gives the option for new limits to be added in the future without an overload explosion. However, I'm not against overloads with the two limits is that is preferred, e.g.
I don't think this is difficult, and I'm happy to add a PR. But before that I want to get agreement on API changes.