protocolbuffers / protobuf

Protocol Buffers - Google's data interchange format
http://protobuf.dev
Other
65k stars 15.41k forks source link

Decimal Type? #4406

Open simplesteph opened 6 years ago

simplesteph commented 6 years ago

It'd be amazing to have a Decimal type as a Well Known Type that can be leveraged across all the languages. I have no preference around the back-end implementation of it, as an end user I just think it'd greatly simplify codebases using Decimals

xfxyjwf commented 6 years ago

By "Decimal", I guess you mean something like java.math.BigDecimal in Java that can represent arbitrary precision. I think we should only introduce it if such a decimal type is supported as a built-in type or standard library type by all programming languages that we officially support. That probably won't happen any time soon though.

simplesteph commented 6 years ago

Yes exactly !

bric3 commented 6 years ago

The java implementation could understand how to serialize BigDecimal or BigInteger to/from strings. I think this is a missed opportunity. Without it this causes developers to use floating pointing types instead, and use these type for calculation which may be bad. Leveraging protobuf to promote these types would help to use consistent and safe way to handle precision numbers.

bugproof commented 5 years ago

I'm currently struggling with it myself (I'm using C#). I think you can't avoid having to map(convert) to decimal manually everytime you want to use decimal.

Some people pointed to https://github.com/googleapis/googleapis/blob/master/google/type/money.proto but there's no information on how to convert from nanos and to nanos.

And it gives you the same result as just storing it as a string.

I wanted to rewrite my code to use gRPC but the state of things with decimal is really putting me off.

TeBoring commented 5 years ago

Maybe a util method from protobuf's supported type (e.g., string) to the decimal type in the specific language?

simplesteph commented 5 years ago

@TeBoring that'd be the easier one. I'd rather have this than Avro's bytes method, as it destroys the user experience, but I'm sure their choices would be important to understand https://avro.apache.org/docs/1.8.1/spec.html#Decimal https://issues.apache.org/jira/browse/AVRO-1402

TeBoring commented 5 years ago

@anandolee, this seems to be c# specific. Could you follow on that?

tephe commented 5 years ago

We are also using decimals quite heavily in our application and this is currently an issue for the migration to protobuf. Is there any update on this?

mrunesson commented 5 years ago

@TeBoring I would say this is not C# specific. We have:

GrigorievNick commented 5 years ago

https://avro.apache.org/docs/1.8.2/spec.html#Decimal When we talk about 'Decimal' most of us mean type that can be keep 'precision' and 'scale' value for float values. Avro is good example, they use unify serialisations of decimal to Bytes + keep two additional properties (scale, precision). Why its important: All use same serialisations process Previously some serialised it to string, some to Byte Array with Ascending Order, some With Descending, etc Now every one use same bytes, with same order, and keep metadata in same fields.

maurodelazeri commented 5 years ago

decimal would be very handy, I'm currently converting to a byte array base-256 digit, I guess is the most efficient way

mikev commented 4 years ago

"While suitable for many purposes, binary floating-point arithmetic should not be used for financial, commercial, and user-centric applications"

Yes, Google we need a Decimal type! This is an example of the issue and probably demonstrates the issue in any language: bool result = ((0.1 + 0.2) == 0.3); // result is false

The following link provides a detailed explanation and examples to explain the motivation for the "Decimal" data type, which is available in Python, Java and C#. http://speleotrove.com/decimal/

jtattermusch commented 4 years ago

Update: I'm currently driving internal proposal for the "decimal" type and I published the proposal document externally, so that people can chime in as needed.

See https://github.com/protocolbuffers/protobuf/pull/7039

ritaly commented 4 years ago

There's class BigDecimal in Ruby. So strong +1 to add Decimal to protobuf types đź‘Ť

I'd get rid of workaround for Google::Protobuf::TypeError: Expected number type for double field (given BigDecimal).

vivainio commented 4 years ago

Decimals are exceptionally popular in C#, especially with database-backed business software. Almost all the "number" types in our application (and there is a lot of them) are decimals, there are only few floats here and there. Since float is not a safe replacement, we would end up mapping decimals to strings which is sort of embarrassing (and unsafe since not every string is a valid decimal).

Since it's a missing fundamental type, it should be in WellKnownTypes (as the suggested FixedDecimal). Alternatively, there should be another source of these "well known types" to avoid everyone copy-pasting the FixedDecimal to their codebases (in different namespaces)

LiamMorrow commented 4 years ago

This doesn't solve this in the WellKnownTypes land, but here we use:

message BigDecimal {
  string value = 1; // feel free to substitute for a variant of precision/scale/unscaled
}

Then in the csharp land we can define some implicit/explicit casting operators or conversion methods via extension of the partial class:

public partial class BigDecimal
{
    public static implicit operator BigDecimal(decimal value)
    {
        return new BigDecimal
        {
            Value = value.ToString(CultureInfo.InvariantCulture),
        };
    }

    public static implicit operator decimal(BigDecimal value)
    {
        return decimal.Parse(value.Value, CultureInfo.InvariantCulture);
    }
}
csmorley commented 4 years ago

Are there any preview ports of this? Writing a finance app with millions of messages per second, non-native support and casting each time is a non-starter for gRPC currently without this

jufemaiz commented 4 years ago

For those working with decimals, can I confirm that there are two approaches that we are left with while we await movement on #7039 :

  1. Use of string marshalling for decimals
  2. Use of a custom protobuf message with either units and nanos (ala Google's Money.proto example) or an coefficient, exponent and sign (ala big.js)

Then hand off the expected marshalling of such messages to internal implementations of each application.

If this is the case, can people chime in with their preferred approach to date?

csmorley commented 4 years ago

For those working with decimals, can I confirm that there are two approaches that we are left with while we await movement on #7039 :

  1. Use of string marshalling for decimals
  2. Use of a custom protobuf message with either units and nanos (ala Google's Money.proto example) or an coefficient, exponent and sign (ala big.js)

Then hand off the expected marshalling of such messages to internal implementations of each application.

If this is the case, can people chime in with their preferred approach to date?

Many thanks! Personally, would prefer custom profobuf message (this is what i have been doing to date). If there are any one-time/optimised marshalling that can be implemented as part of the codegen to prevent repeated casting that would be my main concern

mtrtm commented 4 years ago

There's a reference to this and a workaround here for wcf/recode, but I can't believe this is still an open issue https://visualrecode.com/blog/csharp-decimals-in-grpc/

D3MaxT commented 3 years ago

Likewise. Decimal is an absolutely critical type for handling money in C#. Please add this to WellKnownTypes.

Thank you!

vivainio commented 3 years ago

Now that Google killed https://github.com/protocolbuffers/protobuf/pull/7039, what is going to happen with this ticket and/or C# decimal support?

fowles commented 3 years ago

My intention is to leave this ticket open. The desire is not invalid, it is simply the case that we are not actively pursuing it.

usarshakya commented 3 years ago

Any update on this? Please add 'Decimal' to WellKnownTypes

jufemaiz commented 3 years ago

The signal is strongly (see @vivainio's comment above) that Google's team has placed any updates to support a decimal type on ice.

In the meantime, projects will need to DIY support for fixed decimal types.

JasonPolisAdmin commented 2 years ago

Need support for Decimal128, so new fixed width wire type.

Workaround is to use with additional validation, but would prefer native scalar Decimal128.

Big Decimal can also be represented with bytes, or else string.

wrkntwrkn commented 2 years ago

Any progress on this?

fowles commented 2 years ago

As noted above, we have no intention of pursuing this in the near term. The language specific aspects of this make it a very large amount of effort to implement across our entire surface. I recommend that users use a string field with appropriate parsing in their own code.

mikev commented 2 years ago

As noted above, we have no intention of pursuing this in the near term. The language specific aspects of this make it a very large amount of effort to implement across our entire surface. I recommend that users use a string field with appropriate parsing in their own code.

The decimal data type is not a language specific type. It is an IEEE standard -https://en.wikipedia.org/wiki/IEEE_754#Decimal In particular it is extensively used in SQL storage systems. The recommendation to use a string misunderstands the problem. Decimal is a bitwise encoding therefore one cannot transform from a string field to a correctly encoded decimal type.

Probably what needs to happen is for somebody to create a PR showing Google exactly how to implement a solution, since Google engineers appear to consistently misunderstand this issue.

fowles commented 2 years ago

While you are welcome to create a PR, the odds that it is adopted are low. As mentioned, any sort of increased precision datatype needs to be handled in the full set of languages supported consistently which makes this a substantially larger amount of effort then most anticipate. For every primitive datatype in protobuf that type must be reified into some language specific type for each target language.

Decimal as you have pointed to is a general encoding of a particular format for storing arbitrary precision numbers. It can in fact be stored as a string field (or rather a bytes field).

haberman commented 2 years ago

The decimal data type is not a language specific type. It is an IEEE standard

The IEEE standard is one way of representing decimals, but it is unrelated to the arbitrary precision decimals that most contributors to this issue have been asking about (BigDecimal in Java, decimal in Python, BigDecimal in Ruby, and Decimal in Avro).

The C# Decimal type is more similar to decimal128 from IEEE 754-2008, as it is a fixed-size 128-bit representation. However the C# type does not use the IEEE format, as it appears to predate it by several years.

Decimal128 has the attractive property of being standardized, but adoption is low: even C and C++ have not adopted it into their standard libraries (though it looks like it may be coming in C23).

Different languages ecosystems have different expectations about what a Decimal type would provide. Bridging this gap, while maintaining reasonable efficiency, data integrity, interoperability, and convenience, makes this issue more difficult to solve than it may first appear.

mikev commented 2 years ago

The real world example I demo'd above shows why this is a problem for programmers:

bool result = ((0.1 + 0.2) == 0.3); // result is false

At my previous job we implemented a fully working gRPC custom data type solution for C#. The real-world problem this issue causes is that financial computations end up being not equal after transport over a grpc link. Developers then spend hours or days trying to track down the issue, because unit tests often expect exact results and outside reviewers tend to legitimately worry that errors exist in the financial computations, especially since the computation differences can appear to be signficant.

Perhaps Google could "solve" this problem by providing open source custom gRPC types which allow developers to transport the Decimals types over grpc in a language specific fashion. Usually the real-world issues involve the same language environment on both sides. Nobody is expecting transport from C# to a C++ system. But transport between systems which do support Decimal such as Java and C# would be nice.

My suggestions: just show us a work-around custom type implementation.

jufemaiz commented 2 years ago

most contributors to this issue have been asking about (BigDecimal in Java, decimal in Python, BigDecimal in Ruby, and Decimal in Avro).

To be clear I think it would be worthwhile having a well known type that involves support for native decimal where it exists (and yes the flavour may vary with the various languages) and allows for custom marshalling where it doesn’t. Whether that is something that is feasible is a different question.

If the official position for decimals is DIY then I personally believe that this should be noted explicitly in the docs, with or without sample implementations, rather than people having to find these threads for what is in many domains a required and very standard data type.

perezd commented 2 years ago

At my previous job we implemented a fully working gRPC custom data type solution for C#.

I encourage you to open source this, and we can point people to it.

Nobody is expecting transport from C# to a C++ system.

With respect, this is something that you don't care about, but if we support it in the core it is something we do have to care about.

mikev commented 2 years ago

At my previous job we implemented a fully working gRPC custom data type solution for C#.

I encourage you to open source this, and we can point people to it.

Nobody is expecting transport from C# to a C++ system.

With respect, this is something that you don't care about, but if we support it in the core it is something we do have to care about.

Unfortunately I no longer have access to that source code.

Yes we would all expect Google to concern itself with transport from C# to C++. In that case, because C++ does not support a native IEEE 754-2008 type, my suggestion is to coerce the transported 32-bits to a C++ float type. Obviously this would involve a small loss of precision.

Another example:

Using a native float, five cups of coffee at $4.99 a cup will cost $24.949999, not the expected $24.95.

The advantage of a Google open source implementation is that Google has the resources to build a cross-language custom data type. Most of us don't have the time or resources to build a complete solution. Also there are lots of implementation choices and nuances which make any solution non-trivial.

GrigorievNick commented 2 years ago

While I personally think that complex type must be part of the specification, and even migrate most of my application to Avro.

If the official position for decimals is DIY then I personally believe that this should be noted explicitly in the docs, with or > without sample implementations, rather than people having to find these threads for what is in many domains a required > and very standard data type.

But for everyone who asks about custom complex type - here is Microsoft .Net example fro C# And you can do your own marshaling to any language.

jufemaiz commented 2 years ago

FTR our internal implementation for Go, Ruby and JS is using this, which in turn is based on Google’s own money type.

https://github.com/googleapis/api-common-protos/blob/main/google/type/money.proto


// Represents a fixed decimal value.
message FixedDecimal {
  // Whole units part of the value.
  sint64 units = 1;
  // Nano units of the amount (10^-9)
  // Must be same sign as units
  // Example: The value -1.25 is represented as units=-1 and nanos=-250000000
  sfixed32 nanos = 2;
}
JasonPolisAdmin commented 2 years ago

It would be nice to have a 128-bit wire type to represent decimal128, UUIDs, timestamps and IPv6 addresses as a single field. Otherwise the choice is between using 4 x 32 bit, 2 x 64 bit or 1 x bytes.

If we use a single field, whose length determines whether it's decimal 32, 64 or 128:

message Decimal {
    oneof encoding {
        string big = 1; // Arbitrary length string using only characters 0-9
        bytes bid = 2; // Binary integer significand field
        bytes dpd = 3; // Densely packed decimal significand field
        // ... other representations
    } // encoding = choice of representation of semantic type
} // Decimal = semantic type

We could also use bytes only for decimal128, and used fixed32 for decimal32 and fixed64 for decimal64:

message DenselyPackedDecimal {
    oneof encoding {
        fixed32 fixed32 = 1; // 32 bit Densely packed decimal significand field
        fixed64 fixed64 = 2; // 64 bit Densely packed decimal significand field
        bytes bytes = 3; // bytes for 128 bit Densely packed decimal significand field
    } // encoding type
} // Densely Packed Decimal = encoding style

When we reach consensus on a library of common decimal representations in this thread, where should we host it ?

perezd commented 2 years ago

The advantage of a Google open source implementation is that Google has the resources to build a cross-language custom data type. Most of us don't have the time or resources to build a complete solution. Also there are lots of implementation choices and nuances which make any solution non-trivial.

There's a lot of assumptions built into this statement that we have resources we actually don't have to do anything we want. I think many on this thread should know what it's like to work on software projects for large orgs, especially in open source. We have to be extremely mindful about what we take on as core to the project because of the breadth and depth requirements we have to support, as you've identified.

Also, there's a lot of cost to adding new custom data types that are non-obvious to our customers that we have to balance, aside from supporting it cross-lang. There are also wire format implications that the uninitiated may not anticipate as a hindrance.

fowles commented 2 years ago

This is simply a question of priority. The cost of doing this work is fairly high and the value of it is not sufficient to justify that cost. I honestly do not think a team outside of Google will be able to present a business case to us that would lead to a change in this decision.

As I said above, the desire is not invalid, it is simply the case that we are not actively pursuing it and don't have any medium term plans to.

ericsampson commented 2 years ago

thanks for responding Matt! To keep this more positive, as a general question what is required for a community contribution towards creating a new WKT? Is an agreed-upon spec sufficient, or is the bar that the PR(s) would have to cover every supported language? The closed #7039 mentions "It's fine for language implementations to provide the bindings independent of other languages (e.g. C# might provide the bindings sooner than some other language)." Cheers

fowles commented 2 years ago

There are a great many customers of protobuf, many of whom do not care in any direction whatsoever about this feature. The statement "Everybody wants this" is false. In fact, not a single team internal to Google has asked for this feature.

While universal support is not a requirement out of the gate, I would want support for C++, java, and upb to be committed to before accepting any part of this. It would be preferable for other languages to have plans to follow on top of that, but it is not an absolute requirement.

When I say "I honestly do not think a team outside of Google", I am talking about the realities of our business needs and prioritization. We are a smaller team then you likely guess and so we must pick and choose our engagements so that we can yield the maximum possible impact and reduce our total cost of maintenance. Within Google we have mechanisms for escalating when one team's choices don't line up with another team's needs. Externally, the ability to present that business case is greatly reduced.

To be clear, the lack of prioritization of this is intentional.

ericsampson commented 2 years ago

Thank you Matt for clearly laying out the requirements needed in order for a community PR to be considered for consideration, I appreciate that.

LFXiaoFei commented 2 years ago

I need it so much.

haberman commented 2 years ago

Please be aware that a key difficulty -- if not the key difficulty -- is deciding which decimal representation to use. There are several possibilities, as described here: https://github.com/protocolbuffers/protobuf/pull/7039#issuecomment-679083593

This decision has deep performance and interoperability implications. There are many decimal representations in common use, and no clear winner. Making a decision here will require carefully weighing the tradeoffs and deeply analyzing the pros vs. cons of each option. This may require performance experiments so we know the cost of converting each option to other commonly-used representations, as well as analyzing how to handle cases where the conversion cannot be performed without losing precision or accuracy.

haberman commented 2 years ago

And since this issue can be solved by simply allowing the software architecture used to inject and version the format used

We were evaluating five options, if we consider this one also it will be six. This idea is not a panacea that will solve all problems, it is an alternative that has its own set of pros and cons.

It really just sounds like the protocol buffer team is refusing to collect data because they are scared of data.

There are 945 open issues on this repo alone. That is just the OSS side, and says nothing of requests we are receiving internally. Please consider that we might be busy working on some of those.

If folks are interested in good-faith exchange on this subject please feel free to ask questions or contribute technical feedback. I will not engage further with this sort of rhetoric.

dantheperson commented 1 year ago

Seems google have delivered a decimal. https://github.com/googleapis/googleapis/blob/master/google/type/decimal.proto If you are using gRPC then it's probably already in your dependency tree

jufemaiz commented 1 year ago

It's a proto type, but does not manage the transformation to a golang specific implementation of a decimal. That still requires implementation to transform the encapsulated string value to a working decimal type.

dantheperson commented 1 year ago

Would something be possible using the code generator plugin insertion points, to do the conversion to the native type in the generated code, and save having each an every use of the type needing to convert it in application code?

On Wed, 14 Jun 2023 at 14:27, Joel Courtney @.***> wrote:

It's a proto type, but does not manage the transformation to a golang specific implementation of a decimal. That still requires implementation to transform the encapsulated string value to a working decimal type.

— Reply to this email directly, view it on GitHub https://github.com/protocolbuffers/protobuf/issues/4406#issuecomment-1590347009, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGIY6ZUOF33VQLNOLIZ3CLXLEOSRANCNFSM4EWVH4RQ . You are receiving this because you commented.Message ID: @.***>

-- Sent from my computer

cvigo commented 1 year ago

Seems google have delivered a decimal. https://github.com/googleapis/googleapis/blob/master/google/type/decimal.proto If you are using gRPC then it's probably already in your dependency tree

More tan 2 years ago…