scalapb / ScalaPB

Protocol buffer compiler for Scala.
https://scalapb.github.io/
Apache License 2.0
1.3k stars 280 forks source link

Problem with custom types that have invariances #177

Closed nlochschmidt closed 7 years ago

nlochschmidt commented 7 years ago

Given this proto definition:

syntax = "proto2";

package example;

import "scalapb/scalapb.proto";

message Event {
    required string eventId = 1 [(scalapb.field).type = "example.EventId"];
}

message Deposit {
    required int32 amount = 1 [(scalapb.field).type = "example.PosMoneyAmount"];
}

and these TypeMappers

case class EventId(id: UUID)

object EventId {
  implicit val typeMapper = TypeMapper[String, EventId]{str => apply(UUID.fromString(str))}{_.id.toString}
}

case class PosMoneyAmount(amount: Int) {
  require(amount > 0, "PosMoneyAmounts must be positive")
}

object PosMoneyAmount {
  implicit val typeMapper = TypeMapper(apply(_: Int))(_.amount)
}

I expected these tests to work:

behavior of "Custom Type serialization"

it must "serialize and deserialize Event" in {
    val originalEvent = Event(EventId(UUID.randomUUID()))
    val deserializedEvent = Event.parseFrom(originalEvent.toByteArray)

    deserializedEvent mustBe originalEvent
}

it must "serialize and deserialize Deposit" in {
    val originalDeposit = Deposit(PosMoneyAmount(10))
    val deserializedDeposit = Deposit.parseFrom(originalDeposit.toByteArray)

    deserializedDeposit mustBe originalDeposit
}

Instead I get this output:

[info] DataSerializationTest:
[info] Custom Type serialization
[info] - must serialize and deserialize Event *** FAILED ***
[info]   java.lang.IllegalArgumentException: Invalid UUID string:
[info]   at java.util.UUID.fromString(UUID.java:194)
[info]   at example.EventId$$anonfun$1.apply(EventId.scala:10)
[info]   at example.EventId$$anonfun$1.apply(EventId.scala:10)
[info]   at com.trueaccord.scalapb.TypeMapper$$anon$1.toCustom(TypeMapper.scala:24)
[info]   at example.Event.Event$.defaultInstance$lzycompute(Event.scala:71)
[info]   at example.Event.Event$.defaultInstance(Event.scala:70)
[info]   at example.Event.Event$.defaultInstance(Event.scala:58)
[info]   at com.trueaccord.scalapb.LiteParser$.parseFrom(LiteParser.scala:9)
[info]   at com.trueaccord.scalapb.GeneratedMessageCompanion$class.parseFrom(GeneratedMessageCompanion.scala:103)
[info]   at example.Event.Event$.parseFrom(Event.scala:58)
[info]   ...
[info] - must serialize and deserialize Deposit *** FAILED ***
[info]   java.lang.IllegalArgumentException: requirement failed: PosMoneyAmounts must be positive
[info]   at scala.Predef$.require(Predef.scala:224)
[info]   at example.PosMoneyAmount.<init>(EventId.scala:14)
[info]   at example.PosMoneyAmount$$anonfun$4.apply(EventId.scala:18)
[info]   at example.PosMoneyAmount$$anonfun$4.apply(EventId.scala:18)
[info]   at com.trueaccord.scalapb.TypeMapper$$anon$1.toCustom(TypeMapper.scala:24)
[info]   at example.Event.Deposit$.defaultInstance$lzycompute(Deposit.scala:71)
[info]   at example.Event.Deposit$.defaultInstance(Deposit.scala:70)
[info]   at example.Event.Deposit$.defaultInstance(Deposit.scala:58)
[info]   at com.trueaccord.scalapb.LiteParser$.parseFrom(LiteParser.scala:9)
[info]   at com.trueaccord.scalapb.GeneratedMessageCompanion$class.parseFrom(GeneratedMessageCompanion.scala:103)

Full example project at https://github.com/nlochschmidt/ScalaPB-CustomTypesBug

It looks like the issue originates in the defaultInstance method which seems to try and create a UUID from an empty string and a PosMoneyAmount from 0.

Is this a known limitation of using custom types in ScalaPB?

thesamet commented 7 years ago

Yes, known limitation - the TypeMapper function need to be defined for default values (zeros, empty strings) so we can generate a defaultInstance. For these two specific examples, I would suggest using optional fields instead of required fields. For PosInteger - I'd avoid mixing validation logic with the (de)serialization logic. For the UUID, you can represent invalid UUIDs as a special value like new UUID(0,0)

ahjohannessen commented 7 years ago

@nlochschmidt A bit curious regarding your PosMoneyAmount. Does it represent pennies?

In some of our projects we have a type case class Money(pennies: Int) which is used with validated positive amounts. In order to represent negative amounts we use another type case class MoneyDelta(amount: Money, sign: Sign) where Sign is one of Positive, Negative and Zero.

nlochschmidt commented 7 years ago

@ahjohannessen Actually, this is just a simplified example. In our application, we are having a SavingsRate class which contains a custom MonetaryAmount which itself contains a BigDecimal value and a currency. Since it doesn't make sense for a savings rate to be negative and since we in addition only allow Euro values as well as integer values for the savings rate right now we have these constraints in our code.

nlochschmidt commented 7 years ago

@thesamet I agree that it is probably better to avoid mixing up validation with serialization logic, however I would really like to know why it's necessary to be able to create a defaultInstance in the first place.

thesamet commented 7 years ago

@nlochschmidt Say we have:

message A { ...things... }

message B { optional A a = 1; }

In the generated code, we want to be able to do B().getA which would return the default instance of A in the case where a is unset.

nlochschmidt commented 7 years ago

@thesamet Ok, I didn't see those getX methods yet, because I actually only used protobuf version 2 and only required fields. However, that design decision sounds extremely dangerous (as in bug inducing) to me. I would not expect that there exists a method like that but if it does I would really expect it to fail or return null if the value is not set in the message. How is this supposed to be used? I guess you always have to guard it with some hasX function, which does not seem to exist and which will probably defeat the whole purpose.

thesamet commented 7 years ago

@nlochschmidt This is the same programming API that is available in the official Java/C++/Python code generators for protocol buffers. For example, see the "optional" section in https://developers.google.com/protocol-buffers/docs/javatutorial:

For embedded messages, the default value is always the "default instance" or "prototype" of the message, which has none of its fields set.

In ScalaPB, optional fields are implemented as Option[T] values. I understand the expectation that getX should behave as x.get and would throw if the value is unset, though the choice here has been consistency with Java and other implementations. The hasX guard in ScalaPB is just x.isDefined

An example where this is useful if you have nested optional messages and you want to check if some optional primitive is defined: x.getA.getB.getC.getD.e.isDefined" - this code would work when A, B, C, D are unset. This is a very common situation and achieving this without getgetXfunction ends up very verbose (either a bunch ofisDefined` checks, or a nested series of flatMaps or a for-loop).

Regarding use of required field - see the "Optional is forever" note on the link above. Required fields can hinder schema evolution, and optional fields should be preferred. In that case, the presence validation check should be in the application layer (instead of in the deserializer).

nlochschmidt commented 7 years ago

@thesamet I didn't know the part about the default instance from the java tutorial and from that standpoint it makes total sense to support the standard protobuf semantics instead of deviating from them. Thanks for the detailed explanation 👍

aaabramov commented 3 years ago

I've managed to workaround this by using protected val + def:

syntax = "proto3";

package com.aaabramov.example;

import "scalapb/scalapb.proto";

message Deposit {
  option (scalapb.message).extends = "com.aaabramov.example.DepositExt";
  option (scalapb.message).companion_extends = "com.aaabramov.example.DepositCompanionExt";

  string rawAmount = 1 [(scalapb.field).annotations = 'protected val'];
}
package com.aaabramov.example

trait DepositExt { this: Deposit =>
  def amount: PosMoneyAmount = PosMoneyAmount(rawAmount)
}

trait DepositCompanionExt {
  final def apply(amount: PosMoneyAmount): Deposit =
    Deposit(amount.amount)
}
thesamet commented 3 years ago

I believe the original problem reported on this issue has been resolved in ScalaPB 0.10.10 and 0.11.x - custom types do not require to be defined for empty values. The parsing code no longer relies on defaultInstance.

aludwiko commented 3 years ago

@thesamet I'm not sure about the fix in 0.11.x, after upgrading to 0.11.4 I'm still facing the same error, where defaultInstance is used for serialization:

[info] java.lang.NumberFormatException
[info]  at java.base/java.math.BigDecimal.<init>(BigDecimal.java:625)
[info]  at java.base/java.math.BigDecimal.<init>(BigDecimal.java:402)
[info]  at java.base/java.math.BigDecimal.<init>(BigDecimal.java:835)
[info]  at scala.math.BigDecimal$.exact(BigDecimal.scala:121)
[info]  at scala.math.BigDecimal$.apply(BigDecimal.scala:244)
[info]  at ***.BigDecimalTypeMapper$.$anonfun$stingToBigDecimalMapper$1(BigDecimalTypeMapper.scala:7)
[info]  at scalapb.TypeMapper$$anon$1.toCustom(TypeMapper.scala:26)
[info]  at ***.Money$.defaultInstance$lzycompute(CsBaseProto.scala:1835)
[info]  at ***.Money$.defaultInstance(CsBaseProto.scala:1834)
[info]  at ***.BillingInfo.__computeSerializedValue(CsBaseProto.scala:5218)
[info]  at ***.BillingInfo.serializedSize(CsBaseProto.scala:5235)
thesamet commented 3 years ago

@aludwiko - can you double check if you regenerated the code after the ScalaPB upgrade (sbt clean may be necessary). If this persists, can you create an example project that demonstrates the issue and share it on github? You can use the ScalaPB sbt template to quickly generate a minimal project:

sbt new scalapb/scalapb-template.g8
aludwiko commented 3 years ago

@thesamet Actually it's quite easy to reproduce it, just run the Main from: https://github.com/aludwiko/scalapb-serialization

thesamet commented 3 years ago

Thanks for putting together the example. It looks like this issue can be reproduced only when no_box is also used - let's track this in #1198.