spartanz / schemaz

A purely-functional library for defining type-safe schemas for algebraic data types, providing free generators, SQL queries, JSON codecs, binary codecs, and migration from this schema definition
https://spartanz.github.io/schemaz
Apache License 2.0
164 stars 18 forks source link

Json #24

Closed GrafBlutwurst closed 5 years ago

GrafBlutwurst commented 5 years ago

Now open to public critique

There's a couple thing's I am not happy about. Mainly the necessity of M[_] in the serialization this relates to #10. The other point is that I duplicated the Person Schema. I wasn't quite sure where to put it without messing up dependencies / cleanliness in tests.

GrafBlutwurst commented 5 years ago

For completeness sake this addresses #18

jkobejs commented 5 years ago

Regarding your concerns about MonnadError I think that it would be better that we use Applicative here if we don't manage to improve encoding of sum types. Reason is that I would like to accumulate errors like in Validation when Union definitions miss branches on more levels of schema.

Regarding schema duplication in tests we can create

trait PersonTestModule extends SchemaModule {
  type Prim[A]       = JsonSchema.Prim[A]
  type ProductTermId = String
  type SumTermId     = String

  type PersonTuple = (Seq[Char], Option[Role])

  val personSchema = iso[Person, PersonTuple](
    record[Person](
      ^(
        essentialField[Person, String](
          "name",
          prim(JsonSchema.JsonString),
          Person.name,
          None
        ),
        nonEssentialField[Person, Role](
          "role",
          union[Role](
            branch(
              "user",
              record[User](
                essentialField(
                  "active",
                  prim(JsonSchema.JsonBool),
                  Person.active,
                  None
                ).map(User.apply)
              ),
              Person.user
            ),
            branch(
              "admin",
              record[Admin](
                essentialField(
                  "rights",
                  seq(prim(JsonSchema.JsonString)),
                  Person.rights,
                  None
                ).map(Admin.apply)
              ),
              Person.admin
            )
          ),
          Person.role
        )
      )(Person.apply)
    ),
    Person.personToTupleIso
  )
}

for whole test module and then extend it or directly instantiate it in our test sections. Note that we cannot define this trait in test-commons because we would need to depend on core module and core module already depends on test-commons. We need to define this trait per test module. If you have idea how to share it between all modules that would be great. I’ll try to come up with something during weekend.

GrafBlutwurst commented 5 years ago

@josipgrgurica Regarding Applicative vs MonadError is there a way to abstract over this. I can imagine cases where I don't want to evaluate all errors for performance reason and reject an entry based on the first error alone. Also doesn't it need something like ApplicativeError I know it from cats but can't find it in sclaz. As for the deduplication in core tests. maybe a good old util package with some static test data like the Person schema?

jkobejs commented 5 years ago

@GrafBlutwurst You are right, it would be good if we stuck with this to have ability to chose whether fail fast or accumulate error. I also haven't find ApplicativeError in scalaz. @vil1 do you have any insights about it?

Regarding tests, can you create an example so that we can discuss it?

vil1 commented 5 years ago

Well, as @jdegoes nicely summarised in this tweet applicatives and monads are both semigroups in a higher category (in respect with different "products"). So I suppose that, at least in theory, we could abstract over that higher semigroup and let the user decide wether they want the applicative or the monadic way of combining intermediate results. But I don't know scalaz well enough to say if there is an encoding for that higher semigroup.

That being said, the only reason I see for preferring "fail fast" instead of "accumulate all errors" for performance when dealing with big pieces of data. So I think we should focus on the applicative way first, and later try to allow the monadic one as an optimisation.

Also, upon reflection I don't think returning A => M[JSON] with an implicit MonadError constraint on M is enough to remind our users that this operation can fail. I'd rather have a plain old Either or a scalaz.\/ or a scalaz.Validation* in the signature.

So I think we should go for A => ValidationNel[ToJsonError, JSON]. ValidationNel[X, Y] has a Monoid instance if Y has an instance of Monoid too, so I suppose we could handle record fields with something like :

case schemaModule.Schema.RecordSchema(fields) =>
      a =>
        fields.analyze(new Field[A, ?] ~> λ[α => ValidationNel[ToJsonError, IList[JSON]]]) {
          def apply[X](field: Field[A, X]): ValidationNel[ToJsonError, IList[JSON]] = ??? 
        })
        .map(_.toList.mkString("{", ",", "}")

The analyze method on FreeAp traverses the data structure, applies a polymorphic function (aka natural transformation) to each element and accumulates the result is a monoid.

The bizarre λ[α => ValidationNel[ToJsonError, IList[JSON]]] just makes a type constructor that doesn't use its parameter out of ValidationNel[ToJsonError, IList[JSON]] (so that we can use it as the target of our natural transformation).

Sorry for the wall of text (and for the time it took me to post it)

vil1 commented 5 years ago

[...] Note that we cannot define this trait in test-commons because we would need to depend on core module and core module already depends on test-commons. We need to define this trait per test module. If you have idea how to share it between all modules that would be great [...]

After all, why don't we write all tests in a single module (that would depend on all other modules), with a single main class and so on?

GrafBlutwurst commented 5 years ago

I would prefer to stay with A => M[JSON] my reasoning being that it avoids unnecessary rewrapping if the user is already working e.g in some IO since it'd play nicely with flatMap. I'd also actually stay with the MonadError on further consideration because there's only exactly one type of error that can happen, this being the case of an unknown branch. This would mean the Schema for A was constructed wrong and I don't think further processing makes any sense. Further this should become impossible down the line due to #10 if I understand correctly.

Though I'll happily change it as @vil1 if overruled.

λ[α => ValidationNel[ToJsonError, IList[JSON]] this is much more elegant than my usage of Const thanks!

As for the tests: Is it fine if I open it in a new issue (and if nobody else wants it I'll gladly take it) as I don't want to go down a refactoring rabbithole for this PR?

vil1 commented 5 years ago

Though I'll happily change it as @vil1 if overruled.

Don't expect an anarchist to rule anything ;) But you can count on me to argue ^^

I would prefer to stay with A => M[JSON] my reasoning being that it avoids unnecessary rewrapping if the user is already working e.g in some IO since it'd play nicely with flatMap

I'm not convinced by this argument. if we return A => ValidationNel[ToJsonError, JSON] the user can still use map with the returned function. And if they really need to use flatMap they just need to compose the returned function with their M.pure.

I'd also actually stay with the MonadError on further consideration because there's only exactly one type of error that can happen, this being the case of an unknown branch.

This is right, but the user's program might need to raise other types of errors we don't know about, and they'd need to emap to fit our ToJsonError into their larger error type.

But my main argument is that using a monadic (i.e. fail fast) error handling by default probably isn't what our users would expect (that's at least not something I would expect from a library). Errors will occur whenever the datum to be serialized doesn't conform to the schema being used. In such case, having to retry at least N+1 times to know and fix N errors is something I find infuriating.

Let me insist on that last point. Errors will occur whenever the datum to be serialized doesn't conform to the schema being used. Please keep in mind that we want to be able to handle various kinds of schema "realisations" (i.e. the way data is laid-out in memory): Scala classes, JSON, Avro, etc. We can probably make erroneous states not representable (to a certain extent) in the Scala case, but we cannot in the general case.

So from a user perspective, I think it's very important to make sure we report errors clearly and completely, hence my preference for the applicative way.

I think we should further discuss error handling in general on the gitter channel.

As for tests, I agree it can be separated from this PR, I am working on it right now.