Exclude params and cons from the derivation based on annotations

vigoo commented 3 years ago

I should have opened an issue about this originally instead of discussing it immediately in a proposed implementation (https://github.com/softwaremill/magnolia/pull/270) so I wanted to move it here now to restart discussion.

The problem

The real-world example for the requested feature came up while trying to migrate the semi-auto derived binary codecs in https://github.com/vigoo/desert/ from shapeless to magnolia. It supports marking case class fields and sealed trait constructors as transient. These transient subtrees of a data type may have types for which there is no implicit binary codec available and it cannot be automatically derived either. Even though the transient annotations can be observed from magnolia's combine/dispatch methods, as the derivation engine does not know about these application-specific exclusion rules it tries to gather typeclass instances for these members too, and fails.

An example for using these transient annotations is when serializing Akka-typed messages for remoting but using a subset of the message type for local-only communication, transferring non-serializable references (for example an akka stream or some local resources).

In the original PR I proposed a solution and then we discussed some alternatives. I did not check yet how the feature could be implemented in the new Scala 3 version.

It would be nice to support directly this with annotations because that way desert could be moved to Magnolia without changing its users code but maybe some breaking changes there are still better than maintaining a hand-written shapeless and a scala3 macro version in parallel ;)

What do you think?

deusaquilus commented 3 years ago

Hi Magnolia folks. Have there been any thoughts on this? I'm working for a company that would like to adopt Desert to do long-term serialization (hence the need for schema migration) but the presence of Shapeless is a non-starter because of compile-time issues. Could this issue be solved in magnolia on a lower level?

adamw commented 3 years ago

I don't think there has been any developments around this, but we're open to discussion and contributions :) I guess the main question is what would happen when calling caseClass.param.typeclass for a transient field? An exception would be thrown?

deusaquilus commented 3 years ago

I think that's reasonable. Just like calling .get on a None or a Failure.

adamw commented 3 years ago

Ok, well I don't see a reason not to support such transient fields. We just have to come up with a good name (@deriveTransient?) and implement :)

vigoo commented 3 years ago

@adamw so you suggest to have a built-in @deriveTransient (or whatever) annotation for this. I think that would work and probably the simplest approach.

Two things come to my mind:

It should work both on fields and constructors
In desert transient fields have a default value for deserialization so I was using two different annotations: https://github.com/vigoo/desert/blob/master/desert-core/src/main/scala/io/github/vigoo/desert/transient.scala If there is a general purpose transient annotation in magnolia then the library could have a @default(value) annotation and then for transient fields it would require these two annotations. Only drawback is that verifying that both exists would move to the derivation while with a single annotation it forces the user to immediately fill it with the value. On the other hand this default value parameter is too specific for my library (or the domain of serialization at least) so probably shouldn't be Magnolia's concern.

adamw commented 3 years ago

But the default would have to be per-typeclass, so I don't think it applies here.

A downside of an annotation is that it requires modifying the case class (derivation source), but then there's no other easy way of providing additional derivation metadata.

vigoo commented 3 years ago

One more thing came to my mind. If there is a global magnolia-defined transient annotation that would shortcut all derivations regardless of the typeclass which may not be a good idea. For example let's say we don't want to serialize transient fields with my serialization library but you may still want to derive a Show instance that works with the full data type.

To avoid this there could be some overrideable behavior in the derivation class

at least to enable/disable the transient feature for a given derivation
or, probably better, a isTransient(annotations: List[StaticAnnotation]): Boolean like function

The second would allow different libraries to define their own transient annotations which would also solve the typeclass-specific default value behavior.

Of course it may not be very likely that people are marking different subsets of their types for different type class derivations but still this feels like a safer/more generic solution to me.

joroKr21 commented 3 years ago

I would suggest something more general again 😄 https://github.com/softwaremill/magnolia/pull/279#issuecomment-766475143

adamw commented 3 years ago

@joroKr21 I like that :) Even better, maybe we could simply parametrise the derivation macro with a code that would decide, if a field should be ignored or not (and possibly provide a default value)? That way you could use annotations (which could be one of the implementations of the parameter provided by default), but also implement other, e.g. name-based strategies. The code in question would be evaluated at compile-time of course.

joroKr21 commented 3 years ago

I think that could work in Scala 3 where we have more options (by passing inline parameters and such) but not so much in Scala 2 where we can only pass additional type parameters to the macro.

adamw commented 3 years ago

I was thinking about Scala 3. I think the design is more or less sealed and "done" for Scala 2.

vigoo commented 3 years ago

Ah I was thinking about both Scala 2 and 3, because I think @deusaquilus 's use case is for Scala 2 for now.

@adamw thinking about the macro to decide if a field should be ignored or not - doesn't this lead us to an alternative where simply the typeclass provided for the derivation is lazy? That way the transient field support would be to simply check the annotations (or name etc) and does not access the derived typeclass at all. I'm not sure how hard would that be to implement though.

deusaquilus commented 3 years ago

Is lazy-derivation at compile-time possible? How do you do it?

vigoo commented 3 years ago

No I was just thinking it through better and realized that would not be possible. (At least I don't see how)

deusaquilus commented 3 years ago

Would @deriveTransient still be possibility for Scala 2? I think annotations, even non-configurable ones, are fine so long as they are simple and scarce. Coming from the DB world I have many, many scars from horrible things like @NamedQuery(<20 lines of code here>) but I don't think we need to religiously prohibit annotations based on those things.

vigoo commented 3 years ago

I think the discussion is not really about whether to use annotations or not, but rather how configurable this behavior is. And probably also about which version (Scala 3 or both 2 and 3) we can implement it.

I would be happy if we can release it for both versions (and happy to help implementing it) and would allow at least the level of configuration I suggested above, but also like the even more general configuration idea of @joroKr21 :) Not completely clear though where those configuration types would be defined. An inner type defined in the derivation object found by name?

deusaquilus commented 3 years ago

I agree that a more configurable design is very desirable for Scala 3 but it would be nice to have at least something for Scala 2. Perhaps we could worry about things like configurability and selective typeclass exclusion in Scala 3 but just have a bare-minimum implementation in Scala 2 that just does a single-field exclusion (i.e. for all derivations). So we could do @deriveTransient for Scala 2 and for Scala 3 that would just be a overridable default of some sort (hence it can be backwards compatible).

joroKr21 commented 3 years ago

I think the discussion is not really about whether to use annotations or not, but rather how configurable this behavior is. And probably also about which version (Scala 3 or both 2 and 3) we can implement it.

Yes, that's right - we would still need to mark the field in question in some way (in Scala 2 I can only think of annotations as feasible). The question is what should that annotation be? Something provided by Magnolia? Or should the library authors define their own. The configuration approach enables the latter. E.g. default makes sense mostly when deserializing something. Maybe for another typeclass there are different arguments needed to that annotation.

Not completely clear though where those configuration types would be defined. An inner type defined in the derivation object found by name?

That would be up to the library author. It just has to be passed as a type parameter to the macro. In Scala 2.13 it could be even a refined type directly there (because we have access to singleton types). That would also work in Scala 3 but we could come up with something better there.

deusaquilus commented 3 years ago

I think the configuration approach is really elegant, it would give different authors the ability to define custom annotations for their derivations resolving many potential separation-of-concerns issues. My question is, what is the effort level that this would involve? Is it worth doing something this nice for Scala 2?

deusaquilus commented 3 years ago

...maybe in Scala 2 we can just do one simple annotation and in Scala 3 have it be configurable:

  // In Scala 3...
  object CsvConfig extends Magnolia.Config {
    type Proxy = Csv.type
    type Ignore = transient // (default: deriveTransient)
    final val readOnly = true
    final val minFields = 0
    final val maxFields = -1
    final val minCases = 0
    final val maxCases = -1
  }

adamw commented 3 years ago

It does seem these are two separate issues/tasks really :) If anybody is up to implementing it, I think we can go with the config as @joroKr21 suggested for Scala2. For Scala3 I think I would prefer just passing some function which would decide whether to ignore / provide default or not (without mandating that annotations are involved, custom or not), as that's more flexible.

I'm away until 1st September but in case you'd have a PR @wojciechUrbanski or @mbore might step in, review, merge & release.

deusaquilus commented 3 years ago

Wait a minute, how is it possible to read value-level elements e.g. final val minFields = 0 from CsvConfig inside of the macro? You can CsvConfig fields as Tree elements from the TypeTag.tpe but you can't get their values, at least not from there...

deusaquilus commented 3 years ago

If it was a literal type e.g. type minFields = 0 that should be possible. There's maybe one other way you could do it but that involves multiple compilation units.

deusaquilus commented 3 years ago

Anyway, implementing this thing seems like a pretty big task for a fairly small feature that we need. In the least you would need to:

Create genWith which I'm guessing would be an overloaded version of gen. This would probably involve some refactors.
Grab the necessary fields from CsvConfig. Resolve the minFields etc... into actual values (as a mentioned before, these probably need to be literal-types), and then pass them into a configuration-holder of some sort.
Pass this configuration-holder all the places that need it and...
Implement the derivation-exclusion annotation functionality and pass the configuration for which annotation to check into there.

Is there any chance we could do something simpler just to unblock desert/serialization stuff? (I apologize for sounding like an impatient goon in advance...)

joroKr21 commented 3 years ago

Wait a minute, how is it possible to read value-level elements e.g. final val minFields = 0 from CsvConfig inside of the macro?

That makes it a constant with a literal type (for Scala 2.12's sake - on Scala 2.13 you could just say genWith[Config { val minFields: 0 }, Foo] - yes it's a bit restrictive to use only literal types and type members.

Is there any chance we could do something simpler just to unblock desert/serialization stuff? (I apologize for sounding like an impatient goon in advance...)

I don't think it's that difficult to implement but my OSS involvement hasn't been great in the summer months 😄 - as the season closes to an end I will probably find more time again. If I give it a go it would be the configurable version. But that doesn't stop somebody else from implementing the simpler version. It could then become a default if we make it configurable later.

deusaquilus commented 3 years ago

Whoa, that actually works 😲 :

  object Foo {
    final val minFields = 0
  }
  val v: 0 = Foo.minFields

Pardon my ignorance...

lukaszlenart commented 3 years ago

I played a bit with the Config concept, but have problem with obtaining a user defined implicit config, something like this:

  implicit val config: Config = new Config {
    override type Ignore = transient
  }

  implicit def deriveCsv[A]: Csv[A] = macro Magnolia.genWith[A]

and detecting the implicit

  def genWith[T: c.WeakTypeTag](c: whitebox.Context): c.Tree = {
    import c.universe._

    val config = c.inferImplicitValue(typeOf[Config]) match {
      case Literal(Constant(configValue: Config)) =>
        println(s"Found config: $configValue")
        configValue

      case other =>
        println(s"Found other: $other")
        Config.defaultConfig
    }

    Magnolia.gen(c, config)
  }

and this always produces

Found other: <empty>

Any idea why?

joroKr21 commented 3 years ago

Oh you want to have it implicit? In my imagination it would just be an argument to the macro like Magnolia.genWith[A, MyConfig]

lukaszlenart commented 3 years ago

I thought it will be easier ... looks like not :\ Now I'm trying to use the argument approach and it almost works, but just getting some implicit errors :\

lukaszlenart commented 2 years ago

Still WIP but looks promising https://github.com/softwaremill/magnolia/pull/353

lukaszlenart commented 2 years ago

Can someone explain me idea behind Proxy, readOnly and min/max options?

trait Config {
  type Proxy <: Singleton { type Typeclass[A] }
  type Ignore <: annotation.Annotation
  val readOnly: Boolean
  val minFields: Int
  val maxFields: Int
  val minCases: Int
  val maxCases: Int
}

joroKr21 commented 2 years ago

Hey @lukaszlenart thanks for taking this on 💯

The idea of Proxy is to designate another object as the source of the join and co methods that are used for derivation. In the code it's called prefixObject I think.
The idea of readOnly is to replace ReadOnlyParam and ReadOnlyCaseClass to simplify the API - I guess this can be left for another PR / discussion since it involves an API change.
The idea of min/max parameters is very simple - to throw a compile error if they are violated. So for example if you want to exclude case objects you can set minFields = 1, or if you want to exclude "empty" sealed traits you would set minCases = 1.

lukaszlenart commented 2 years ago

Thanks @joroKr21 for the explanation. Implementing Proxy was easy, yet I'm still confused with min/max functionality. I have no idea how to implement that, tbh.

joroKr21 commented 2 years ago

The implementation is basically to check the number of case class fields / sealed trait subtypes and throw an error if it doesn't fall within the configured range.

lukaszlenart commented 2 years ago

Ok, this what I've been suspecting, thanks!

softwaremill / magnolia

Exclude params and cons from the derivation based on annotations #297

The problem