Stop using Template Haskell derived JSON encodings

paf31 commented 7 years ago

Before 1.0, we should probably remove any JSON encoders and decoders which use Template Haskell, and standardize any JSON formats emitted or parsed by the compiler.

It would probably also be a good idea to make sure that any JSON formats include the version number of the compiler they were generated with.

hdgarrood commented 7 years ago

This doesn't necessarily have to be a breaking change, although I guess it probably will be in practice?

paf31 commented 7 years ago

I've been thinking about this some more. I think the goals are to

Prevent future breakage due to changes in data types
Make the JSON a bit more readable, if possible

If those are the only goals, then I'm going to suggest something based on GHC.Generics. We could use a computed type-level hash to make sure the types don't change without some sort of acknowledgment from the developer, and we could generate JSON encoders which are closer to the way we do things in CoreFn (i.e. "sexpr-style").

Any thoughts @hdgarrood?

hdgarrood commented 7 years ago

That sounds okay, but what do we do if the types do change? Just write out the old instance by hand and then figure out how to modify the JSON instances so that it works with the new data type?

paf31 commented 7 years ago

@hdgarrood The idea was that if the data types change, then so will the hash, which is computed at compile time. So the user who breaks things will be forced to update the hash in code, acknowledging the fact that the format changed.

Another version of this would be to simply have separate data types solely for JSON encoding and decoding, and to copy things back and forth. Then we just wouldn't edit those modules, except in the case of a deliberate breaking change.

hdgarrood commented 7 years ago

Okay, yeah. Either of those sound good to me, although I suppose the latter is probably better suited to dealing with both a) changing the JSON format and b) changing the compiler data type declarations internally without affecting the JSON format. I suppose an advantage to either of these approaches over manually written instances is that we don't then have to worry about having encode and decode instances not lining up.

joneshf commented 7 years ago

Prevent future breakage due to changes in data types

I'd like to suggest golden tests for this. I dunno if that's a thing that is happening already somehow, but this library is pretty easy to setup if you're interested: https://hackage.haskell.org/package/hspec-golden-aeson

hdgarrood commented 7 years ago

Discussion on https://github.com/purescript/purescript/pull/2878 has reminded me of this and has made me harden my position somewhat.

I don't think it would be a good solution to use what we have now plus a hardcoded hash of all the information that goes into the derived Aeson instance. Yes, it would prevent us from making breaking changes accidentally, but the way I see it we really should be aiming not just for that, but also for having the ability to freely change data types without being forced to break the format.

For example, if we wanted to move the Language.PureScript.Docs.Types.Namespace data type into a more appropriate place, eg Language.PureScript.Names, and change the ide code to use that type instead of its own version, we would be forced to either make a breaking change to the JSON or attempt to handwrite a version of what instance deriving was previously giving us, and neither of these is a good position to be in.

Having separate data types solely for JSON encoding and decoding addresses this problem, but both of these solutions suffer from another flaw; this occurred to me after seeing https://github.com/purescript/purescript/pull/2873. The problem is that Aeson's instance derivation code could change between releases. We've been bitten before by having version bounds on Aeson which were too lax, meaning that JSON formats could be incompatible between two copies of the same version of the compiler if one had been built with aeson < 1.0 and the other with aeson >= 1.0.

I think the ideal solution is one where we write the rules by hand in such a way that we are expressing how to parse and serialize a data type at the same time, and the resulting parsers and serializers are correctly mutually inverse by construction. The JsonGrammar package is something I'm investigating at the moment as a possible way of achieving this. Defining parsers & serializers in this way, together with golden tests to ensure that JSON produced by older compiler versions still parses correctly, seems to me to be the best option (as long as the types don't get completely out of control).

If that turns out to be too unwieldy I think I would still prefer separate handwritten parsers and deserializers + golden tests + roundtrip tests (i.e. what Language.PureScript.Docs.Types does now, except I need to add some golden tests) over either of the two options previously discussed on this issue.

hdgarrood commented 7 years ago

I've just had another thought, actually. I don't think bidirectional parsers/serializers like JsonGrammar provides will work for us, as I don't think there's any way of providing backwards compatibility for non-breaking changes such as what I ended up doing here: https://github.com/purescript/purescript/blob/21bd77fb6bb42e3b8732d28b9aca2c25626a8307/src/Language/PureScript/Docs/Types.hs#L589-L596

I think the only remaining option is therefore to do more or less what we're currently doing in Docs.Types but also add tests where we attempt to parse JSON produced by older compilers to make sure that we haven't accidentally lost the ability to read old data.

purescript / purescript

Stop using Template Haskell derived JSON encodings #2655