poseidon-framework / poseidon-hs

A toolset to work with modular genotype databases in the Poseidon format
https://poseidon-framework.github.io/#/trident
MIT License
7 stars 2 forks source link

Adjust janno From- and ToJSON instances to test for the same aspects as the cassava .tsv parser #167

Closed nevrome closed 1 year ago

nevrome commented 2 years ago

I just came across some powerful ways to automatically derive instances of the classes ToJSON and From JSON e.g. with deriving-aeson. I wonder if we could use this to remove some boilerplate code (e.g. for PoseidonYamlStruct).

stschiff commented 2 years ago

Hmm... but Aeson already has a Generic way of automatically generating conversion code, using GHC.Generics, described pretty far at the top here: https://hackage.haskell.org/package/aeson-2.0.3.0/docs/Data-Aeson.html

I deliberately avoided that, so far, because I like the sense of control I have with the manual ToJSON and FromJSON instances, so I would prefer to stick with that. But I'm happy to be convinced if you rewrite it using Generics and it works and is indeed more compact.

I don't know what nieche deriving-aeson fills, given the already existing Generic deriving machinery in aeson.

nevrome commented 2 years ago

Ah - interesting. I thought aeson can not strip prefixes, but fieldLabelModifier seems to do exactly that. I guess we don't need deriving-aeson then. Maybe it was written at a time when aeson did not provide that.

I thought we would need Template Haskell to get rid of this verbose code, so I was really happy to see that it's possible without it. I'll try and if it works, is sufficiently clear and does not cause the compile time to explode, then I'll open a PR.

nevrome commented 2 years ago

I looked a bit deeper into our code and made some observations:

instance Csv.FromField Latitude where parseField x = do val <- Csv.parseField x if val < -90 || val > 90 then fail $ "Latitude " ++ show x ++ " not between -90 and 90" else pure (Latitude val)


- There still is a lot of room for more generic deriving in Janno.hs, Package.hs, GenoTypeData.hs and SecondaryTypes.hs
- Cassava also supports default instances: https://hackage.haskell.org/package/cassava-0.5.2.0/docs/Data-Csv.html#g:15 This could help to boil down the code in Janno.hs
stschiff commented 2 years ago

I don't think the JSON is actually "less safe", or I don't understand what you mean by that.

Currently, I think the Cassava FromField instances are written manually, whereas the JSON instances are automatic. I am not sure whether we can really write combined code for both formats. But I think using generic instances for Cassava sounds like a good idea to reduce boiler plate code.

nevrome commented 2 years ago

My point is: If somebody writes software that relies on the JSON interface of the Janno module, then they can feed in bad data. You can not decode a Latitude of 134 if you read from .tsv, but you can, if you read from JSON. Or am I missing something?

This isn't important, but it also feels unsatisfying.

stschiff commented 2 years ago

Aaah, OK. I should've looked at your code snippet in detail, I just saw the difference of manual vs automatic instances. OK, you have a good point there. Yes. Well, that's easy to fix, we just need to write some of these FromJSON instances by hand. No problem. I can get on that.

nevrome commented 1 year ago

I think we could unify the behavior of the .janno decoding from JSON and .csv with smart constructors. Here's an example for Latitude:

That's what our code looks like right now:

instance FromJSON Latitude

instance Csv.FromField Latitude where
    parseField x = do
        val <- Csv.parseField x
        if val < -90 || val > 90
        then fail $ "Latitude " ++ show x ++ " not between -90 and 90"
        else pure (Latitude val)

Only the .csv parser performs a meaningful range check. I think the following code would be better:

instance FromJSON Latitude where
    parseJSON = withObject "Latitude" $ \v -> v .: "Latitude" >>= makeGoodLatitude

instance Csv.FromField Latitude where
    parseField x = Csv.parseField x >>= makeGoodLatitude

makeGoodLatitude :: MonadFail m => Double -> m Latitude
makeGoodLatitude x
    | x >= -90 && x <= 90 = pure (Latitude x)
    | otherwise           = fail $ "Latitude " ++ show x ++ " not between -90 and 90"

With this setup also parsing from JSON fails for bad data:

> (Data.Aeson.eitherDecode "{\"Latitude\": 89}") :: Either String Latitude
Right 89.0
(Data.Aeson.eitherDecode "{\"Latitude\": 91}") :: Either String Latitude
Left "Error in $: Latitude 91.0 not between -90 and 90"

This works, because both aeson and cassava rely on MonadFail to log parsing errors.

Is this correct? Is there even simpler syntax for the FromJSON instance definition? withObject "Latitude" $ \v -> v .: "Latitude" seems verbose.

stschiff commented 1 year ago

Beautiful. Yes, indeed... that's exactly what generic classes like MonadFail are for. Love it!

nevrome commented 1 year ago

This was added in #221, but then again deactivated in #228, because it broke backwards compatibility between server and client. We should activate it again when we're updating the server in the near future, @stschiff.

nevrome commented 1 year ago

I think the ship for this issue has sailed. We certainly don't want breaking API changes at this point. I also don't believe this to be so critically important any more. And finally the custom ToJSON and FromJSON instances I once wrote rely on a wrong use of the show typeclass, which is a general design flaw of poseidon-hs.

So I'll close this issue. now. I don't want to delete the commented code yet, though, because I think the general design is really useful and I would like to keep it around a bit longer -- just as a reference.