quinnj / JSON3.jl

Mapping subtypes via object keys #188

Open Libbum opened 2 years ago

Libbum commented 2 years ago

I'm having issues identifying a sane way to manage an input dataset.

One object expects multiple types of subobjects; for this MWE we'll choose two simple ones:

flatcron = """
  {
    "cron" : {
        "repo": "TestCron"
    }
  }
"""
flatpfs = """
  {
    "pfs" : {
        "repo": "TestPFS",
        "glob": "*"
    }
  }
"""

I can parse these with

using JSON3, StructTypes

struct PFS
    repo::String
    glob::String
end

struct Cron
    repo::String
end

struct Input
    pfs::Union{Nothing, PFS}
    cron::Union{Nothing, Cron}
end

StructTypes.StructType(::Type{Input}) = StructTypes.Struct()
StructTypes.StructType(::Type{PFS}) = StructTypes.Struct()
StructTypes.StructType(::Type{Cron}) = StructTypes.Struct()

but I know that I'll always have one of these options and never more. So I'd like to move to something like

abstract type InputFlat end

struct PFSFlat <: InputFlat
    repo::String
    glob::String
end

struct CronFlat <: InputFlat
    repo::String
end

StructTypes.StructType(::Type{InputFlat}) = StructTypes.AbstractType()
StructTypes.StructType(::Type{PFSFlat}) = StructTypes.Struct()
StructTypes.StructType(::Type{CronFlat}) = StructTypes.Struct()
StructTypes.subtypes(::Type{InputFlat}) = (pfs=PFSFlat, cron=CronFlat)

This of course fails, as I'm missing the subtypekey for InputFlat: there isn't a single key, since the key itself differs per subtype (i.e. pfs and cron). Is there a way to map these values such that the following works?

cron_parse = JSON3.read(flatcron, InputFlat)
cron_parse.repo # TestCron

pfs_parse = JSON3.read(flatpfs, InputFlat)
pfs_parse.glob # *

quinnj commented 2 years ago

Hmmm... this is tricky. Been thinking through various solutions here. subtypekey was designed for a slightly different case, however: there, the value stored under the subtypekey is used to figure out the type to parse, whereas in this case the key itself would inform which type gets parsed.

I'll have to think a bit further, but one idea is that we might allow passing a function to subtypekey that would be called on the keys being parsed and could return a type to be parsed.
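Until something like that exists, the key-driven lookup can be approximated outside the parser. The following is a minimal sketch, not a JSON3 feature: `INPUT_TYPES` and `read_input` are hypothetical names, and the helper parses the document twice (once generically, once into the chosen type), so it trades efficiency for simplicity.

```julia
using JSON3, StructTypes

abstract type InputFlat end

struct PFSFlat <: InputFlat
    repo::String
    glob::String
end

struct CronFlat <: InputFlat
    repo::String
end

StructTypes.StructType(::Type{PFSFlat}) = StructTypes.Struct()
StructTypes.StructType(::Type{CronFlat}) = StructTypes.Struct()

# Hypothetical lookup table (not part of JSON3): wrapper key => concrete type.
const INPUT_TYPES = Dict{Symbol, DataType}(:pfs => PFSFlat, :cron => CronFlat)

# Hypothetical helper: read generically, choose the type from the single
# top-level key, then re-read the inner object as that type.
function read_input(json::AbstractString)
    obj = JSON3.read(json)  # generic JSON3.Object with Symbol keys
    length(obj) == 1 || throw(ArgumentError("expected exactly one top-level key"))
    key, inner = first(pairs(obj))
    T = get(INPUT_TYPES, key) do
        throw(ArgumentError("unknown input key: $key"))
    end
    return JSON3.read(JSON3.write(inner), T)
end

cron_parse = read_input("""{"cron": {"repo": "TestCron"}}""")
pfs_parse = read_input("""{"pfs": {"repo": "TestPFS", "glob": "*"}}""")
```

Note that this only covers reading; writing a `CronFlat` back out would not restore the `cron` wrapper key.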

Libbum commented 2 years ago

Thanks.

I've had a bit of a think about it myself and I guess what I'm asking for isn't a two-way function either. Assuming you'd want to keep the serialisation and deserialisation results the same, something like this is an easy way to break things.

For the moment, I'm just using a functor on the problem struct:

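function (input::Input)()
    # Return the first populated field; exactly one is expected to be set.
    for field in fieldnames(typeof(input))
        data = getfield(input, field)
        isnothing(data) || return data
    end
    return nothing  # no field was populated
end

FedorChervyakov commented 2 years ago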

I am also facing a similar issue. I have a JSON object that contains two fields: a type field and a data field, like so:

{
  "type": "20210101:temp",
  "data": {
      "date": 20210101,
      "min": "19.5",
      "max": "23.5"
  }
}

The trouble is that I have to process the type field first to extract the correct type for data. In this example I have to strip the date from the type field to get the correct type. Because of this, it is impossible to use StructTypes.AbstractType in this scenario. It would be awesome if it were possible to pass a function to subtypekey that returns the correct type, rather than only looking it up in a NamedTuple.
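As a stopgap, the preprocessing step can be done by hand after a generic parse. This is only a sketch under assumptions: `TempData`, `DATA_TYPES` and `read_record` are hypothetical names, the field types are guessed from the sample above, and the tag is assumed to be whatever follows the last `:` in the type field.

```julia
using JSON3, StructTypes

# Hypothetical payload type; field types mirror the sample JSON.
struct TempData
    date::Int
    min::String
    max::String
end

StructTypes.StructType(::Type{TempData}) = StructTypes.Struct()

# Hypothetical lookup table: cleaned-up tag => concrete type for `data`.
const DATA_TYPES = Dict{String, DataType}("temp" => TempData)

# Hypothetical helper: derive the tag from `type`, then re-read `data`
# as the matching type (this parses the payload twice).
function read_record(json::AbstractString)
    obj = JSON3.read(json)
    tag = String(last(split(obj[:type], ':')))  # "20210101:temp" -> "temp"
    T = get(DATA_TYPES, tag) do
        throw(ArgumentError("unknown data type tag: $tag"))
    end
    return JSON3.read(JSON3.write(obj[:data]), T)
end

record = read_record("""
{
  "type": "20210101:temp",
  "data": {"date": 20210101, "min": "19.5", "max": "23.5"}
}
""")
```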

DatName commented 2 years ago

Rust's Serde library solves this problem in a different way. It allows not only 'tagged' representations (just like we have here with subtypekey), but also 'untagged' ones: https://serde.rs/enum-representations.html.

There is no explicit tag identifying which variant the data contains.
Serde will try to match the data against each variant in order and
the first one that deserializes successfully is the one returned.

I recently worked with this approach in Rust and I find it quite cool, actually.

In Julia, since we have a predefined set of types to choose from (subtypes of an AbstractType, or passed directly as function arguments), I guess we can always do this efficiently on the fly in one pass of the parsing (by "eliminating" types that do not match).

It would require digging very deep into the existing code to implement it, though.
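For illustration, the untagged idea can be imitated naively without touching the parser by trying each candidate type in turn. This is a sketch only: `read_untagged` is a hypothetical helper, it re-parses the input once per candidate rather than eliminating types in a single pass, and it assumes that reading into a `StructTypes.Struct()` type throws when a required field is missing and ignores unknown keys. It operates on the inner objects, without the `pfs`/`cron` wrapper key.

```julia
using JSON3, StructTypes

struct PFSFlat
    repo::String
    glob::String
end

struct CronFlat
    repo::String
end

StructTypes.StructType(::Type{PFSFlat}) = StructTypes.Struct()
StructTypes.StructType(::Type{CronFlat}) = StructTypes.Struct()

# Hypothetical helper (not a JSON3 function): return the first candidate
# type that the input deserializes into without an error.
function read_untagged(json::AbstractString, candidates)
    for T in candidates
        try
            return JSON3.read(json, T)
        catch
            # This candidate did not match; move on to the next one.
        end
    end
    throw(ArgumentError("no candidate type matched the input"))
end

# Order matters: list the most specific type first, because a type with
# fewer fields may also accept an object that carries extra keys.
const CANDIDATES = (PFSFlat, CronFlat)

pfs_value = read_untagged("""{"repo": "TestPFS", "glob": "*"}""", CANDIDATES)
cron_value = read_untagged("""{"repo": "TestCron"}""", CANDIDATES)
```

As with Serde's untagged enums, the result depends on candidate order, and swallowing every exception can hide genuine parse errors, so a real implementation would want to be more selective.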