invenia / JLSO.jl

Julia Serialized Object (JLSO) file format for storing checkpoint data.
MIT License
90 stars 5 forks source link

Should `JLSOFile` have `get`, `get!`, `keys` and `haskey` methods #107

Closed nickrobinson251 closed 3 years ago

nickrobinson251 commented 3 years ago

A JLSOFile has both metadata as well as the serialised objects as a Dict{Symbol, Vector{UInt8}}

But you can interact with a jlso::JLSOFile using getindex and setindex! like jlso[:x] and jlso[:y] = 1, which makes it tempting to interact with the jlso like a Dict which serialises/deserialises the objects to/from disk e.g.

jlso = read(file, JLSOFile)
y_data = get!(jlso, :y, 1)

I'm not exactly sure if we want a JLSOFile should be exactly Dict-like (or even what exactly that means https://github.com/JuliaLang/julia/issues/25941). But i think get, get!, keys and haskey methods make sense - what do you think? Possibly also iterate, but less sure about that one.

Something like

Base.keys(jlso::JLSOFile) = keys(jlso.objects)

Base.haskey(jlso::JLSOFile, key) = haskey(jlso.objects, key)

Base.get(jlso::JLSOFile, key, default) = haskey(jlso, key) ? jlso[key] : default

Base.get!(jlso::JLSOFile, key, default) = get!(() -> default, jlso, key)
function Base.get!(func, jlso::JLSOFile, key)
    return if haskey(jlso, key)
        jlso[key]
    else
        jlso[key] = func()
    end
end

and if we want iterate then i guess just delegate that too?

Base.iterate(jlso::JLSOFile) = iterate(jlso.objects)
Base.iterate(jlso::JLSOFile, state) = iterate(jlso.objects, state)

Would a PR to define these be welcome?

rofinn commented 3 years ago

Yeah, I don't think we'll want to subtype AbstractDict or anything, but if you have a use-case for those wrapper methods then a PR would be welcome.

nickrobinson251 commented 3 years ago

The pattern i'm looking to support is something like

if isfile(filename)
    jlso = read(filename, JLSOFile)
    return get!(jlso, :qux) do         # read from JLSO file
        result = parse("qux.csv")    # or compute and store in JLSO file
    end
else 
    result = parse("qux.csv")
    save(filename, :qux, result)  
    return result
end

Although i'm not sure if this is a good pattern. Partly because it's using JLSO as a "stable" file format that i can use as an on-disk cache. I don't need to serialised data to be readable in years. But i'm trusting that if I have a JLSO file created by a prior computation i can safely add new objects tot he same file.

rofinn commented 3 years ago

FWIW, that might be better handled in your application logic with something like this?

if isfile(filename)
    jlso = read(filename, JLSOFile)
    haskey(jlso, :qux) || return parse("qux.csv")  # Load qux.csv if not found
    result = jlso[:qux]
    result isa Vector{UInt8} && return parse("qux.csv")  # Load qux.csv if not loadable
    return result
else 
    result = parse("qux.csv")
    save(filename, :qux, result)  
    return result
end
nickrobinson251 commented 3 years ago

that'd still require defining haskey(jlso, :qux), right? Also that doesn't add the result to the JLSOFile, does it?

Handling the result isa Vector{UInt8} case is a good point, although i don't think we've run into any problems with that yet, afaik

nickrobinson251 commented 3 years ago

i think i'll open a PR for these. Because otherwise i'm a bit tempted to work with jlso_objects = JLSO.lead(filename) which will eagerly deserialise things, or else work with jlso.objects which feels less nice than having the methods automatically delegate to that where appropriate

rofinn commented 3 years ago

that'd still require defining haskey(jlso, :qux), right?

Correct, you'd still want haskey define. I was more just thinking that it could be a way of addressing potentially incompatible serialized objects for your caching purposes.

Also that doesn't add the result to the JLSOFile, does it?

Right, you could always replace parse with some inner function that contains the lines in your else block and reuse it for each condition.