rafaqz / FieldMetadata.jl

Metadata for Julia fields

Using FieldMetadata with external specs #8


ConnectedSystems commented 4 years ago

Hello,

Before I start, thank you again for responding to my earlier request over at https://github.com/rafaqz/Flatten.jl/issues/14

The paper I mentioned is nearing its final draft stages and should be submitted in the next few weeks.


I'm wondering if I could ask for your advice and thoughts on how best to apply the approach used in FieldMetadata (and Flatten) with external specification of data and model structure.

There are two parts that I'm struggling with at the moment.

First, conceptually, the example matches exactly what I want to do; the only real difference is that my specifications will come from a JSON or YAML file. One reason for this is that I work with people who don't know Julia (e.g. specialists in other fields, not necessarily computational ones):

@describe @limits @with_kw struct WithKeyword{T}
    a::T = 3 | (0, 100) | "a field with a range, description and default"
    b::T = 5 | (2, 9)   | "another field with a range, description and default"
end
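
For reference, the attached metadata can then be queried back through the generated functions. A rough sketch - the exact call signature depends on the FieldMetadata.jl version:

wk = WithKeyword()     # uses the defaults: a = 3, b = 5
limits(wk, Val{:a})    # -> (0, 100)
describe(wk, Val{:b})  # -> "another field with a range, description and default"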

Before I discovered your package I wrote my own based on the paradigm used in the EMA workbench:

@with_kw mutable struct RealParameter <: AgParameter
    name::String
    min_val::Float64
    max_val::Float64
    default_val::Float64
    value::Float64

    # `default_val` and `value` are both initialised from the nominal value
    RealParameter(name, min_val, max_val, value) = new(name, min_val, max_val, value, value)
end

The YAML looks like this, and each parameter gets parsed into the relevant type:

# values below are given as list of
# nominal "best guess" values, min, and max 
# plant_dates are assumed to be static across all seasons
properties:
  plant_date: ["CategoricalParameter", "05-31", "05-31", "05-31"]
  yield_per_ha: ["RealParameter", 3.25, 2.5, 7.0]
  price_per_yield: ["RealParameter", 200.0, 100.0, 250.0]
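
For concreteness, a sketch of that parsing step using YAML.jl (the filename is illustrative, and only RealParameter is handled here):

using YAML

spec = YAML.load_file("component.yaml")
params = AgParameter[]
for (name, entry) in spec["properties"]
    ptype, nominal, minv, maxv = entry  # type tag, best guess, min, max
    if ptype == "RealParameter"
        push!(params, RealParameter(name, minv, maxv, nominal))
    end
    # CategoricalParameter etc. would be handled similarly
end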

Individual specs described above are for model (sub)components, which are then composed to represent models, which in turn represent a socio-environmental system.

It's probably possible to dynamically create FieldMetadata-annotated objects, filling in the necessary data from the YAML spec (and at the same time cutting out my own implementation), but I'm just getting started with Julia and am not sure what the best approach to doing this is.

Second:

To explore the uncertainty and conduct sensitivity analysis I need to run many (hundreds to thousands of) simulations with values sampled from the above parameter ranges. I'm wondering if there is already a straightforward way to collate all FieldMetadata objects and dump them out into a single file (maybe with Flatten?). Specifically:

1) A CSV of parameter value ranges
2) A CSV of parameter values as used in simulations (i.e. an N x D table, with N simulations and D columns holding the values for each parameter)
3) Recreate the objects from the above (i.e. re-generate an entire model using data from a row)

The first two are to record provenance in a way that allows non-technical people to view what the "settings" were in a file format they are familiar with, while the last would enable the same batch of simulations to be re-run if needed.

Or maybe I'm over-complicating things and you have a better suggestion.

Apologies for the very long post!

rafaqz commented 4 years ago

Sounds like I should be a coauthor on this paper if you are including that much about my work.

This isn't easy to do with FieldMetadata.jl, because functions have to be defined at the top-level scope, and it stores field metadata in functions.

But there are some ways we could do this if I had some motivation to implement them ;). Currently we just use Julia files.

ConnectedSystems commented 4 years ago

> Sounds like I should be a coauthor on this paper if you are including that much about my work.

Unfortunately it's only a sentence in the paper I mentioned. I assure you if it was more substantial I would have invited you to be involved as co-author. The question here is unrelated to that specific publication, apologies if I've caused confusion.

To give you the specific extract of the citation:

"Recent efforts avoid explicit handling of this semantic dimension, treating it as inevitable. Instead, tooling is being developed (e.g. Schouten and Deits, 2020) to support the seamless move between the nested hierarchical and their equivalent flattened representations, and would provide direct compatibility with specific sensitivity and uncertainty analysis tools."

Right now I'm toying with the idea of tweaking Flatten to work with the AgParameters I currently have, but I still need to collate all the parameters together...

ConnectedSystems commented 4 years ago

Incidentally, if you're interested in co-authoring a paper, I'm planning a manuscript on my Agtor model (now re-written in Julia from Python), which will very likely leverage your work here.

rafaqz commented 4 years ago

That sounds interesting! I also work in ecology/agriculture research in Aus btw :)

My reason for asking is that I am writing too many packages... way ahead of what I can publish papers about, so it's a little concerning that they will get scooped before I get the chance to.

Your way of attaching all parameter information might be more robust... I used the FieldMetadata.jl method to avoid having those big structs everywhere, especially as I also attach descriptions and other things for Interact.jl interfaces. But you could use RealParameter with Flatten.jl and just not use FieldMetadata.jl. You can flatten all RealParameters, then map over the tuple for each fieldname:

# collect every RealParameter in `x`, then pull out each minimum
minvals = map(rp -> rp.min_val, flatten(x, RealParameter))

I should write something to reconstruct individual fieldnames... but you can use Setfield.jl for that too:

# update the min_val of each parameter, then rebuild the original nested object
params = map((rp, mv) -> (@set rp.min_val = mv), flatten(x, RealParameter), minvals)
x = reconstruct(x, params, RealParameter)

> I'm wondering if there is already a straightforward way to collate all FieldMetadata objects and dump out in a single file (maybe with Flatten?)

This part is easy... use `metaflatten` in Flatten.jl; `fieldnameflatten` gets the field names. For your three points:

  1. Use metaflatten on your range function (there's a sketch after this list).
  2. Use Flatten.jl to rebuild structs with the parameters. The problem is that if you are changing the model composition, you will somehow have to save the structure. Normally I just separate that in the script, so the saved parameters are from one model composition only, in separate files for each change.
  3. Rebuilding from 2 is easy, with the same caveat as 2 - use different files for different model compositions. For 1, I would just load the parameters manually into optim or whatever directly, without using FieldMetadata.jl - they are for one fixed model composition anyway.
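
A sketch of 1, assuming a composed `model` object and the `bounds` metadata function (names here are illustrative):

using Flatten, FieldMetadata, DataFrames, CSV

fnames = fieldnameflatten(model)                  # tuple of flattened field names
vals   = flatten(model)                           # current parameter values
bnds   = metaflatten(model, FieldMetadata.bounds) # (min, max) tuple for each field
df = DataFrame(field = collect(fnames),
               value = collect(vals),
               min   = [b[1] for b in bnds],
               max   = [b[2] for b in bnds])
CSV.write("param_ranges.csv", df)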

Saving parameters for varied model composition is going to be the hard part: you'd need to find a way of storing the tree structure of the model. I wrote this hack to do it with generated Julia code: https://github.com/rafaqz/Codify.jl

ConnectedSystems commented 4 years ago

I ramble on for a bit below, so feel free to ignore, but thank you very much for your response. It has pointed me in a few good directions, and I think I can get to a working solution now, even if it will probably not be as general as I would have liked.

Re the paper - let me know if you're keen to be involved/collaborate and I'll keep you posted as I work on this. I'm in the last month-ish of my PhD (which I am doing by publication), so most of my time is going to my last paper (a different paper from the one I mentioned earlier) and to putting together the thesis. In other words, progress will be relatively slow in the short term.

With regard to timeline, I'm thinking of submitting a paper introducing Agtor and an example case study to this special issue, although the publication fee is steep.

If you're interested I can contact you directly to give more detail on what I plan to showcase in the paper.


While the model composition does change dynamically, I don't necessarily have to keep track of the change in structure because for a particular batch of model runs it will be the same.

I think I've got the beginnings of an approach worked out in a way that doesn't directly require the Flatten family of packages, but the approach is definitely fairly specific to my use case and could be better generalised. For one, I rely on a semantic naming structure to map a parameter to its instance/value (explained with an example below**).

I've reworked things a little so that I can generate an array of the different parameter types and extract the min/max values where relevant from a directory of YAML files. This solves 1 in my original post.

The pipeline essentially is:

[directory of YAML files] 
-> parse YAML file and extract Real/Constant/Categorical parameter types 
-> Create dataframe of `p` columns where `p` is total number of parameters
-> generate `N` samples for Real and Categoricals between min/max and fill in dataframe
-> fill columns for Constants with respective values
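
A sketch of the sampling steps, assuming the AgParameter types from earlier (a ConstantParameter with a `value` field is assumed; names are illustrative):

using DataFrames

N = 1000  # number of model runs
df = DataFrame()
for p in params  # `params` collated from the YAML directory
    if p isa RealParameter
        # uniform samples between the parameter's min and max
        df[!, p.name] = p.min_val .+ rand(N) .* (p.max_val - p.min_val)
    elseif p isa ConstantParameter
        df[!, p.name] = fill(p.value, N)
    end
    # CategoricalParameter would sample from its set of options
end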

Each column represents a parameter, and each row represents a combination of values for a model run (as the approach I apply is exploratory modelling and analysis). The name of the column maps back to its (nested) object structure, e.g.** `Irrigation___gravity__capital_cost`, where Irrigation is the struct, gravity is the name of the specific instance (also given in the YAML), and capital_cost is a property.

In running the model, I'd take a row of values and generate a model instance using those values. I'm only up to step 2 above, but I can see where Codify.jl will fit into this flow.
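
To illustrate that row-to-model step under the naming convention above (`get_component` is a hypothetical lookup helper, and how the value lands depends on how properties are stored):

# build one model run from a sampled row of the DataFrame
function apply_row!(model, row)
    for (colname, val) in pairs(row)
        struct_name, rest = split(String(colname), "___")
        instance, prop = split(rest, "__")
        component = get_component(model, struct_name, instance)  # hypothetical helper
        # the property is set directly here; setting the parameter's own
        # `value` field would be the alternative if properties are AgParameters
        setfield!(component, Symbol(prop), val)
    end
    return model
end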

It's actually pretty cool, as it means I can potentially save the exact model used, to be regenerated alongside any results.

rafaqz commented 4 years ago

Ok, I've been thinking about loading metadata from a CSV. It should actually be pretty easy, something like:


# Load the CSV into a DataFrame. This is still pseudocode - the column names are assumed:
using CSV, DataFrames

df = CSV.File("myparams.csv") |> DataFrame
for i in 1:nrow(df)
    # We need to have stored the type, field and bounds/default/whatever metadata you need.
    fn = QuoteNode(Symbol(df.fieldnames[i]))        # QuoteNode so $fn interpolates as :fieldname
    typ = getfield(@__MODULE__, Symbol(df.typ[i]))  # look the type up by name
    default = df.defaults[i]
    bounds = df.bounds[i]
    # Then @eval to write new method definitions with the DataFrame data
    @eval begin
        FieldMetadata.default(::$typ, ::Val{$fn}) = $default
        FieldMetadata.bounds(::$typ, ::Val{$fn}) = $bounds
    end
end
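
For a row describing field `a` of the WithKeyword struct from earlier (default 3, bounds (0, 100)), the loop would effectively define:

FieldMetadata.default(::WithKeyword, ::Val{:a}) = 3
FieldMetadata.bounds(::WithKeyword, ::Val{:a}) = (0, 100)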

We could make a wrapper macro that does this.

Also FYI, there are some syntax changes in 0.2 that should make things a bit cleaner, but they will require some updates to your scripts.

ConnectedSystems commented 4 years ago

Thanks for the heads up @rafaqz.

Things are really heating up now on the PhD front so I'll be relatively quiet on this for a bit. I'll come back to it asap. Thanks for all your input!