ml4ai / delphi

Framework for assembling causal probabilistic models from text and software.
http://ml4ai.github.io/delphi
Apache License 2.0

GrFN spec interlanguage compatibility #132

Open jpfairbanks opened 5 years ago

jpfairbanks commented 5 years ago

@adarshp, I took a look at the spec for GrFN (https://delphi.readthedocs.io/en/master/grfn_spec.html#top-level-grfn-specification). This is a great representation - kind of like a higher-level IR for translating between languages.

One thing I noticed is this note:

TODO: we think Fortran is restricted to integer values for iteration variables, which would include iteration over indexes into arrays. Need to double check this.

If the GrFN schema is going to work for multiple languages, it is going to need to support iterator loops like those in C++, Python, and Julia.

I guess you could have an int-loop and an iterator-loop, or a per-language loop construct.

cl4yton commented 5 years ago

@jpfairbanks : Agree 100%, and this will require extending the loop plate representation. The base representation will, I believe, remain the same, but the loop index and the loop terminating condition will need to be augmented. Right now there is a loop index variable with a fn that simply updates the integer value, and there is a terminating-condition variable that generally determines the stopping condition (this is not well-documented or part of a current working use case, but it is the current plan). To extend to more general iteration over values, we need a representation of an iterator variable (not just a loop index variable), possibly as a generator fn that computes the next iterator value for the next loop iteration state (or looks it up from an existing store). Likewise, the loop terminating condition would possibly be computed based on this state (or on other updates within the loop). BTW, the stopping-condition version of looping is also not yet formally specified -- this would be for handling a while loop. But both of these ((1) a stopping condition for while loops and (2) generalized iteration) are on our radar and definitely necessary to extend expressivity.
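To make that concrete, here is a rough sketch (in Julia, purely illustrative -- the GeneralLoopPlate type and the iterator_fn / stop_condition field names are not part of the current spec) of what such a generalized loop plate might look like:

# Hypothetical sketch of a loop plate generalized from an integer index to an
# arbitrary iterator; names and field layout are illustrative only.
struct GeneralLoopPlate{I,G,C,B}
    name::String
    input::I          # variables flowing into the loop body
    iterator_fn::G    # generator: state -> (next value, next state), or `nothing` when exhausted
    stop_condition::C # predicate over the loop state (covers while-loop semantics)
    body::B           # callable that executes the loop body functions for one iteration
end

# Driving such a plate would look roughly like this:
function run_plate(plate::GeneralLoopPlate, state)
    while true
        step = plate.iterator_fn(state)
        step === nothing && break          # iterator exhausted
        value, state = step
        state = plate.body(value, state)   # execute the loop body functions
        plate.stop_condition(state) && break
    end
    return state
end

An integer-indexed loop is then just the special case where the generator counts through a range.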

jpfairbanks commented 5 years ago

I just took about an hour tonight to code up a Julia version of the GrFN spec. It isn't quite done yet, but it lets you read in that JSON.

module GrfnMod

# Julia mirror of the GrFN JSON schema. Type parameters (D, T, ...) are left
# generic so they can be instantiated from whatever the JSON parser returns.
mutable struct Grfn{D,T}
    name::String
    start::String
    created::D
    functions::T
end
struct Input
    name::String
    domain::String
end

struct Container{I,V,B}
    name::String
    type::String
    input::I
    variables::V
    body::B
end
Container(tuple) = Container(tuple...)

mutable struct GrfnBody{S,T,U}
    name::S
    type::T
    reference::U
end

GrfnBody(x::NamedTuple) = GrfnBody(x...)

mutable struct GrfnSource{S}
    name::S
end

mutable struct GrfnNode{T,U,V}
    name::String
    type::T
    target::String
    sources::V
    body::U
end

struct Assignment{T,U}
    name::String
    typ::String
    target::String
    sources::T
    body::U
end
# Convenience constructor: default the body to the value `missing`
# (rather than the type `Missing`) when it is absent.
function Assignment(n, t, r, s)
    return Assignment(n, t, r, s, missing)
end

function Assignment(tuple::NamedTuple)
    return Assignment(tuple...)
end

struct LoopPlate{I,T,R,B}
    name::String
    type::String
    input::I
    index_variable::T
    index_iteration_range::R
    body::B                
end
LoopPlate(t::NamedTuple) = LoopPlate(t...)
end # module GrfnMod

The placeholder type parameters like T, R, B get instantiated as Vector{NamedTuple{...}} based on the content of the JSON file. I haven't used it for anything yet, but constructing the types was a good exercise to learn the schema. My plan for ASKE includes having more information in the KG; of course, that KG maps less directly onto a DBN, but it would allow executing models that can't be represented as a DBN.

see also https://github.com/jpfairbanks/SemanticModels.jl/blob/master/doc/src/notebooks/grfns.ipynb
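For completeness, reading the JSON into these structs is straightforward with JSON.jl. A minimal sketch, assuming the GrfnMod module above has been evaluated and that the top-level JSON keys mirror the struct field names (the key names and the pgm.json path here are assumptions; check them against the actual spec):

using JSON  # JSON.jl; any parser that returns Dicts/Vectors would work

# Parse the GrFN JSON file and build the top-level struct. Key names below
# just mirror the GrfnMod struct fields; verify them against the spec.
raw = JSON.parsefile("pgm.json")
grfn = GrfnMod.Grfn(raw["name"], raw["start"], raw["created"], raw["functions"])

# Individual function entries can then be wrapped as typed values, e.g.
containers = [GrfnMod.Container(f["name"], f["type"], f["input"], f["variables"], f["body"])
              for f in grfn.functions if f["type"] == "container"]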

jpfairbanks commented 5 years ago

I updated the notebook to include a Julia representation of the GrFN. I think this contains all the information in the JSON. I would still need the function bodies, and to do something with the variable indexes, in order to actually execute this model representation (see the sketch after the listing below). It would be nice to have an implementation of the modeling representation in multiple languages, since the models themselves will be in multiple languages.

Grfn("pgm.json", "2018-10-04", "CROP_YIELD", begin
          #= In[188]:2 =#
          TOTAL_RAIN = UPDATE_EST__lambda__TOTAL_RAIN_0
          IF_1 = UPDATE_EST__lambda__IF_1_0
          YIELD_EST = UPDATE_EST__lambda__YIELD_EST_0
          YIELD_EST = UPDATE_EST__lambda__YIELD_EST_1
          YIELD_EST = :Missing
          Container(UPDATE_EST, domains=Dict(RAIN => real, TOTAL_RAIN => real, YIELD_EST => real), variables=Dict(TOTAL_RAIN => real, RAIN => real, IF_1 => boolean, YIELD_EST => real), body=begin
                      #= In[187]:9 =#
                      Var(TOTAL_RAIN, 1) = UPDATE_EST__assign__TOTAL_RAIN_0(Var(TOTAL_RAIN, 0), Var(RAIN, 0))
                      Var(IF_1, 0) = UPDATE_EST__condition__IF_1_0(Var(TOTAL_RAIN, 1))
                      Var(YIELD_EST, 1) = UPDATE_EST__assign__YIELD_EST_0(Var(TOTAL_RAIN, 1))
                      Var(YIELD_EST, 2) = UPDATE_EST__assign__YIELD_EST_1(Var(TOTAL_RAIN, 1))
                      Var(YIELD_EST, 3) = UPDATE_EST__decision__YIELD_EST_0(Var(IF_1, 0), Var(YIELD_EST, 2), Var(YIELD_EST, 1))
                  end)
          MAX_RAIN = 4.0::Float64
          CONSISTENCY = 64.0::Float64
          ABSORPTION = 0.6::Float64
          YIELD_EST = 0::Int
          TOTAL_RAIN = 0::Int
          RAIN = CROP_YIELD__lambda__RAIN_0
          Loop(name=CROP_YIELD__loop_plate__DAY_0, input=Symbol[:CONSISTENCY, :MAX_RAIN, :ABSORPTION, :RAIN, :TOTAL_RAIN, :YIELD_EST], indexvar=DAY, range=1:32::Int, body=begin
                      #= In[186]:95 =#
                      Var(RAIN, 0) = CROP_YIELD__assign__RAIN_0(Var(DAY, -1), Var(CONSISTENCY, -1), Var(MAX_RAIN, -1), Var(ABSORPTION, -1))
                      UPDATE_EST(Var(RAIN, 0), Var(TOTAL_RAIN, -1), Var(YIELD_EST, -1))
                      print(Var(DAY, -1), Var(YIELD_EST, -1))
                  end)
          Container(CROP_YIELD, domains=Dict(), variables=Dict(DAY => integer, RAIN => real, YIELD_EST => real, TOTAL_RAIN => real, MAX_RAIN => real, CONSISTENCY => real, ABSORPTION => real), body=begin
                      #= In[187]:9 =#
                      Var(MAX_RAIN, 2) = CROP_YIELD__assign__MAX_RAIN_0()
                      Var(CONSISTENCY, 2) = CROP_YIELD__assign__CONSISTENCY_0()
                      Var(ABSORPTION, 2) = CROP_YIELD__assign__ABSORPTION_0()
                      Var(YIELD_EST, 2) = CROP_YIELD__assign__YIELD_EST_0()
                      Var(TOTAL_RAIN, 2) = CROP_YIELD__assign__TOTAL_RAIN_0()
                      Plate(CROP_YIELD__loop_plate__DAY_0, CONSISTENCY, MAX_RAIN, ABSORPTION, RAIN, TOTAL_RAIN, YIELD_EST)
                      print(Var(YIELD_EST, 2))
                  end)
      end)
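On the execution question: one way the indexed variables above could be driven, assuming the lambda bodies were available as plain Julia functions, is to key an environment on (variable name, index) pairs and evaluate the statements in order. A minimal, hypothetical sketch (the lambda here is a stand-in, not the real UPDATE_EST__assign__TOTAL_RAIN_0):

# env maps (variable name, SSA-style index) to a value; each statement reads its
# source versions and writes a new target version.
env = Dict{Tuple{Symbol,Int},Float64}()
env[(:TOTAL_RAIN, 0)] = 0.0
env[(:RAIN, 0)] = 1.5

# Stand-in for the extracted lambda body of UPDATE_EST__assign__TOTAL_RAIN_0.
update_est_assign_total_rain(total_rain, rain) = total_rain + rain

# Var(TOTAL_RAIN, 1) = UPDATE_EST__assign__TOTAL_RAIN_0(Var(TOTAL_RAIN, 0), Var(RAIN, 0))
env[(:TOTAL_RAIN, 1)] = update_est_assign_total_rain(env[(:TOTAL_RAIN, 0)], env[(:RAIN, 0)])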
adarshp commented 5 years ago

Hi @jpfairbanks! Sorry for the delayed response, I had a mountain of backlogged work accumulated from the two back-to-back DARPA trips.

It's great that you were able to dive in and construct the Julia representation!

I just wanted to clarify, by the 'function bodies', do you mean the chunks of python code that have been extracted from the Fortran code (i.e. the functions in cell [21] of the demo notebook?)

Also, I confess, I'm not totally sure what you mean by your last sentence - the reference scientific models can be in many languages, yes - but the GrFN representation should be fairly language-agnostic (the extracted lambda functions are of course Python expressions, but should in principle be restricted to elementary mathematical operations using infix operators, which should be fairly easy to translate to other languages with C-like DNA).

Also, we are looking into streamlining things by integrating the lambda functions into the GrFN representation itself - in a pickle file, the functions could just be pickled directly, and when serializing to JSON, I expect we could use inspect to write out the corresponding Python source code (see #130).

jpfairbanks commented 5 years ago

I just wanted to clarify, by the 'function bodies', do you mean the chunks of python code that have been extracted from the Fortran code (i.e. the functions in cell [21] of the demo notebook?)

That is what I meant when I wrote it, but I now think that I need the intermediate representation (IR) that you used to do the Fortran-to-Python transpiling, because I need to write an IR -> Julia backend.

the extracted lambda functions are of course Python expressions, but should in principle be restricted to elementary mathematical operations using infix operators, which should be fairly easy to translate to other languages with C-like DNA).

The "elementary math operations with infix operators" could be stored in an IR that is then compiled to any programming language ie Python or Julia. It would be better to go Fortran -> IR -> {Python or Julia} than to go Fortran -> IR -> Python -> IR' -> Julia.

I'll comment further on #130 about including the IR in the GrFN output.

It would be nice to have an implementation of the modeling representation in multiple languages since the models themselves will be in multiple languages.

By this I meant that it would be nice to have a GrFN "interpreter" that can integrate well with many languages. The scientists will be native speakers of one language and will want to use GrFNs in that language. For example, stats-oriented domains like psychology and sociology are more likely to use R than Python and will want an R-oriented way to interact with GrFNs, whereas physicists are more likely to use Python and will want a Python package (pip install griffin; from griffin import GrFN). I of course want a Julia-oriented implementation. If we end up with multiple implementations, we can all share the same data representation so that GrFNs can be shared across implementations.

In our future post-ASKE world, I imagine scientists writing a script in their native language (R, Python, Julia, ...). Another scientist who wants to build on their work then runs it through AutoMates or SemanticModels.jl and gets out a common GrFN representation that they can manipulate to do model transformations, and then execute using any runtime.