paulws / Banish


Banish Data formats #3

Open amb-enthusiast opened 10 years ago

amb-enthusiast commented 10 years ago

Hi team,

Just wanted to flag up requirements for file IO features in Banish.

Existing efforts

SAMIAM reads and writes several file formats: http://reasoning.cs.ucla.edu/samiam/iframe/fileformats.html

However, it depends on the Smile library to do this. The Genie/Smile project documents support for a range of formats: http://genie.sis.pitt.edu/wiki/Elements_of_GeNIe:_File_formats_suported_in_GeNIe

This somewhat old project also looks promising: http://www.kddresearch.org/Groups/Probabilistic-Reasoning/convertor.html

This looks like another great standards initiative: http://www.cs.cmu.edu/~fgcozman/Research/InterchangeFormat/

That said, there's nothing like XML to bloat file size with tag text...

Requirements

I've mostly worked with SAMIAM and Kevin Murphy's BNT to date, but plan on using Genie/Smile. And of course, Banish.

In the case of SAMIAM, I've used the default Hugin text-based file formats (.net and .dat). I have a set of models that I plan to enrich and refine.

BNT can load BNIF files: http://bayesnet.github.io/bnt/docs/usage.html#file

I have a smaller set of models for BNT, which I also plan to improve over time. I use BNT for its learning algorithms, which go beyond those offered in SAMIAM.

I guess I would like Banish to read and write the Hugin and BNIF formats and, in anticipation of future work, the Genie/Smile DSL format.

The BBN conversion tool deals with all these cases, and as such, could be a useful starting point.

Does this fit with current plans for Banish?

paulws commented 10 years ago

Hi Ant,

Banish currently uses its own data format. At the startup meeting we were asked to consider whether a MongoDB-compatible data format could be used (I'll have to read up on this, as I have no prior experience with MongoDB). There are currently no requirements for Banish to handle other data formats.

We do however want Banish to be used and to be as useful as possible, so we can certainly add your requirements to a list of potential upgrades for the future.

Paul

amb-enthusiast commented 10 years ago

Hi Paul,

MongoDB offers a document-oriented store, very closely aligned with JSON format, with indexes operating on document properties to enable rapid search/querying. It is a really exciting technology, and I've been using it on a few projects.

One option would be to store a BN as a MongoDB document, in a form that closely resembles the .net format; something like:

{
    modelAuthors : [ "Ant" , "Paul" ] ,
    modelTitle : "MyFirstModel" ,
    modelLastEdited : "2014-03-01T13:32:01Z" ,
    modelDescription : "A toy example" ,
    modelMetadata : {
        accessControls : [ "group1" , "group2" ] ,
        modelSummaryStats : {
            totalNodes : 2 ,
            meanInDegree : 0.5
        }
    } ,
    nodes : [
        { name : "A" , values : [ "a0" , "a1" ] } ,
        { name : "B" , values : [ "b0" , "b1" , "b2" ] }
    ] ,
    potentials : [
        {
            name : "P_A" ,
            nodes : [ "A" ] ,
            values : [ 0.33 , 0.67 ]
        } ,
        {
            name : "P_B" ,
            nodes : [ "B" ] ,
            values : [ 0.55 , 0.3 , 0.15 ]
        } ,
        {
            name : "P_A | B" ,
            nodes : [ "A" , "B" ] ,
            values : [ 0.333 , 0.667 , 0.25 , 0.75 , 0.55 , 0.45 ]
        }
    ]
}
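A layout like this implies a few invariants that a loader could check on import. Below is a minimal Python sketch, assuming that the first entry in each potential's `nodes` list is the child, the remaining entries are its parents, and `values` is flattened with the child's states varying fastest (which is how `P_A | B` reads: one pair of A probabilities per state of B). The helper names `check_potentials` and `summary_stats` are hypothetical, not part of Banish, and the dict is trimmed to the fields the checks use.

```python
import math

# The proposed document as a Python dict, trimmed to the fields used below.
model = {
    "modelTitle": "MyFirstModel",
    "modelMetadata": {"modelSummaryStats": {"totalNodes": 2, "meanInDegree": 0.5}},
    "nodes": [
        {"name": "A", "values": ["a0", "a1"]},
        {"name": "B", "values": ["b0", "b1", "b2"]},
    ],
    "potentials": [
        {"name": "P_A", "nodes": ["A"], "values": [0.33, 0.67]},
        {"name": "P_B", "nodes": ["B"], "values": [0.55, 0.3, 0.15]},
        {"name": "P_A | B", "nodes": ["A", "B"],
         "values": [0.333, 0.667, 0.25, 0.75, 0.55, 0.45]},
    ],
}


def check_potentials(model):
    """Return a list of problems found in the potentials (empty if none).

    Assumes nodes[0] is the child, nodes[1:] are its parents, and the flat
    `values` list varies the child's states fastest.
    """
    card = {n["name"]: len(n["values"]) for n in model["nodes"]}
    problems = []
    for pot in model["potentials"]:
        # Table size must equal the product of the node cardinalities.
        expected = math.prod(card[name] for name in pot["nodes"])
        if len(pot["values"]) != expected:
            problems.append(f"{pot['name']}: {len(pot['values'])} values, expected {expected}")
            continue
        # Each consecutive block of child probabilities must sum to 1.
        k = card[pot["nodes"][0]]
        for start in range(0, expected, k):
            if not math.isclose(sum(pot["values"][start:start + k]), 1.0, abs_tol=1e-9):
                problems.append(f"{pot['name']}: block at {start} does not sum to 1")
    return problems


def summary_stats(model):
    """Recompute totalNodes and meanInDegree from the potentials."""
    parents = {n["name"]: set() for n in model["nodes"]}
    for pot in model["potentials"]:
        parents[pot["nodes"][0]].update(pot["nodes"][1:])
    total = len(parents)
    return {"totalNodes": total,
            "meanInDegree": sum(len(p) for p in parents.values()) / total}


print(check_potentials(model))  # -> []
print(summary_stats(model))  # -> {'totalNodes': 2, 'meanInDegree': 0.5}
```

Recomputing the summary statistics this way is one option for keeping `modelSummaryStats` consistent with the tables; storing them at all is mainly useful if they are meant to be indexed and queried.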

I doubt that this is optimal, but it could be a quick way to support easy conversion between Banish format and widely used existing formats.
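To give a feel for how direct such a conversion could be, here is a minimal Python sketch that writes a model in the proposed layout out as Hugin .net text. It assumes the same conventions as the document (first entry of `nodes` is the child, the rest are parents, child states vary fastest) and that multi-parent tables are nested with parents outermost in their listed order. That nesting is my reading of the Hugin NET layout and should be checked against the Hugin specification before relying on it. `to_hugin_net` and `nest` are hypothetical helper names. The example keeps only `P_B` and `P_A | B`, since a .net file carries a single conditional table per node.

```python
import math


def nest(values, dims):
    """Render a flat list as Hugin-style nested parentheses.

    dims lists the table dimensions outermost first; the last
    dimension varies fastest in `values`.
    """
    if len(dims) == 1:
        return "(" + " ".join(str(v) for v in values) + ")"
    step = math.prod(dims[1:])
    return "(" + " ".join(nest(values[i:i + step], dims[1:])
                          for i in range(0, len(values), step)) + ")"


def to_hugin_net(model):
    """Write a model dict in the proposed layout as Hugin .net text (sketch)."""
    card = {n["name"]: len(n["values"]) for n in model["nodes"]}
    lines = ["net", "{", "}", ""]
    for node in model["nodes"]:
        states = " ".join(f'"{s}"' for s in node["values"])
        lines += [f"node {node['name']}", "{", f"    states = ({states});", "}", ""]
    for pot in model["potentials"]:
        child, parents = pot["nodes"][0], pot["nodes"][1:]
        if parents:
            head = f"potential ({child} | {' '.join(parents)})"
        else:
            head = f"potential ({child})"
        # Assumed layout: parents outermost (in listed order), child innermost.
        dims = [card[p] for p in parents] + [card[child]]
        lines += [head, "{", f"    data = {nest(pot['values'], dims)};", "}", ""]
    return "\n".join(lines)


example = {
    "nodes": [
        {"name": "A", "values": ["a0", "a1"]},
        {"name": "B", "values": ["b0", "b1", "b2"]},
    ],
    "potentials": [
        {"name": "P_B", "nodes": ["B"], "values": [0.55, 0.3, 0.15]},
        {"name": "P_A | B", "nodes": ["A", "B"],
         "values": [0.333, 0.667, 0.25, 0.75, 0.55, 0.45]},
    ],
}

print(to_hugin_net(example))
```

For this example the `P_A | B` table comes out as `data = ((0.333 0.667) (0.25 0.75) (0.55 0.45));`, one group of A probabilities per state of B. A reader for .net files would run the same mapping in reverse.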

paulws commented 10 years ago

Thanks Ant. That's very helpful. I'll definitely produce something along the lines you suggest.

amb-enthusiast commented 10 years ago

Glad I could help!