hpoit / MLN.jl

Markov Logic Network in Julia

Roadmap #1

Open hpoit opened 8 years ago

hpoit commented 7 years ago

Representations

  1. Symbols
  2. Consolidated MLN clauses
  3. Data schema
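
A minimal sketch of how these three representations could look in Julia; all type and field names below are hypothetical, nothing here is MLN.jl API yet:

```julia
# 1. Symbols: logic constants interned to integer IDs.
struct SymbolTable
    id_of::Dict{String,Int}     # constant name -> integer ID
    name_of::Vector{String}     # integer ID -> constant name
end
SymbolTable() = SymbolTable(Dict{String,Int}(), String[])

# 2. Consolidated MLN clauses: a literal is a possibly negated atom; clauses
#    with the same pattern can share one weighted template.
struct Literal
    predicate::Symbol           # e.g. :Smokes
    args::Vector{Int}           # interned constant IDs (or negative variable IDs)
    negated::Bool
end

struct Clause
    weight::Float64             # soft weight; Inf for a hard constraint
    literals::Vector{Literal}
end

# 3. Data schema: evidence grouped per predicate, ready for a table backend.
const Evidence = Dict{Symbol,Vector{Tuple{Vector{Int},Bool}}}
```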

MLN tasks

  1. MRF partitioning, a technique that can dramatically improve both inference efficiency and result quality
  2. MAP inference, to find the most likely possible world (see the WalkSAT sketch after this list)
  3. Marginal inference, to estimate marginal probabilities
  4. Weight learning, to learn the weights of MLN rules given training data
  5. Rule learning, with reinforcement learning
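
For task 2, here is a rough MaxWalkSAT-style sketch, assuming the MRF has already been grounded into weighted ground clauses. `GroundClause` and the function names are illustrative; none of this exists in MLN.jl yet:

```julia
using Random

struct GroundClause
    weight::Float64
    lits::Vector{Int}   # signed atom indices: +i requires atom i true, -i false
end

satisfied(c::GroundClause, world::BitVector) =
    any(l -> (l > 0) == world[abs(l)], c.lits)

# Total weight of unsatisfied clauses; MAP search minimizes this cost.
cost(clauses, world) =
    sum(c.weight for c in clauses if !satisfied(c, world); init = 0.0)

function maxwalksat(clauses, natoms; maxflips = 10_000, p = 0.5)
    world = bitrand(natoms)                    # random initial possible world
    best, bestcost = copy(world), cost(clauses, world)
    for _ in 1:maxflips
        unsat = [c for c in clauses if !satisfied(c, world)]
        isempty(unsat) && break                # every clause satisfied: done
        c = rand(unsat)                        # pick a random unsatisfied clause
        if rand() < p
            flip = abs(rand(c.lits))           # noisy move: random atom in c
        else                                   # greedy move: best atom in c
            cands = unique(abs.(c.lits))
            deltas = map(cands) do a
                world[a] = !world[a]
                d = cost(clauses, world)
                world[a] = !world[a]           # undo trial flip
                d
            end
            flip = cands[argmin(deltas)]
        end
        world[flip] = !world[flip]
        newcost = cost(clauses, world)
        newcost < bestcost && ((best, bestcost) = (copy(world), newcost))
    end
    return best, bestcost
end
```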

Functionalities

  1. Prolog/Datalog: execute purely logical rules alongside MLN rules (by embedding an engine through C)
  2. Functions: a library of common numeric/string/boolean functions, which can be used inside an MLN rule. In particular, to perform arithmetic manipulation and comparison in MLN rules.
  3. Predicate scoping: Sometimes even grounding the atoms of one predicate will blow up RAM. On the other hand, you often only care about a particular subset of the exhaustive set of ground atoms. This feature lets you explicitly specify the atoms you are interested in so that your program becomes runnable again (sketched below).
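
A tiny sketch of what predicate scoping could look like, with a hypothetical `scope` map per predicate; `ground_atoms` is illustrative, not an existing API:

```julia
# Instead of grounding Friends(x, y) over all n^2 pairs of people, only the
# explicitly listed tuples are materialized.
scope = Dict{Symbol,Vector{Tuple{Int,Int}}}()
scope[:Friends] = [(1, 2), (2, 3), (3, 7)]   # 3 ground atoms instead of n^2

ground_atoms(pred::Symbol, scope) = [(pred, args) for args in scope[pred]]

ground_atoms(:Friends, scope)
# => [(:Friends, (1, 2)), (:Friends, (2, 3)), (:Friends, (3, 7))]
```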

Distribution

  1. Task decomposition, by splitting complex tasks into subtasks handled by specialized algorithms
  2. Data partitioning, by automatic parallelization of complex statistical tasks
hpoit commented 7 years ago

Hi @tawheeler, @sbromberger. FYI, from a functional perspective, this is what I just sketched out:

Tuffy Program
Goal: scale relational operations during the grounding phase of MLN inference through an RDBMS.
Method: use a hybrid solution of RDBMS-based grounding and in-memory search.
Method: use partitioning to further improve the space and time efficiency of MLN inference.

General Functionalities
1a. Symbol table to convert all logic constants into integer IDs (see the sketch after this list)
1b. Consolidate MLN clauses of the same pattern
1c. PostgreSQL to store input and intermediate data, e.g. the ground Markov network object
2a. Efficient grounding via SQL queries (RDBMS-based) with KBMC
2b. Lazy inference (in-memory search) for MLN formula grounding, resulting in a Markov random field
2c. Partitioning and inference on the MRF
3a. MAP inference with WalkSAT
3b. Marginal inference with MC-SAT
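
A minimal sketch of 1a and 1b, assuming a plain dictionary-based symbol table; `intern!` and `pattern_key` are made-up names, not Tuffy or MLN.jl functions:

```julia
# 1a. Intern a logic constant, assigning a fresh integer ID on first sight.
function intern!(ids::Dict{String,Int}, names::Vector{String}, c::String)
    get!(ids, c) do
        push!(names, c)
        length(names)
    end
end

ids, names = Dict{String,Int}(), String[]
intern!(ids, names, "Anna")   # => 1
intern!(ids, names, "Bob")    # => 2
intern!(ids, names, "Anna")   # => 1 (already interned)

# 1b. Clauses whose literals share predicates, signs, and variable structure
#     get one key, so their groundings are computed once and reused.
pattern_key(preds::Vector{Symbol}, signs::Vector{Bool}) = hash((preds, signs))
```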

  4. Discriminative weight learning with Diagonal Newton (Lowd and Domingos)

Result: scalability of MLN inference and of the grounding phase.
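
A hedged sketch of one diagonal Newton update in the spirit of Lowd and Domingos, approximating the diagonal Hessian of the negative conditional log-likelihood by the sample variance of clause counts. `sample_counts` stands in for an MC-SAT-style sampler; all names and the damping/step parameters are assumptions:

```julia
using Statistics

function diagonal_newton_step!(w, observed_counts, sample_counts; λ = 1e-2, α = 1.0)
    counts = sample_counts(w)            # nsamples x nclauses matrix of n_i
    μ  = vec(mean(counts; dims = 1))     # E_w[n_i]
    σ² = vec(var(counts; dims = 1))      # Var_w[n_i] ≈ diagonal Hessian
    g  = μ .- observed_counts            # ∂(-log P)/∂w_i = E_w[n_i] - n_i(data)
    w .-= α .* g ./ (σ² .+ λ)            # damped diagonal Newton update
    return w
end
```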

Felix Program
Goal: efficient inference in Markov Logic by exploiting common subtasks in text-processing tasks.
Method: use specialized algorithms for each subtask.

General Functionalities

  1. Specialized algorithms: CC (correlation clustering), LR (logistic regression), and Tuffy as the generic solver
  2. Compiler to select (or apply?) these algorithms automatically (see the dispatch sketch after this list)
  3. Data-movement optimizer built into an RDBMS

Result: scale to complex information-extraction programs on large datasets and generate results of higher quality than state-of-the-art IE approaches.
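
A sketch of how the compiler's dispatch could be expressed with Julia's multiple dispatch; every task type and solver below is a placeholder, not part of any existing system:

```julia
abstract type Subtask end
struct Classification <: Subtask end   # handled by logistic regression (LR)
struct Coreference    <: Subtask end   # handled by correlation clustering (CC)
struct Generic        <: Subtask end   # everything else: Tuffy-style inference

solve_lr(data)    = "LR result"        # stub specialized solver
solve_cc(data)    = "CC result"        # stub specialized solver
solve_tuffy(data) = "Tuffy result"     # stub generic grounding + search

solve(::Classification, data) = solve_lr(data)
solve(::Coreference,    data) = solve_cc(data)
solve(::Generic,        data) = solve_tuffy(data)

# The compiler would produce `tasks` from the MLN program, then:
run_program(tasks, data) = [solve(t, data) for t in tasks]

run_program([Classification(), Generic()], nothing)   # => ["LR result", "Tuffy result"]
```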

I am counting on you guys for a peer review from time to time as I move forward. Thank you.

tawheeler commented 7 years ago

Hi @hpoit. This is very MLN-specific so I can't really say whether it is a good approach or not. I would recommend against implementing things like SQL backends before you have a very basic version of everything else working first.

I'd start with: