scikit-fusion is a Python module for data fusion and learning over heterogeneous datasets. The core of scikit-fusion is a set of recent collective latent factor models and large-scale joint matrix factorization algorithms.
[News:] Fast CPU and GPU-accelerated implementations of some of our methods.
[News:] Scikit-fusion, collective latent factor models, matrix factorization for data fusion and learning over hetnets.
[News:] fastGNMF, fast implementation of graph-regularized non-negative matrix factorization using Facebook FAISS.
scikit-fusion is tested to work under Python 3.
The required dependencies to build the software are NumPy >= 1.7, SciPy >= 0.12, PyGraphviz >= 1.3 (needed only for drawing data fusion graphs) and Joblib >= 0.8.4.
This package uses distutils, which is the default way of installing Python modules. To install in your home directory, use:
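To see whether an environment already meets these minimums before building, a small check along the following lines may help. It is a sketch using only the standard library, not part of scikit-fusion; the distribution names used as dictionary keys and the helper names `version_tuple` and `check_dependencies` are assumptions made for illustration.

```python
import re
from importlib import metadata

# Minimum versions quoted in the dependency list above. The keys are
# assumed distribution names, as they would appear to pip.
REQUIRED = {
    "numpy": "1.7",
    "scipy": "0.12",
    "joblib": "0.8.4",
    "pygraphviz": "1.3",  # optional: only needed for drawing fusion graphs
}


def version_tuple(text):
    """Turn a version string such as '1.26.4' into (1, 26, 4).

    Parsing stops at the first component that does not start with a digit,
    so suffixes such as 'rc1' or 'dev0' are ignored.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_dependencies(required=REQUIRED):
    """Return {name: 'ok' | 'too old' | 'missing'} for each requirement."""
    report = {}
    for name, minimum in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        is_recent = version_tuple(installed) >= version_tuple(minimum)
        report[name] = "ok" if is_recent else "too old"
    return report


for name, status in check_dependencies().items():
    print(f"{name}: {status}")
```

A "missing" entry for `pygraphviz` only affects drawing of data fusion graphs; the other three packages are needed in all cases.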
python setup.py install --user
To install for all users on Unix/Linux:
python setup.py build
sudo python setup.py install
For development mode use:
python setup.py develop
Let's generate three random data matrices describing three different object types:
>>> import numpy as np
>>> R12 = np.random.rand(50, 100)
>>> R13 = np.random.rand(50, 40)
>>> R23 = np.random.rand(100, 40)
Next, we define our data fusion graph:
>>> from skfusion import fusion
>>> t1 = fusion.ObjectType('Type 1', 10)
>>> t2 = fusion.ObjectType('Type 2', 20)
>>> t3 = fusion.ObjectType('Type 3', 30)
>>> relations = [fusion.Relation(R12, t1, t2),
...              fusion.Relation(R13, t1, t3),
...              fusion.Relation(R23, t2, t3)]
>>> fusion_graph = fusion.FusionGraph()
>>> fusion_graph.add_relations_from(relations)
and then collectively infer the latent data model:
>>> fuser = fusion.Dfmf()
>>> fuser.fuse(fusion_graph)
>>> print(fuser.factor(t1).shape)
(50, 10)
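The shape printed above follows from the data and the rank given to `ObjectType`: 50 objects of Type 1, each described by 10 latent components. As a rough NumPy-only sketch (it does not call scikit-fusion), assuming the model approximates each relation matrix as a row factor times a small backbone matrix times a transposed column factor, the dimensions work out as follows. The names `G1`, `G2` and `S12` are illustrative, and random values stand in for learned factors.

```python
import numpy as np

rng = np.random.default_rng(0)

n1, n2 = 50, 100  # objects of Type 1 and Type 2 (rows and columns of R12)
k1, k2 = 10, 20   # ranks passed to ObjectType('Type 1', 10) and ('Type 2', 20)

# Random placeholders for the latent factors; a fitted model would
# supply learned values of the same shapes.
G1 = rng.random((n1, k1))    # one row per Type 1 object
G2 = rng.random((n2, k2))    # one row per Type 2 object
S12 = rng.random((k1, k2))   # assumed backbone linking the two latent spaces

# Under the assumed tri-factorization, the product has the shape of R12.
R12_hat = G1 @ S12 @ G2.T

print(G1.shape)
print(R12_hat.shape)
```

Under this assumption, choosing a larger rank for an object type widens its factor matrix without changing the shape of the reconstructed relation.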
Afterwards new data might arrive:
>>> new_R12 = np.random.rand(10, 100)
>>> new_R13 = np.random.rand(10, 40)
for which we define the fusion graph:
>>> new_relations = [fusion.Relation(new_R12, t1, t2),
...                  fusion.Relation(new_R13, t1, t3)]
>>> new_graph = fusion.FusionGraph(new_relations)
and transform new objects to the latent space induced by the fuser:
>>> transformer = fusion.DfmfTransform()
>>> transformer.transform(t1, new_graph, fuser)
>>> print(transformer.factor(t1).shape)
(10, 10)
scikit-fusion contains several applications of data fusion:
>>> from skfusion import datasets
>>> dicty = datasets.load_dicty()
>>> print(dicty)
FusionGraph(Object types: 3, Relations: 3)
>>> print(dicty.object_types)
{ObjectType(GO term), ObjectType(Experimental condition), ObjectType(Gene)}
>>> print(dicty.relations)
{Relation(ObjectType(Gene), ObjectType(GO term)),
Relation(ObjectType(Gene), ObjectType(Gene)),
Relation(ObjectType(Gene), ObjectType(Experimental condition))}