general question: multi-modal dataset -> consider graphs not for all modalities

thomasmooon commented 5 years ago

Hello,

Merging data and its graph (explicitly given, or created through an embedding, e.g. as in the official example "Graph regularization for sentiment classification using synthesized graphs") is as easy as this:

graph_reg_model = nsl.keras.GraphRegularization(base_reg_model, graph_reg_config)

However, what if I'd like to apply the graph(-induced constraints) only partially? Example:

example 1:

Here's an example. The dataset has several modalities, e. g.

modality1 comprises features like AGE, GENDER, REGION, INCOME, DISEASE
modality2 is a DISEASE-network (like a symmetric matrix with values according to the degree of association strength)

Speaking TF: base_reg_model and graph_reg_config could be defined as in steps 1 and 2. But obviously graph_reg_model in step 3 is inconsistent, because base_reg_model and graph_reg_config don't know how to talk to each other.

1. base_reg_model = tf.keras.Model(inputs=modality1, outputs=outputs)
2. graph_reg_config = nsl.configs.GraphRegConfig(derived from modality2)
3. graph_reg_model = nsl.keras.GraphRegularization(base_reg_model, graph_reg_config)

example 2:

What if

modality1 would be like AGE, GENDER, REGION, INCOME, DISEASE_1, ..., DISEASE_N
modality2 would be a DISEASE-Network of N diseases

Is it possible (and if so, how) to expand the graph with a dummy graph? Maybe in such a way that the 4 covariates AGE, ..., INCOME are handled as in 2.2?

2.1 graph_DISEASE = nsl.configs.GraphRegConfig(derived from modality2)
2.2 graph_dummy = nsl.configs.GraphRegConfig(a fully connected, unconstrained graph with 4 nodes)

graph_reg_model = nsl.keras.GraphRegularization(base_reg_model, [graph_dummy, graph_DISEASE])

Thanks in advance.

arjung commented 5 years ago

Thanks for the question, @thomasmooon! First, I want to mention that nsl.configs.GraphRegConfig is not really specific to a modality or set of modalities. It is just a class that holds some configuration parameters for graph regularization such as the number of neighbors to consider, the multiplier to use, and so forth.

The general inputs to graph-based NSL are as follows:

Example features (from all modalities of interest)
Graph in TSV format (effectively just a set of edges describing connectivity between pairs of examples)

From my understanding, your primary entities are humans with corresponding modalities (age, gender, income, DISEASE_i, etc), and you have graph signals (association matrix) for the DISEASE modality, and not for humans directly. So, I think you may want to write your own custom graph building utility that will build a graph based on the modality (/modalities) of interest, which in your case is 'DISEASE'.

This utility will consider pairs of examples in the data set, compute connectivity info in the resulting graph, and encode them as edges in TSV format. For instance, you may use the association strength as a proxy for edge weight in the resulting graph. See our default graph building utility for an example of how this is done using embedding similarity as the distance function.
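As a rough illustration of the idea above, here is a minimal sketch of such a custom graph builder in plain Python. Everything in it is an assumption made for illustration rather than part of NSL: the function names (pair_weight, build_edges, write_tsv), the choice of the mean cross-pair association as the edge weight, the 0.5 threshold, and the toy data. It also assumes one edge per TSV line in the form source_id, target_id, weight; check that column layout against the format your graph tooling expects.

```python
import csv
import io
from itertools import combinations


def pair_weight(diseases_a, diseases_b, assoc):
    """Mean association strength over all cross pairs of diseases.

    Hypothetical weighting choice; max or sum would work just as well.
    """
    scores = [assoc[i][j] for i in diseases_a for j in diseases_b]
    return sum(scores) / len(scores) if scores else 0.0


def build_edges(examples, assoc, threshold=0.5):
    """Return (source_id, target_id, weight) for each pair at or above threshold."""
    edges = []
    for (id_a, dis_a), (id_b, dis_b) in combinations(sorted(examples.items()), 2):
        weight = pair_weight(dis_a, dis_b, assoc)
        if weight >= threshold:
            edges.append((id_a, id_b, weight))
    return edges


def write_tsv(edges, handle):
    """Write one tab-separated edge per line: source_id, target_id, weight."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for source_id, target_id, weight in edges:
        writer.writerow([source_id, target_id, f"{weight:.4f}"])


# Toy symmetric disease association matrix for N = 3 diseases (made-up values).
assoc = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]

# Toy examples: person id -> indices of the diseases that person has.
examples = {
    "p1": [0],
    "p2": [1],
    "p3": [2],
}

edges = build_edges(examples, assoc, threshold=0.5)

# Write to an in-memory buffer here; a real run would open a .tsv file instead.
buffer = io.StringIO()
write_tsv(edges, buffer)
tsv_text = buffer.getvalue()
print(tsv_text, end="")
```

Note that only the DISEASE modality drives connectivity in this sketch; the other covariates (AGE, GENDER, and so on) stay ordinary example features and never enter the graph. The comparison of all pairs is quadratic in the number of examples, so a larger data set would likely need some form of blocking or approximate neighbor search.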

Once such a graph is built, you can combine the graph with the example features using the pack_nbrs tool.

Hope this clarifies your questions.

thomasmooon commented 5 years ago

Thank you very much @arjung, this is very useful. I'll try this out and see how far I get. I'll give some feedback (but expect that this will take a while).

arjung commented 5 years ago

Sounds good. Yes, please feel free to give us feedback/suggest improvements.
