Closed: hajipoor closed this issue 4 years ago
Hi,
I think this is from Cheng et al. (2014), am I right? It's also discussed in this lecture by Pascal Poupart.
I think the first layer could be implemented with a TextVectorization layer, as it simply yields word embeddings using some one-hot token representation. This will basically give you all the H nodes.
There are a few details, however, that the original paper does not clarify well, which makes it difficult for me to explain how to build the remainder of the network. I'd suggest raising an issue in the original repo of Cheng to clarify a few things, for example: what are the values of N, D, and K (in the figure)? Hope they can help you out a bit!
Let me know if you learn more about this and I can proceed to explain how it could work for libspn-keras.
Thank you for your explanation, good idea. I will contact the paper's authors to get more details.
My main question is how to implement the M, G, or S nodes, for example. Please give me a simple tutorial on how to implement a similar structure with libspn-keras.
The available examples are related to the image domain with continuous inputs. I promise that I will prepare a Jupyter notebook for this LM network as an NLP task to share with others.
Thanks again.
Jostosh, I found the values of N, D, and K in their source code. The number of nodes in the first, second, third, fourth, fifth, and sixth layers is 400, 10000, 100000, 10000, 10000, and 3 respectively.
In the figure, I only see one occurrence of N, D, and K, so it's difficult to know how those layers should be stacked if there are more than are displayed here.
The SPN displayed in the figure also seems to violate both completeness and decomposability.
The H layer is still complete, assuming that a word is modeled as a discrete variable (it is complete because the sums are connected to children with the same scope). The M nodes are no longer complete (the sums are connected to children with different scopes). The G nodes square the nodes in M, which means they violate decomposability (since for decomposability, product children need to have disjoint scopes). We can still build an SPN that follows this architecture, but libspn-keras is designed to build SPNs that are complete and decomposable, so it needs a bit more thought. I might have time in a few days to look at it more closely. Apologies for the delay.
Please also let me know if I'm wrong about my claims on decomposability and completeness.
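To make the completeness and decomposability claims above concrete, here is a toy scope-based check in plain Python. The function names and scope representation are illustrative only, not part of libspn-keras:

```python
# Toy scope checks (illustrative, not libspn-keras API).
# A scope is the set of random variables a node covers, here a frozenset of names.

def sum_is_complete(child_scopes):
    """A sum node is complete iff all its children have the identical scope."""
    return len(set(child_scopes)) == 1

def product_is_decomposable(child_scopes):
    """A product node is decomposable iff its children's scopes are pairwise disjoint."""
    seen = set()
    for scope in child_scopes:
        if seen & scope:
            return False
        seen |= scope
    return True

# H sums: children are indicators of the same word variable -> complete.
print(sum_is_complete([frozenset({"w1"}), frozenset({"w1"})]))        # True
# M sums: children (H nodes) cover different word variables -> not complete.
print(sum_is_complete([frozenset({"w1"}), frozenset({"w2"})]))        # False
# G products square an M node, i.e. both children share its scope -> not decomposable.
print(product_is_decomposable([frozenset({"w1", "w2"})] * 2))         # False
# A product over disjoint word scopes would be fine.
print(product_is_decomposable([frozenset({"w1"}), frozenset({"w2"})]))  # True
```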
I agree with you and it seems your claims are right.
Thank you for your time and I hope you will find a tricky solution to implement it with libspn-keras.
Is libSPN (not the Keras version) capable of implementing it? I mean, are completeness and decomposability required for libSPN as well?
Apologies if I wasn't yet clear. Completeness and decomposability are not necessarily strong requirements for libspn-keras, so it is possible to build this architecture with this library. It's just that it takes a bit more effort to get the nodes aligned correctly along each axis in the tensors. An implementation with libspn-keras will most likely be more manageable and efficient than an implementation with libspn. I will be able to have a look at this tomorrow after 18.00 CEST.
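The "aligning nodes along each axis" remark can be pictured with a plain reshape. Numpy stands in here for the tensors; the group size of 4 is purely illustrative, not from the paper:

```python
import numpy as np

# 16 node values in a flat layer; suppose sums may only connect to nodes
# within the same group of 4 (illustrative numbers).
flat = np.arange(16.0)

# Moving each group to its own index along a leading "scope" axis means a
# dense sum over the last axis now mixes only nodes within one group.
grouped = flat.reshape(4, 4)
print(grouped.shape)  # (4, 4)
print(grouped[0])     # [0. 1. 2. 3.] -- the only candidates for the first sum
```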
Good. Did you take the time to think about this? Did you find any solutions
Hi, apologies for the delay. If I were to re-implement the SPN from the figure, it would be something like this:
import tensorflow as tf
import libspn_keras as spnk

max_tokens = 10000  # Depends on vocabulary
num_sum_h_layer = 100  # Also referred to as K
sequence_len = 5  # Also referred to as N
num_sums_m_layer = 16  # Also referred to as K

spn = tf.keras.Sequential([
    tf.keras.layers.experimental.preprocessing.TextVectorization(output_sequence_length=sequence_len),
    tf.keras.layers.Embedding(input_dim=max_tokens, output_dim=num_sum_h_layer),
    tf.keras.layers.Reshape([1, 1, num_sum_h_layer * sequence_len]),
    spnk.layers.DenseSum(num_sums=num_sums_m_layer),
    tf.keras.layers.Reshape([num_sums_m_layer, 1, 1]),
    # In log space, s + s == log(M ** 2): this appends the squared G nodes
    tf.keras.layers.Lambda(lambda s: tf.concat([s, s + s], axis=-1), name="add_g_nodes"),
    spnk.layers.DenseSum(num_sums=1),  # Just one sum per pair of G and M nodes
    tf.keras.layers.Reshape([1, 1, num_sums_m_layer]),
    spnk.layers.RootSum(return_weighted_child_logits=True)
])

spn(tf.convert_to_tensor(["hello there, how are you"]))
spn.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
spn.summary()
Which prints:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
text_vectorization (TextVect (None, 5)                 0
_________________________________________________________________
embedding (Embedding)        (None, 5, 100)             1000000
_________________________________________________________________
reshape (Reshape)            (None, 1, 1, 500)          0
_________________________________________________________________
dense_sum (DenseSum)         (None, 1, 1, 16)           8000
_________________________________________________________________
reshape_1 (Reshape)          (None, 16, 1, 1)           0
_________________________________________________________________
add_g_nodes (Lambda)         (None, 16, 1, 2)           0
_________________________________________________________________
dense_sum_1 (DenseSum)       (None, 16, 1, 1)           32
_________________________________________________________________
reshape_2 (Reshape)          (None, 1, 1, 16)           0
_________________________________________________________________
root_sum (RootSum)           (None, 16)                 16
=================================================================
Total params: 1,008,048
Trainable params: 1,008,048
Non-trainable params: 0
_________________________________________________________________
The embedding just takes a token index and maps that to a vector, which basically corresponds to multiplying a one-hot vector (such as the one in the figure) with a matrix of weights. The axis at index 1 is meant for scopes, whereas the SPN in the figure doesn't always seem to care about scopes/decomposability/completeness. Whenever I have to limit sums to be connected to only a subset of nodes in a layer, I reshape it so that groups of nodes end up in different locations in the tensor at axis 1.
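The one-hot/matrix-multiplication equivalence mentioned above is easy to verify directly. Numpy stands in for the Embedding layer here; the sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
max_tokens, embed_dim = 10, 4
weights = rng.random((max_tokens, embed_dim))  # the embedding matrix

token = 7                                      # a token index from TextVectorization
one_hot = np.zeros(max_tokens)
one_hot[token] = 1.0

# A one-hot vector times the weight matrix selects one row...
via_matmul = one_hot @ weights
# ...which is exactly what an embedding lookup does.
via_lookup = weights[token]

print(np.allclose(via_matmul, via_lookup))  # True
```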
I'm not sure how the authors actually stacked the layers or what the exact parameter settings were. Also keep in mind that if you want to use embedding layers etc., you're not really computing probabilities unless you make sure the embedding weights are 'normalized' in the same way that sums are normalized in a normalized SPN. The authors of the paper don't specify that they do, so I also left it out here.
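The 'normalization' caveat can be sketched as follows. If column j of the embedding matrix is read as sum node j's weights over the token indicators, a softmax over the token axis turns each column into a proper distribution, which is the same constraint a normalized SPN places on sum weights. The shapes below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
max_tokens, num_sums = 10, 4

# Unconstrained embedding weights: column j holds sum node j's
# weights over the token indicators (illustrative shapes).
raw = rng.normal(size=(max_tokens, num_sums))

# Softmax over the token axis makes each column non-negative and sum to 1,
# like the weights of a normalized sum node.
normalized = np.exp(raw) / np.exp(raw).sum(axis=0, keepdims=True)

print(np.allclose(normalized.sum(axis=0), 1.0))  # True
print(bool(np.all(normalized >= 0)))             # True
```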
Hopefully this gets you going.
Many thanks for providing this code, it gives me a good idea of how to implement the LM network. I also read your articles (1, 2) to get to know this network more deeply.
As you know, interpretability and tractability are the main features of this network. I am interested to know how this package supports these features and gives us information about them.
Thanks again.
Ah, nice to hear you've read those! Although those articles still mention the old libspn, the same ideas apply in this library.
I'm not sure exactly what you mean by interoperability and traceability in this context?
Sorry there were some typos in my comment.
As you know, inference in SPNs is guaranteed to be tractable. I am interested to know how libspn-keras provides this information and how I can have it.
Tractability relates to the complexity order of doing inference. In the case of SPNs, inference is computed in linear time w.r.t. the network size (i.e. this is O(n), as already stated in Poon and Domingos, 2011). Although there is no formal definition of what is 'tractable', usually a distribution is said to be tractable if it can be computed in at most polynomial time. Since O(n) <= O(n^k) for k >= 1, computing evidence/probabilities in SPNs is a tractable inference.
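The linear-time claim can be illustrated with a single bottom-up pass: evaluating an SPN visits every node exactly once and touches every edge once, so the cost grows with network size rather than with the number of joint states. The node representation below is a toy sketch, not the libspn-keras API:

```python
# Toy SPN evaluation: each node is ("leaf", value), ("sum", weights, children)
# or ("prod", children); children are indices of already-evaluated nodes,
# so one left-to-right pass suffices (linear in nodes + edges).

def evaluate(nodes):
    values, visits = [], 0
    for kind, *rest in nodes:
        visits += 1
        if kind == "leaf":
            values.append(rest[0])
        elif kind == "sum":
            weights, children = rest
            values.append(sum(w * values[c] for w, c in zip(weights, children)))
        else:  # "prod"
            prod = 1.0
            for c in rest[0]:
                prod *= values[c]
            values.append(prod)
    return values[-1], visits

# A tiny complete and decomposable SPN over X1 and X2.
spn = [
    ("leaf", 1.0),                # 0: indicator X1 = 1
    ("leaf", 0.3),                # 1: P(X2 = 1) under component A
    ("leaf", 0.8),                # 2: P(X2 = 1) under component B
    ("sum", [0.6, 0.4], [1, 2]),  # 3: mixture over X2 (complete)
    ("prod", [0, 3]),             # 4: root, disjoint scopes {X1}, {X2}
]
root, visits = evaluate(spn)
print(round(root, 2))  # 0.5  (= 0.6 * 0.3 + 0.4 * 0.8)
print(visits)          # 5: each node evaluated exactly once
```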
In other words, this is not so much a property of the implementation, but rather a property of the model itself.
An SPN is interpretable in the sense that any node in it computes a probability. In other words, there is always a probabilistic interpretation of the values it computes. Again this is a property of the model and not one that has to be enforced by the implementation.
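A minimal numeric illustration of that probabilistic interpretation, with made-up numbers: when sum weights are normalized and leaves are probabilities, the value at an internal sum node reads directly as a marginal probability.

```python
# Toy mixture: every intermediate value is itself a probability in [0, 1].
p_rain = 0.2          # mixture weight for the "rain" component
p_wet_if_rain = 0.9   # leaf: P(wet | rain component)
p_wet_if_dry = 0.1    # leaf: P(wet | dry component)

# Sum node with normalized weights (0.2 + 0.8 = 1):
# its output is interpretable as P(wet) outright.
p_wet = p_rain * p_wet_if_rain + (1 - p_rain) * p_wet_if_dry
print(round(p_wet, 2))  # 0.26
```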
Hope that makes it a bit clearer.
If you have more questions that are not related to the language-model structure explained above, please open a new issue :)
Hi, can the current version create a similar SPN network? For example, how do I create similar G product nodes that are connected via two edges to an M node, or how do I connect a B node and an indicator variable y?
I would be very thankful if you could guide me in implementing this network.
Thanks.