Closed: hajipoor closed this issue 4 years ago
Hi,
I think this is from Cheng et al. (2014), am I right? It's also discussed in this lecture by Pascal Poupart.
I think the first layer could be implemented with a TextVectorization layer, as it simply yields word embeddings using some one-hot token representation. This will basically give you all the H nodes.
There are a few details, however, that the original paper does not clarify well, which makes it difficult for me to explain how to build the remainder of the network. I'd suggest raising an issue in the original repo of Cheng to clarify a few things, for example: what are the values of N, D, and K (in the figure)? Hope they can help you out a bit!
Let me know if you learn more about this and I can proceed to explain how it could work for libspn-keras.
Thank you for your explanation, good idea. I will contact the paper's authors to get more details.
My main question is how to implement the M, G, or S nodes, for example. Please give me a simple tutorial on how to implement a similar structure with libspn-keras.
The available examples are related to the image domain with continuous inputs. I promise that I will prepare a Jupyter notebook for this LM network as an NLP task to share with others.
Thanks again.
Jostosh, I found the values of N, D, and K in their source code. The number of nodes in the first, second, third, fourth, fifth, and sixth layers is 400, 10000, 100000, 10000, 10000, and 3 respectively.
In the figure, I only see one occurrence of N, D, and K, so it's difficult to know how those layers should be stacked if there are more than are displayed here.
The SPN displayed in the figure also seems to violate both completeness and decomposability.
The H layer is still complete, assuming that a word is modeled as a discrete variable (it is complete because the sums are connected to children with the same scope). The M nodes are no longer complete (the sums are connected to children with different scopes). The G nodes square the nodes in M, which means they violate decomposability (since for decomposability, product children need to have disjoint scopes). We can still build an SPN that follows this architecture, but libspn-keras is designed to build SPNs that are complete and decomposable, so it needs a bit more thought. I might have time in a few days to look at it more closely. Apologies for the delay.
Please also let me know if I'm wrong about my claims on decomposability and completeness.
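To make the completeness and decomposability claims above concrete, here is a toy scope-based check in plain Python. The function names and scope representation are illustrative only, not part of libspn-keras:

```python
# Toy scope checks (illustrative, not libspn-keras API).
# A scope is the set of random variables a node covers, here a frozenset of names.

def sum_is_complete(child_scopes):
    """A sum node is complete iff all its children have the identical scope."""
    return len(set(child_scopes)) == 1

def product_is_decomposable(child_scopes):
    """A product node is decomposable iff its children's scopes are pairwise disjoint."""
    seen = set()
    for scope in child_scopes:
        if seen & scope:
            return False
        seen |= scope
    return True

# H sums: children are indicators of the same word variable -> complete.
print(sum_is_complete([frozenset({"w1"}), frozenset({"w1"})]))        # True
# M sums: children (H nodes) cover different word variables -> not complete.
print(sum_is_complete([frozenset({"w1"}), frozenset({"w2"})]))        # False
# G products square an M node, i.e. both children share its scope -> not decomposable.
print(product_is_decomposable([frozenset({"w1", "w2"})] * 2))         # False
# A product over disjoint word scopes would be fine.
print(product_is_decomposable([frozenset({"w1"}), frozenset({"w2"})]))  # True
```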
I agree with you and it seems your claims are right.
Thank you for your time and I hope you will find a tricky solution to implement it with libspn-keras.
Is libSPN (not the Keras version) capable of implementing it? I mean, are completeness and decomposability required for libSPN as well?
Apologies if I wasn't yet clear. Completeness and decomposability are not necessarily strong requirements for libspn-keras, so it is possible to build this architecture with this library. It's just that it takes a bit more effort to get the nodes aligned correctly along each axis in the tensors. An implementation with libspn-keras will most likely be more manageable and efficient than an implementation with libspn. I will be able to have a look at this tomorrow after 18.00 CEST.
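The "aligning nodes along each axis" remark can be pictured with a plain reshape. Numpy stands in here for the tensors; the group size of 4 is purely illustrative, not from the paper:

```python
import numpy as np

# 16 node values in a flat layer; suppose sums may only connect to nodes
# within the same group of 4 (illustrative numbers).
flat = np.arange(16.0)

# Moving each group to its own index along a leading "scope" axis means a
# dense sum over the last axis now mixes only nodes within one group.
grouped = flat.reshape(4, 4)
print(grouped.shape)  # (4, 4)
print(grouped[0])     # [0. 1. 2. 3.] -- the only candidates for the first sum
```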
Good. Did you take the time to think about this? Did you find any solutions
Hi, apologies for the delay. If I were to re-implement the SPN from the figure, it would be something like this:
import tensorflow as tf
import libspn_keras as spnk

max_tokens = 10000  # Depends on vocabulary
num_sum_h_layer = 100  # Also referred to as K
sequence_len = 5  # Also referred to as N
num_sums_m_layer = 16  # Also referred to as K

spn = tf.keras.Sequential([
    tf.keras.layers.experimental.preprocessing.TextVectorization(output_sequence_length=sequence_len),
    tf.keras.layers.Embedding(input_dim=max_tokens, output_dim=num_sum_h_layer),
    tf.keras.layers.Reshape([1, 1, num_sum_h_layer * sequence_len]),
    spnk.layers.DenseSum(num_sums=num_sums_m_layer),
    tf.keras.layers.Reshape([num_sums_m_layer, 1, 1]),
    # In log space, s + s == log(M ** 2): this appends the squared G nodes
    tf.keras.layers.Lambda(lambda s: tf.concat([s, s + s], axis=-1), name="add_g_nodes"),
    spnk.layers.DenseSum(num_sums=1),  # Just one sum per pair of G and M nodes
    tf.keras.layers.Reshape([1, 1, num_sums_m_layer]),
    spnk.layers.RootSum(return_weighted_child_logits=True)
])

spn(tf.convert_to_tensor(["hello there, how are you"]))
spn.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
spn.summary()
Which prints:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
text_vectorization (TextVect (None, 5)                 0
_________________________________________________________________
embedding (Embedding)        (None, 5, 100)             1000000
_________________________________________________________________
reshape (Reshape)            (None, 1, 1, 500)          0
_________________________________________________________________
dense_sum (DenseSum)         (None, 1, 1, 16)           8000
_________________________________________________________________
reshape_1 (Reshape)          (None, 16, 1, 1)           0
_________________________________________________________________
add_g_nodes (Lambda)         (None, 16, 1, 2)           0
_________________________________________________________________
dense_sum_1 (DenseSum)       (None, 16, 1, 1)           32
_________________________________________________________________
reshape_2 (Reshape)          (None, 1, 1, 16)           0
_________________________________________________________________
root_sum (RootSum)           (None, 16)                 16
=================================================================
Total params: 1,008,048
Trainable params: 1,008,048
Non-trainable params: 0
_________________________________________________________________
The embedding just takes a token index and maps that to a vector, which basically corresponds to multiplying a one-hot vector (such as the one in the figure) with a matrix of weights. The axis at index 1 is meant for scopes, whereas the SPN in the figure doesn't always seem to care about scopes/decomposability/completeness. Whenever I have to limit sums to be connected to only a subset of nodes in a layer, I reshape it so that groups of nodes end up in different locations in the tensor at axis 1.
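The one-hot/matrix-multiplication equivalence mentioned above is easy to verify directly. Numpy stands in for the Embedding layer here; the sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
max_tokens, embed_dim = 10, 4
weights = rng.random((max_tokens, embed_dim))  # the embedding matrix

token = 7                                      # a token index from TextVectorization
one_hot = np.zeros(max_tokens)
one_hot[token] = 1.0

# A one-hot vector times the weight matrix selects one row...
via_matmul = one_hot @ weights
# ...which is exactly what an embedding lookup does.
via_lookup = weights[token]

print(np.allclose(via_matmul, via_lookup))  # True
```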
I'm not sure how the authors actually stacked the layers or what the exact parameter settings were. Also keep in mind that if you want to use embedding layers etc., you're not really computing probabilities unless you make sure the embedding weights are 'normalized' in the same way that sums are normalized in a normalized SPN. The authors of the paper don't specify that they do, so I also left it out here.
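The 'normalization' caveat can be sketched as follows. If column j of the embedding matrix is read as sum node j's weights over the token indicators, a softmax over the token axis turns each column into a proper distribution, which is the same constraint a normalized SPN places on sum weights. The shapes below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
max_tokens, num_sums = 10, 4

# Unconstrained embedding weights: column j holds sum node j's
# weights over the token indicators (illustrative shapes).
raw = rng.normal(size=(max_tokens, num_sums))

# Softmax over the token axis makes each column non-negative and sum to 1,
# like the weights of a normalized sum node.
normalized = np.exp(raw) / np.exp(raw).sum(axis=0, keepdims=True)

print(np.allclose(normalized.sum(axis=0), 1.0))  # True
print(bool(np.all(normalized >= 0)))             # True
```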
Hopefully this gets you going.
Many thanks for providing this code, it gives me a good idea of how to implement the LM network. I also read your articles (1, 2) to get to know this network more deeply.
As you know, interpretability and tractability are the main features of this network. I am interested to know how this package supports these features and gives us information about them.
Thanks again.
Ah, nice to hear you've read those! Although those articles still mention the old libspn, the same ideas apply in this library.
I'm not sure exactly what you mean by interoperability and traceability in this context?
Sorry there were some typos in my comment.
As you know, inference in SPNs is guaranteed to be tractable. I am interested to know how libspn-keras provides this information and how I can have it.
Tractability relates to the complexity order of doing inference. In the case of SPNs, inference is computed in linear time w.r.t. the network size (i.e. this is O(n), as already stated in Poon and Domingos, 2011). Although there is no formal definition of what is 'tractable', usually a distribution is said to be tractable if it can be computed in at most polynomial time. Since O(n) <= O(n^k) for k >= 1, computing evidence/probabilities in SPNs is a tractable inference.
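The linear-time claim can be illustrated with a single bottom-up pass: evaluating an SPN visits every node exactly once and touches every edge once, so the cost grows with network size rather than with the number of joint states. The node representation below is a toy sketch, not the libspn-keras API:

```python
# Toy SPN evaluation: each node is ("leaf", value), ("sum", weights, children)
# or ("prod", children); children are indices of already-evaluated nodes,
# so one left-to-right pass suffices (linear in nodes + edges).

def evaluate(nodes):
    values, visits = [], 0
    for kind, *rest in nodes:
        visits += 1
        if kind == "leaf":
            values.append(rest[0])
        elif kind == "sum":
            weights, children = rest
            values.append(sum(w * values[c] for w, c in zip(weights, children)))
        else:  # "prod"
            prod = 1.0
            for c in rest[0]:
                prod *= values[c]
            values.append(prod)
    return values[-1], visits

# A tiny complete and decomposable SPN over X1 and X2.
spn = [
    ("leaf", 1.0),                # 0: indicator X1 = 1
    ("leaf", 0.3),                # 1: P(X2 = 1) under component A
    ("leaf", 0.8),                # 2: P(X2 = 1) under component B
    ("sum", [0.6, 0.4], [1, 2]),  # 3: mixture over X2 (complete)
    ("prod", [0, 3]),             # 4: root, disjoint scopes {X1}, {X2}
]
root, visits = evaluate(spn)
print(round(root, 2))  # 0.5  (= 0.6 * 0.3 + 0.4 * 0.8)
print(visits)          # 5: each node evaluated exactly once
```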
In other words, this is not so much a property of the implementation, but rather a property of the model itself.
An SPN is interpretable in the sense that any node in it computes a probability. In other words, there is always a probabilistic interpretation of the values it computes. Again this is a property of the model and not one that has to be enforced by the implementation.
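A minimal numeric illustration of that probabilistic interpretation, with made-up numbers: when sum weights are normalized and leaves are probabilities, the value at an internal sum node reads directly as a marginal probability.

```python
# Toy mixture: every intermediate value is itself a probability in [0, 1].
p_rain = 0.2          # mixture weight for the "rain" component
p_wet_if_rain = 0.9   # leaf: P(wet | rain component)
p_wet_if_dry = 0.1    # leaf: P(wet | dry component)

# Sum node with normalized weights (0.2 + 0.8 = 1):
# its output is interpretable as P(wet) outright.
p_wet = p_rain * p_wet_if_rain + (1 - p_rain) * p_wet_if_dry
print(round(p_wet, 2))  # 0.26
```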
Hope that makes it a bit clearer.
If you have more questions that are not related to the language-model structure explained above, please open a new issue :)
Hi, can the current version create a similar SPN network? For example, how do I create similar G product nodes that are connected via two edges to an M node, or how do I connect a B node and an indicator variable y?
I would be very thankful if you could guide me in implementing this network.
Thanks.