EternityZY opened this issue 2 years ago
@EternityZY I am the author of the C AVX2-vectorized code.
It is first necessary to understand that in our work, causal networks are parametrized as stacked MLPs that compute a categorical distribution. This categorical distribution is either sampled from (sample_mlp()) or used to calculate log-probabilities and minimize cross-entropy (logprob_mlp()).
The problem size, and corresponding MLP stack, is defined by the following hyperparameters:

- M: Number of causal variables.
- N: Array with the number of categories of each variable. For example, eight Boolean variables would be [2,2,2,2,2,2,2,2].
- Ns: sum(N), the sum of all variables' numbers of categories.
- Nc: cumsum(N), the cumulative-sum array of numbers of categories, starting with zero (see the short sketch after this list). For eight Boolean variables, this would be [0,2,4,6,8,10,12,14]. Nc[-1] + N[-1] == Ns by definition.
- Hgt/Hlr/H: The "embedding size" of categorical variables' discrete values in the ground-truth and learned models.
The MLP stack is parametrized with the following arrays (shapes illustrated in the sketch after this list):

- W0: Stack of "embedding" matrices. Shape (M,Ns,H).
- B0: Stack of biases for the hidden layer. Shape (M,H).
- W1: Stack of weights for the output layer. Shape (Ns,H).
- B1: Stack of biases for the output layer. Shape (Ns,).
- config: Adjacency matrix. Shape (M,M).
- alpha: 0.1, chosen arbitrarily (the leaky-ReLU slope used below).
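To make the shapes concrete, here is an illustrative allocation only (arbitrary random initialization and an arbitrary H; this is not the repository's initializer):

import numpy as np

N  = np.array([2, 2, 2, 2, 2, 2, 2, 2])
Ns = int(N.sum())
M, H = len(N), 32                          # H is arbitrary for this example

W0 = 0.01 * np.random.randn(M, Ns, H)      # per-variable embedding tables
B0 = np.zeros((M, H))                      # hidden-layer biases
W1 = 0.01 * np.random.randn(Ns, H)         # output-layer weights, stacked along axis 0
B1 = np.zeros(Ns)                          # output-layer biases, stacked along axis 0
config = np.tril(np.ones((M, M)), k=-1)    # example: fully-connected, strictly lower-triangular DAG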
For each causal variable i from 0 to M-1, a single computation of the categorical distribution's parameters is roughly as follows:
def calclogits(i, sample, config):
    # LAYER 0
    h = self.B0[i, :]
    for j in parents(i, config):
        c = sample[j]  # Categorical value of parent j of i. Integer, in range [0 .. N[j]-1]
        # W0 is pre-stacked along axis 1 because not all variables have the same number of categories.
        # So we need to add Nc[j] in order to find the offset of variable j's categorical value c's *embedding*.
        offj = c + self.Nc[j]
        h += self.W0[i, offj, :]  # Sum into h the embedding corresponding to parent j's value c
    # LEAKY RELU
    h = leaky_relu(h, alpha=0.1)
    # LAYER 1
    # W1 and B1 are pre-stacked along axis 0 because not all variables have the same number of categories.
    # Slice out of the stack the weights for variable i and use them.
    offi = self.Nc[i]
    B1i = self.B1[offi:offi+self.N[i]]      # Shape: (N[i],)
    W1i = self.W1[offi:offi+self.N[i], :]   # Shape: (N[i], H)
    outi = einsum('nh,h->n', W1i, h) + B1i  # Shape: (N[i],)
    return logsoftmax(outi)                 # Return shape: (N[i],)
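The helpers used above (parents, leaky_relu, logsoftmax) are left implicit in the pseudocode. Plausible NumPy stand-ins, assuming the convention that config[i, j] != 0 means "j is a parent of i", would be:

import numpy as np

def parents(i, config):
    # Indices j of the parents of variable i, under the assumed row convention config[i, j] != 0
    return np.flatnonzero(config[i])

def leaky_relu(x, alpha=0.1):
    return np.where(x >= 0, x, alpha * x)

def logsoftmax(x):
    # Numerically-stable log-softmax over a 1-D vector of logits
    z = x - x.max()
    return z - np.log(np.exp(z).sum())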
For sampling (sample_mlp()), with the normalized logits returned by calclogits(i, sample, config), we sample a categorical value for variable i. We assume that config is lower-triangular and iterate i from 0 to M-1. If config is lower-triangular, then necessarily all variables are in topologically-sorted order and iterating like this is causally ordered.
sample = empty(M, dtype=int)  # one categorical value per variable, filled in causal order
for i in range(M):
    sample[i] = categorical(l=calclogits(i, sample, config))
return sample
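For illustration (not taken from the repository), a strictly lower-triangular config encoding the chain X0 -> X1 -> X2, with config[i, j] == 1 meaning "j is a parent of i" as assumed above, would be:

import numpy as np

config = np.array([[0, 0, 0],
                   [1, 0, 0],    # X1's only parent is X0
                   [0, 1, 0]])   # X2's only parent is X1
# Strictly lower-triangular, so sampling variables in order 0..M-1 respects the causal ordering
assert np.array_equal(config, np.tril(config, k=-1))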
For log-probability and backprop (logprob_mlp()), rather than sampling, we return one log-probability per variable, and neither need nor assume causal ordering:
logprobs = empty(M)
for i in range(M):
    l = calclogits(i, sample, config)
    logprobs[i] = l[sample[i]]  # Because we did logsoftmax!
return logprobs
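As a usage note (my reading of the cross-entropy objective mentioned at the top, not code copied from the repository): the joint log-probability of a sample factorizes over the variables, so the per-sample loss is simply the negated sum of these entries:

nll = -logprobs.sum()   # negative log-likelihood / cross-entropy for this sample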
The actual C code is a lot more complicated because it is vectorized and batched. I wrote it because it's extremely slow to run this in PyTorch.
Thank you very much for your reply. You explained every detail of the question very well. Thank you for your help!
Your work is very instructive, but the C part of the code can't be debugged. I want to learn about the function causal._causal.sample_mlp(self._np_W0gt, self._np_B0gt, self._np_W1gt, self._np_B1gt, self.N, self.gammagt.numpy(), out=s, alpha=0.1) (located at line 384 of models.py) and the function causal._causal.logprob_mlp(self._np_W0slow, self._np_B0slow, self._np_W1slow, self._np_B1slow, self.N, block,sample,config,logp, alpha=0.1,temp=self.a.temperature) (located at line 401 of models.py). Are these two functions implemented in Python? Or could you please describe the specific process of these two algorithms in detail?