nke001 / causal_learning_unknown_interventions

Code for "Neural causal learning from unknown interventions"

About C code conversion to Python #2

Open EternityZY opened 2 years ago

EternityZY commented 2 years ago

Your work is very instructive, but the C part of the code cannot be stepped through in a debugger. I want to understand the function causal._causal.sample_mlp(self._np_W0gt, self._np_B0gt, self._np_W1gt, self._np_B1gt, self.N, self.gammagt.numpy(), out=s, alpha=0.1) (at line 384 of models.py)

and another function, causal._causal.logprob_mlp(self._np_W0slow, self._np_B0slow, self._np_W1slow, self._np_B1slow, self.N, block, sample, config, logp, alpha=0.1, temp=self.a.temperature) (at line 401 of models.py). Are these two functions implemented anywhere in Python? If not, could you please describe in detail how these two algorithms work?

obilaniu commented 2 years ago

@EternityZY I am the author of the AVX2-vectorized C code.

It is first necessary to understand that in our work, causal networks are parametrized as stacked MLPs, one per variable, each computing a categorical distribution over that variable's categories. This categorical distribution is either sampled from (sample_mlp()) or used to calculate log-probabilities and minimize cross-entropy (logprob_mlp()).

The problem size, and corresponding MLP stack, is defined by the following hyperparameters:

- M: the number of causal variables.
- N: an array of length M; N[i] is the number of categories of variable i.
- H: the hidden width of each variable's MLP.
- Nc: derived from N, the exclusive prefix-sum of N; Nc[i] is the offset of variable i along the stacked category axis.

The MLP stack is parametrized with the following arrays (shapes as used in the pseudocode below):

- W0: layer-0 weights, shape (M, sum(N), H), stacked along axis 1.
- B0: layer-0 biases, shape (M, H).
- W1: layer-1 weights, shape (sum(N), H), stacked along axis 0.
- B1: layer-1 biases, shape (sum(N),), stacked along axis 0.
- config (gamma): an (M, M) adjacency matrix selecting which variables are parents of which.
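
For concreteness, here is a minimal NumPy sketch of how such a stacked parameter set could be laid out. The sizes are made up purely for illustration and the initialization is arbitrary; the actual arrays (self._np_W0gt, self._np_B0gt, etc.) are built in models.py:

import numpy as np

# Hypothetical sizes, for illustration only: 3 variables with 2, 3 and 4 categories.
M    = 3
N    = np.array([2, 3, 4])                           # number of categories per variable
H    = 8                                             # hidden width of each variable's MLP
Nc   = np.concatenate(([0], np.cumsum(N)[:-1]))      # exclusive prefix-sum offsets into the stacked axis
Ntot = int(N.sum())

rng    = np.random.default_rng(0)
W0     = 0.1 * rng.standard_normal((M, Ntot, H))     # layer-0 embeddings, stacked along axis 1
B0     = np.zeros((M, H))                            # layer-0 biases
W1     = 0.1 * rng.standard_normal((Ntot, H))        # layer-1 weights, stacked along axis 0
B1     = np.zeros(Ntot)                              # layer-1 biases, stacked along axis 0
config = np.tril(np.ones((M, M), dtype=bool), k=-1)  # adjacency: config[i, j] True means j is a parent of i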

For each causal variable i from 0 to M-1, a single computation of the categorical distribution's parameters is roughly as follows:

def calclogits(i, sample, config):
    # LAYER 0
    h = self.B0[i, :]
    for j in parents(i, config):
        c = sample[j] # Categorical value of parent j of i. Integer, in range [0 .. N[j]-1]
        # W0 is pre-stacked along axis 1 because not all variables have the same number of categories,
        # so we add the offset Nc[j] to locate the *embedding* row for variable j's categorical value c.
        offj = c + self.Nc[j]
        h += self.W0[i, offj, :]   # Sum into h the embedding corresponding to parent j's value c

    # LEAKY RELU
    h = leaky_relu(h, alpha=0.1)

    # LAYER 1
    # W1 and B1 are pre-stacked along axis 0 because not all variables have the same number of categories.
    # Slice out of the stack the weights for variable i and use them.
    offi = self.Nc[i]
    B1i  = self.B1[offi:offi+self.N[i]]    # Shape: (N[i],)
    W1i  = self.W1[offi:offi+self.N[i], :] # Shape: (N[i], H)
    outi = einsum('nh,h->n', W1i, h) + B1i # Shape: (N[i],)
    return logsoftmax(outi)  # Return shape: (N[i],)
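
The helpers parents(), leaky_relu() and logsoftmax() used above are the obvious ones; a minimal sketch, assuming config[i, j] is nonzero exactly when j is a parent of i:

import numpy as np

def parents(i, config):
    # Indices j that are parents of variable i under the current adjacency configuration.
    # Assumes config[i, j] is nonzero iff j is a parent of i.
    return [j for j in range(config.shape[1]) if config[i, j]]

def leaky_relu(h, alpha=0.1):
    # Elementwise leaky ReLU.
    return np.where(h > 0, h, alpha * h)

def logsoftmax(x):
    # Numerically stable log-softmax over a 1-D vector of logits.
    x = x - x.max()
    return x - np.log(np.exp(x).sum())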

For sampling (sample_mlp()), we use the normalized logits returned by calclogits(i, sample, config) to draw a categorical value for variable i. We assume config is lower-triangular and iterate i from 0 to M-1; lower-triangularity means the variables are already in topologically-sorted order, so iterating this way is causally ordered.

for i in range(M):
    sample[i] = categorical(l=calclogits(i, sample, config))
return sample
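
Here categorical(...) just draws an index from the normalized log-probabilities; one possible sketch in plain NumPy (not the actual C implementation):

import numpy as np

def categorical(l, rng=None):
    # l is a vector of log-probabilities (already normalized by logsoftmax).
    rng = np.random.default_rng() if rng is None else rng
    p = np.exp(l)
    p = p / p.sum()                      # guard against floating-point drift
    return int(rng.choice(len(l), p=p))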

For log-probability and backprop (logprob_mlp()), rather than sampling, we return one log-probability per variable, and we neither need nor assume a causal ordering:

logprobs = empty(M)
for i in range(M):
    l = calclogits(i, sample, config)
    logprobs[i] = l[sample[i]] # Because we did logsoftmax!
return logprobs
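
Tying this back to the cross-entropy mentioned at the top: the scalar training objective is just the negative sum of these per-variable log-probabilities. Schematically (the values below are purely illustrative; the real code does this batched, in the C extension):

import numpy as np

# logprobs as returned by the loop above (one log-probability per variable).
logprobs = np.array([-0.7, -1.2, -0.3])   # example values, purely illustrative
nll = -logprobs.sum()                     # negative log-likelihood, i.e. the cross-entropy to minimize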

The actual C code is a lot more complicated because it is vectorized and batched; I wrote it because running this in PyTorch is extremely slow.

EternityZY commented 2 years ago

Thank you very much for your reply. You explained every detail very clearly. Thank you for your help!