harvardnlp / TextFlow


On self.m_k in MaskedHiddenLayer #6

Closed hyunjik11 closed 4 years ago

hyunjik11 commented 4 years ago

In the MADE used in AFLayer, it seems that hidden_order='sequential' always, so the m_k values of the hidden layers are arange(0, data_dim).repeat(...), taking values in {0, ..., data_dim-1}:
https://github.com/harvardnlp/TextFlow/blob/0eb8552ae5b370f69e440bca6c6c226b1734c1ba/MADE.py#L38-L39

Also, input_order = arange(data_dim) + 1:
https://github.com/harvardnlp/TextFlow/blob/0eb8552ae5b370f69e440bca6c6c226b1734c1ba/flows.py#L275

As it stands, any hidden unit i in the first hidden layer with m_k[i] = 0 will be unconnected, since previous_m_k = input_order = arange(data_dim) + 1 takes values in {1, ..., data_dim}; cf. how the masking is defined:
https://github.com/harvardnlp/TextFlow/blob/0eb8552ae5b370f69e440bca6c6c226b1734c1ba/MADE.py#L44

Should the m_k values of the hidden layers instead be arange(1, data_dim).repeat(...), so that they take values in {1, ..., data_dim-1} and no hidden units are left unconnected? This also seems to be what the MADE paper suggests. I think this is a minor issue, since the unconnected units are ignored anyway, but it would be good to remove the redundancy and make the code clearer by aligning it with the method as presented in the original paper. A small sketch of what I mean follows.
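For concreteness, here is a minimal sketch of the masking as I understand it (pure numpy; the mask rule `mask[i, j] = (m_k[i] >= previous_m_k[j])` is my reading of the linked line, and `data_dim` / `hidden_per_dim` are made-up values):

```python
import numpy as np

data_dim, hidden_per_dim = 4, 2

# Input degrees: arange(data_dim) + 1, i.e. values in {1, ..., data_dim}
previous_m_k = np.arange(data_dim) + 1

# Hidden degrees with hidden_order='sequential': arange(0, data_dim)
# repeated, i.e. values in {0, ..., data_dim - 1}
m_k = np.arange(0, data_dim).repeat(hidden_per_dim)

# Hidden unit i is connected to input j iff m_k[i] >= previous_m_k[j]
mask = (m_k[:, None] >= previous_m_k[None, :]).astype(float)

# Rows for units with m_k[i] == 0 are all zero, i.e. those units
# receive no inputs at all
print(mask.sum(axis=1))  # the first hidden_per_dim entries are 0.0
```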

zackziegler95 commented 4 years ago

This is intended, as it provides a natural way to incorporate conditional information. This code comes from more general-purpose flow code, and it's best to keep it as general as possible.

hyunjik11 commented 4 years ago

Oh I see: the cond_inp dimensions are represented as zeros in previous_m_k, so the hidden units i with m_k[i] = 0 will be connected to the conditional input. Thanks for the clarification.
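To make that concrete, a minimal sketch under the same assumptions as above (cond_dim and the concatenation order are illustrative, not necessarily how TextFlow lays out the conditional inputs):

```python
import numpy as np

data_dim, cond_dim, hidden_per_dim = 4, 3, 2

# Conditional inputs get degree 0; data inputs get degrees {1, ..., data_dim}
previous_m_k = np.concatenate([np.zeros(cond_dim, dtype=int),
                               np.arange(data_dim) + 1])

m_k = np.arange(0, data_dim).repeat(hidden_per_dim)

mask = (m_k[:, None] >= previous_m_k[None, :]).astype(float)

# Units with m_k[i] == 0 now see exactly the cond_dim conditional inputs,
# so no hidden unit is left unconnected
print(mask.sum(axis=1))  # the first hidden_per_dim entries equal cond_dim
```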