It also seems that this implementation is written specifically for images, because I saw that the constructor of MPS has the following operation:
if label_site is None:
label_site = input_dim // 2
assert label_site >= 0 and label_site <= input_dim
Let me know if I'm wrong
Yes, a recurrent module is currently in development and should be included in the master branch soon. In the meantime, the TI_MPS class currently included in the dynamic_capacity branch uses many copies of a single repeated core tensor to evaluate an input sequence, and should provide the recurrent functionality you're looking for.
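For reference, a minimal usage sketch of the recurrent TI_MPS class described above (the import path and the constructor keywords input_dim, output_dim, and bond_dim are assumptions here, so check the class definition in the repo for the actual signature):

```python
import torch
from torchmps import TI_MPS  # import path is an assumption

# Hypothetical constructor call; the real keyword names may differ.
ti_mps = TI_MPS(input_dim=16, output_dim=5, bond_dim=20)

# A single repeated core tensor is applied at every position of the sequence,
# so sequences of different lengths can share the same parameters.
seq_batch = torch.randn(8, 50, 16)   # (batch_size, seq_length, input_dim)
output = ti_mps(seq_batch)           # expected shape: (batch_size, output_dim)
```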
The input data to an MPS instance can be anything, as long as it has a fixed size. The original code was written with image data in mind, but there's nothing in the code which specializes to that case.
if label_site is None:
    label_site = input_dim // 2
assert label_site >= 0 and label_site <= input_dim
The input is processed in a linear fashion (since MPS is a linear data structure), and that code block just handles where in that linear sequence the output is generated. This output placement won't fundamentally change the expressivity of the MPS model, but could bias the model to be more sensitive to certain regions of the input neighboring the output site.
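As a concrete sketch of this point (label_site comes straight from the quoted constructor code; the bond_dim keyword and import path are assumptions beyond that snippet):

```python
import torch
from torchmps import MPS  # import path is an assumption

# Any fixed-size input works, e.g. a flattened 28x28 image or a generic
# 784-dimensional feature vector.
mps_middle = MPS(input_dim=28 * 28, output_dim=10, bond_dim=20)
# Place the output core at the start of the chain instead of the middle.
mps_start = MPS(input_dim=28 * 28, output_dim=10, bond_dim=20, label_site=0)

batch = torch.randn(32, 28 * 28)   # (batch_size, input_dim)
print(mps_middle(batch).shape)     # both give (batch_size, output_dim);
print(mps_start(batch).shape)      # only the output placement differs
```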
Just a follow-up that the recurrent TI_MPS class has been cleaned up, and can now be found in the master branch. Matching documentation should be uploaded soon!
Great. I will try it out.
If I understand the code correctly, this class takes input of size (batch_size, seq_length, input_dim) and gives output of size (batch_size, output_dim). Is it possible to have another class with output size (batch_size, seq_length, output_dim)?
Good question! That would technically be possible, but it doesn't mix well with the structure of tensor networks. Although MPS are sequential models, the fact that all the operations are (multi-)linear makes their evaluation much more flexible than that of traditional RNNs. For example, MPS are highly parallelizable and can be evaluated in a depth that is only logarithmic in the sequence length, something that becomes impossible once we start requiring copies of the hidden state vectors at each step of the evaluation (this is loosely related to the no-cloning theorem of quantum mechanics).
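To make the log-depth point concrete, here is a small illustration (plain PyTorch, not TorchMPS code) of contracting a chain of transfer matrices pairwise instead of left to right:

```python
import torch

def chain_contract(mats):
    """Multiply a list of square matrices.

    An RNN-style left-to-right pass needs depth O(n), but because matrix
    multiplication is associative we can contract neighboring pairs in
    parallel, giving depth O(log n).
    """
    while len(mats) > 1:
        paired = []
        for i in range(0, len(mats) - 1, 2):
            paired.append(mats[i] @ mats[i + 1])  # all pairs are independent
        if len(mats) % 2 == 1:
            paired.append(mats[-1])               # carry the leftover matrix
        mats = paired
    return mats[0]

# Toy example: 8 bond_dim x bond_dim matrices contract in 3 parallel rounds.
cores = [torch.randn(4, 4) for _ in range(8)]
result = chain_contract(cores)
print(result.shape)  # torch.Size([4, 4])
```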
Because the underlying contraction methods assume this type of flexibility, it would require some significant changes to the code to implement the class you're talking about. I would like to rewrite the contraction engine once I have a more complete picture of what tasks users are applying TorchMPS towards, but right now I don't see a sequence-to-sequence model being developed anytime soon. Sorry!
Thanks. I have to learn more about this.
Do you plan to make this module recurrent, like torch.nn.LSTM? Just wondering if we can apply this to RNN tasks.