I'm open to suggestions on improving the library! I really want to make it more functional, but I don't know of any "clean" way of building a graph in a functional way.
Modifying the graph object in place is definitely one of my biggest gripes with it too (and I wrote it!). But again, I'm not sure what approach would be better.
I think I can explain the confusing difference.
So in Numpy/Tensorflow, the base objects (np.array, tf.Tensor) are immutable, whereas our base objects (Node, Edge, TensorNetwork) are fully mutable and can change state depending upon actions taken by neighboring objects.
But again, I'm not sure there is a better way since TensorNetworks are inherently mutable structures. If you can think of a design that allows for the construction/contraction of a TN using immutable objects, please let me know!
I've been mulling it over since posing the question. Bear in mind that I'm not fully aware of the current implementation outside of the user-facing parts.
Contractions currently seem to replace the two nodes being contracted with a new node containing the contracted super-node, so the original nodes are disconnected from the network and the super-node is added in their place. If we consider contractions as operations on tensor network objects, wouldn't it make sense for them to return a new tensor network instance with the modified node/edge list while preserving the original instance and its structure?
Edge and node objects should ideally be shared across the tensor networks, which means their specification should not require references to the network they are part of. The trickiest thing here seems to be that the edges connecting the contracted super-node to the rest of the network are shared with the original pre-contraction network, but they connect to different nodes. Alternatively, the edges could be part of the network specification and thus not be shared, while the nodes are maintained.
The contractions could in theory be made reversible by letting the node introduced through contraction be a member of a new SubNetwork/SuperNode class that works like a node and has a tensor specification, but maintains a reference to a tensor network model that decomposes the tensor (e.g. the network of the two contracted nodes for a regular edge contraction).
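For concreteness, here is a minimal, purely illustrative sketch (plain NumPy; these are hypothetical classes, not the library's Node/TensorNetwork) of a contraction that returns a new network object instead of mutating the old one:
import numpy as np

class SimpleNode:
  def __init__(self, tensor, name):
    self.tensor = tensor
    self.name = name

class SimpleNetwork:
  def __init__(self, nodes):
    self.nodes = dict(nodes)  # name -> SimpleNode

  def contract(self, name_a, name_b, axis_a, axis_b, new_name):
    # Return a NEW network in which name_a and name_b are replaced by the
    # contracted super-node; the original network object is left untouched.
    a, b = self.nodes[name_a], self.nodes[name_b]
    merged = SimpleNode(
        np.tensordot(a.tensor, b.tensor, axes=(axis_a, axis_b)), new_name)
    new_nodes = {k: v for k, v in self.nodes.items()
                 if k not in (name_a, name_b)}
    new_nodes[new_name] = merged
    # Edges touching a or b would have to be re-pointed to the super-node;
    # that bookkeeping is exactly the tricky part discussed above.
    return SimpleNetwork(new_nodes)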
I do agree that letting contractions return a new tensor network instance makes sense. Specifying nodes and edges without references to the network might require big changes to the API; I can't really judge without looking deeper.
The node implementation seems like it could be disconnected from the network in a fairly pain-free manner (based on a cursory look). It does carry a reference to the tensor network, but it's only used for
- accessing the backend. This could be supplied independently.
- validating that the arguments of the @ operator are from the same network. Since contractions only make sense within the context of a tensor network, this should probably be removed. I think another issue suggested restricting it to only work inside a context manager?
- it's also used indirectly within the __xor__ implementation of class Edge. Edge does not itself reference or use the tensor network outside of this.

One use case for a redesign like this could be an implementation of the multienv algorithm (https://arxiv.org/abs/1310.8023), which allows the computation of the environment tensors of all tensors at small cost, but which I think would be difficult to implement at the moment since you need to reuse partial contractions (although it's probably possible with enough bookkeeping; see the sketch just below).

Defining nodes and edges without an underlying network would also imply an API change that solves #188 and #174. The new API would be more explicit about the underlying network.
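To illustrate the kind of reuse I mean, here is a rough, library-independent sketch in plain NumPy (a chain of matrices standing in for a tensor network): all left and right "environments" can be built incrementally at linear cost instead of being recontracted from scratch for every site.
import numpy as np

mats = [np.random.rand(4, 4) for _ in range(6)]

# left[i]  = product of mats[0 .. i-1]
# right[i] = product of mats[i+1 .. end]
left = [np.eye(4)]
for m in mats[:-1]:
  left.append(left[-1] @ m)

right = [np.eye(4)]
for m in reversed(mats[1:]):
  right.append(m @ right[-1])
right = right[::-1]

# The environment of mats[i] is (left[i], right[i]); each partial product
# was computed once and is reused for all sites.
envs = [(left[i], right[i]) for i in range(len(mats))]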
Think about this from the user's perspective. How would doing something simple like
net = TensorNetwork()
a = net.add_node(np.ones((2, 2)))
b = net.add_node(np.ones((2, 2)))
c = net.add_node(np.ones((2, 2)))
a[0] ^ b[1]
b[0] ^ c[1]
c[0] ^ a[1]
d = a @ b
final_result = d @ c
look like when we return a new network object at every contraction?
Of course it is more verbose, but it is clearer. I'm not sure how much difference this makes to the threshold for a new user. We can do something like:
net = TensorNetwork()
a = net.add_node(np.ones((2, 2)))
b = net.add_node(np.ones((2, 2)))
c = net.add_node(np.ones((2, 2)))
net.connect(a[0], b[1], inplace=True)
net.connect(b[0], c[1], inplace=True)
net.connect(c[0], a[1], inplace=True)
d = net.contract(a, b, inplace=True)
final_result = net.contract(d, c)
which is not too bad?
Ok, so this would do what we already do. Do you think you could show an example where we use one of the newly created networks after a contraction? Maybe not with exactly my above example.
I want to think about how a user would use the API first, what advantages it would have and what difficulties we would face. (Implementation details should always come second to usability).
So I talked this over with my team today and we think we have a reasonable solution.
We're going to add a net.copy() method. This method will return a new TensorNetwork, plus two dictionaries mapping the original nodes to the copies and the original edges to the copies.
The new TensorNetwork will be an isomorphic copy of the original. In fact, the actual tensor objects used by the new Nodes will be the same tensor objects as in the original. (We can do this with little consequence since the Tensor objects in all of our supported backends are effectively immutable.)
The API will look something like this.
net = tensornetwork.TensorNetwork()
a = net.add_node(...)
b = net.add_node(...)
c = net.add_node(...)
e = a[0] ^ b[0]
b[1] ^ c[1]
# Here is where the copy happens.
copied_net, node_refs, edge_refs = net.copy()
# We are free to do contractions on the old network.
a @ b
# And we can do contractions on the new network!
# To access the new network's nodes, we can use the
# node ref dictionary.
node_refs[a] @ node_refs[b]
# Same for edges.
copied_net.contract(edge_refs[e])
What's also nice is that since the actual Tensor objects are shared between the original network and the copied network, we can calculate gradients more easily in the JAX, TensorFlow, and PyTorch backends. This makes implementing something like DMRG much simpler.
def dmrg_step(net, nodes, index):
  bigger_node = nodes[index] @ nodes[index+1]
  net_copy, node_refs, _ = net.copy()
  # Contract the rest of the network.
  energy = tensornetwork.contractors.optimal(net).get_final_node()
  # This step will depend on the backend.
  node_refs[bigger_node].tensor -= calculate_gradient(
      energy, node_refs[bigger_node], ...)
  net_copy.split_node(node_refs[bigger_node], ...)
  # This new network is the same as the old network after taking a
  # DMRG step.
  return net_copy
Any comments on this? We welcome constructive criticisms!
Seems like a good approach at first sight. I would need to use it in practice a bit to see how convenient it is and to what extent it covers all needs :).
I guess that any functional version of the tensor network can be reduced to copying plus in-place operations, and this solution certainly looks like it would be almost plug-and-play. The only thing I worry about is that the dictionary-of-references data structure will get a bit cumbersome in implementations?
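As a quick sketch of that reduction (reusing the net.copy() API proposed above; the helper name is made up), a "functional" contraction is just a copy followed by an in-place contraction on the copy:
def functional_contract(net, node_a, node_b):
  new_net, node_refs, _ = net.copy()
  node_refs[node_a] @ node_refs[node_b]  # in-place contraction, but only on the copy
  return new_net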
Could you briefly sketch why it's better to instantiate a new node for each tensor when copying rather than just keeping references to the original nodes in a list or something like that?
To elaborate a bit on the point about nodes from my experience:
As an exercise, I was moving my (very) old uniform MPS Matlab code to TensorNetwork. I feel a single MPS tensor is a Node that can be part of multiple networks (to compute the transfer matrix, expectation values, during an optimisation...).
One of the strong points of the library is the fact that edges, axes, ... can be given names such that no mistakes can be made. However, since my MPS tensor is defined as a node of one particular network, every single time I add it to another network I just use the underlying tensor, defeating the point of having a robust naming convention.
A solution is to define my MPS tensor as a network with a single node and rely on the "add subnetwork" methods. However, it feels very heavy to use the TensorNetwork class for a net with only one Node. For this use case, I'd still be strongly in favour of having standalone Nodes.
I will take up the net.copy() method if no one minds.
net.copy seems useful to have in any case! Just not sure if it covers all use cases.
@Thenerdstation @MichaelMarien @maremun @Bonnevie What's the status on that? I think I'll need this for the MPS stuff, so if no-one is on it, I can do this.
@maremun was going to take it up as my intern, but she is pretty busy with other research at the moment so go ahead and take it up.
@mganahl I did some work, but haven't tested it yet. How soon do you need this method?
@maremun, I just finished the implementation, but still need to code the tests. It's not too urgent.
@maremun @Thenerdstation I can submit a PR and you guys can let me know if you like it or not. If not, I'll keep on using my version until yours is ready.
@mganahl, go ahead with your PR. @maremun, feel free to let Martin take this one.
Hey @Thenerdstation @Bonnevie @MichaelMarien @amilsted I've been thinking about this issue a bit, and I think I would find the following design very convenient.
First, we make the TensorNetwork object immutable.
Actually, what I mean is that the nodes of the network are not consumed upon contraction. They just stay as they are. We are still free to add or remove nodes, connect or disconnect nodes, and so on. We then use a design similar to tensorflow's Graph() approach, where we use a context manager to add nodes to a network.
Operations like contraction, connecting nodes, etc. are then performed using @, ^, and so on. We could have a default TensorNetwork object created at import, or the user can create one themselves.
net = tn.TensorNetwork()
with net.as_default():
  a = tn.Node(np.random.rand(...))
  b = tn.Node(np.random.rand(...))
  c = tn.Node(np.random.rand(...))
  a[0] ^ b[1]
  b[2] ^ c[3]
  result = a @ b @ c    # The network nodes are unaffected by this
  result_2 = a @ b @ c  # Same as the previous line
Backends are used in the same way as before. Additionally, we have functions like conj, random, ... called like this:
conj_A = tn.conj(a)
d = tn.random((a.shape[0], b.shape[1]))
e = tn.ones(tn.shape(a))
d[1] ^ c[0]
e[0] ^ d[0]
Optimal contractions, node replacement, node removal, SVD and QR could be called like this:
result = tn.contractors.optimal([a,b,c,d,e])
tn.replace_node(c, tn.Node(np.random.rand(...))) #replace c with a new random node
tn.remove_node(d)
q, r = tn.qr(c, inplace=True) #replace c with two new nodes q, r
The TensorNetwork object serves as an environment where nodes, edges and their topology are stored, checks are performed, and so on. Things like net.remove_node() would also be available via the TensorNetwork object. It will also be super easy (as it already is) to switch backends.
My typical use cases are such that the overall topology of my network is pretty much fixed from the start and remains as it is during all computations. Computations consist mostly of contracting the same subsets of nodes from the tensor network over and over again until some convergence is reached. It then makes sense to set up the TensorNetwork at the beginning, keep its topology fixed, and not consume its nodes during contraction. This design is relatively similar to Miles' ITensor, but it differs in the way the user specifies connections. I would find this REALLY convenient because it would be simple, short and clear. Let me know what you think!
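To make the use case concrete, here is the pattern in plain NumPy terms (shapes and names are purely illustrative): the same transfer-matrix contraction is applied over and over until a fixed point is reached, while the underlying tensors stay put.
import numpy as np

A = np.random.rand(3, 2, 3)  # an MPS-like tensor: (left, physical, right)
T = np.einsum('ipj,kpl->ikjl', A, A).reshape(9, 9)  # transfer matrix

v = np.random.rand(9)
for _ in range(200):  # power iteration until (approximate) convergence
  w = T @ v
  v = w / np.linalg.norm(w)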
To make code like this
net = tn.TensorNetwork()
with net.as_default():
  a1 = tn.Node(np.random.rand(...))
  a2 = tn.Node(np.random.rand(...))
  b1 = tn.Node(np.random.rand(...))
  b2 = tn.Node(np.random.rand(...))
  a1[0] ^ b1[0]
  a1[2] ^ a2[0]
  b1[2] ^ b2[0]
  a1[1] ^ b1[1]
  a2[1] ^ b2[1]
  result = a1 @ b1
  result = result @ a2 @ b2
work, result has to inherit Edges from a1, b1, a2, b2. This means that the network globally isn't well defined anymore (multiple objects carry the same edge). We would then need to have checks in @ and all contraction operations to ensure all edges are unique. This could become complicated to maintain.
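A hedged sketch of the kind of bookkeeping this would require (hypothetical helper, plain Python): every contraction result would have to remember which original edge objects it absorbed, and @ would have to reject operands whose histories overlap.
def check_disjoint_edge_history(edges_absorbed_by_a, edges_absorbed_by_b):
  # Raise if two contraction results were built from the same original edge.
  shared = set(edges_absorbed_by_a) & set(edges_absorbed_by_b)
  if shared:
    raise ValueError(
        f"operands were built from {len(shared)} common edge(s); "
        "contracting them would double-count those edges")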
We then use a design similar to tensorflow's Graph()
Maybe I just have strong opinions, but trying to replicate what TF1 does (especially since tensorflow is moving away from that style) seems like a bad idea.
Another option is to just totally do away with the TensorNetwork object and have the relationships between nodes be determined by the edge objects they share. This is similar to what iTensor does, but I fear this approach will cause a ton of unexpected bugs to crop up. For example:
a = tn.Node(...)
b = tn.Node(...)
c = tn.Node(...)
a[0] ^ b[0]
a[1] ^ c[0]
b[1] ^ c[1]
d = a @ b
# This is allowed since b is immutable.
e = b @ c
# What do we do in this case? They share an edge!
# How can we easily check that this is an error?
f = d @ e
What about having both? We could keep using TensorNetwork as we do now, but additionally, we give the user the ability to create Node objects that don't belong to any TensorNetwork.
The interface of free Nodes is the same as for Nodes created from a TensorNetwork, but the user takes more responsibility for how he uses it (i.e. some checks are only performed if a Node has an associated .network).
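A minimal sketch of what "some checks are only performed if the Node has an associated .network" could mean (hypothetical class, not the library's Node):
class SketchNode:
  def __init__(self, tensor, network=None):
    self.tensor = tensor
    self.network = network  # None for a "free", networkless node

  def check_contractable_with(self, other):
    if self.network is not None and other.network is not None:
      if self.network is not other.network:
        raise ValueError("nodes belong to different networks")
    # If either node is free, no check is done: the user takes responsibility.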
I was concerned about this, but I think I'm coming around to it. Operations that are not contractions of single networks (like the one @Thenerdstation illustrates) are not necessarily errors - one could just allow this sort of thing for power users.
What about having both? We could keep using TensorNetwork as we do now, but additionally, we give the user the ability to create Node objects that don't belong to any TensorNetwork.
I think this might be the way to go. It means existing code still works, but permits the more powerful (and less safe) networkless style. If we do this, we should of course make mixing the two styles an error, so that if one tries to connect a Node belonging to a network with a networkless Node, this would raise an exception.
The node implementation seems like it could be disconnected from the network in a fairly pain-free manner (based on a cursory look). It does carry a reference to the tensor network, but it's only used for
- accessing the backend. This could be supplied independently.
- validating that the arguments of the @ operator are from the same network. Since contractions only make sense within the context of a tensor network, this should probably be removed. I think another issue suggested restricting it to only work inside a context manager?
- it's also used indirectly within the __xor__ implementation of class Edge. Edge does not itself reference or use the tensor network outside of this.

Defining nodes and edges without an underlying network would also imply an API change that solves #188 and #174. The new API would be more explicit about the underlying network.
This appeals to me quite a bit... So we blind the Nodes to both Network and Edge objects (they become a wrapper for a tensor with axis names and a backend reference) and let the network and edges determine how the nodes are connected. Each Network has a set of Edges and the Edges point to the Nodes, so that a Node can participate in multiple networks.
We use a context manager (somehow) to provide the network context for node1 @ node2, and perhaps edge1 ^ edge2 also.
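For what it's worth, a bare-bones sketch of how such a context manager could supply the "current" network (hypothetical names, plain Python):
import contextlib

_DEFAULT_NETWORK = None

@contextlib.contextmanager
def as_default(net):
  # Make `net` the network that @ and ^ would consult while the block runs.
  global _DEFAULT_NETWORK
  previous, _DEFAULT_NETWORK = _DEFAULT_NETWORK, net
  try:
    yield net
  finally:
    _DEFAULT_NETWORK = previous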
Some things to consider:
- What happens to reorder_axes() (if we keep it around)? Perhaps Edges should point to (Node, axis_name) instead of (Node, axis_number)?
- node1 @ node2, which has no room for a network argument, might be a little ugly, but does not require having a global context that works outside with blocks.
- node[axis_id] returns an Edge via a lookup in the node's edge list. This would also need a network for context with the suggested changes, which might be jarring...

Here's an alternative proposal:
Introduce a NamedTensor wrapper that provides axis naming and can be supplied to net.add_node(). This way we have a backend-agnostic tensor wrapper with the convenience of named axes. A NamedTensor can participate in multiple networks (just like a regular tensor), and Nodes and Edges can be looked up by tensor. The latter could look something like this:
named_tensor = NamedTensor(raw_tensor, ["left", "middle", "right"])
my_network.add_node(named_tensor)
my_other_network.add_node(named_tensor)
my_node = my_network[named_tensor]
my_node["left"] ^ my_node["right"]
my_other_node= my_other_network[named_tensor]
my_other_node["right"] ^ my_other_node["middle"]
Lookup could also work via the "raw" tensor: my_node = my_network[raw_tensor].
Probably the cleanest thing would be to have Nodes always wrap tensors internally in a NamedTensor.
Note: One would still need to be careful about reordering axes, but I think that's more obvious with this model.
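A rough sketch of what the wrapper itself could look like (hypothetical; the proposal above only sketches its usage):
class NamedTensor:
  def __init__(self, tensor, axis_names):
    if len(axis_names) != tensor.ndim:
      raise ValueError("need exactly one name per axis")
    self.tensor = tensor
    self.axis_names = list(axis_names)

  def axis(self, name):
    # Map an axis name back to its position in the underlying tensor.
    return self.axis_names.index(name)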
I've started to experiment with a FreeNode class (the (experimental) code is here: https://github.com/mganahl/TensorNetwork/tree/free_node).
The things I'm experimenting with are:
- A FreeNode object with no network.
- A global_backend set at import, which is used by all methods operating on FreeNodes.
- ^ and @ to connect and contract FreeNode objects.
- Once a FreeNode has participated in a contraction, its uncontracted edges are passed on to the newly created FreeNode (same as with TensorNetwork.contract_between), and it obtains a fresh set of dangling edges, thus effectively wiping it clean of any connections.
Things like reorder_edges still work in the expected way. What we gain is that we can essentially get rid of TensorNetwork this way, thus removing one layer of complexity between the underlying tensors and the contraction API.
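For readers who don't want to dig through the branch, here is a plain-Python/NumPy caricature of the "wipe clean after contraction" behaviour described above (this is not the branch code; the names are made up):
import numpy as np

class SketchFreeNode:
  def __init__(self, tensor):
    self.tensor = tensor
    self.edges = [f"dangling-{i}" for i in range(tensor.ndim)]

def contract(a, b, axis_a, axis_b):
  # The result picks up the uncontracted axes of both operands ...
  result = SketchFreeNode(
      np.tensordot(a.tensor, b.tensor, axes=(axis_a, axis_b)))
  # ... while the operands get a fresh set of dangling edges, i.e. they are
  # wiped clean of any previous connections and can be reused immediately.
  a.edges = [f"dangling-{i}" for i in range(a.tensor.ndim)]
  b.edges = [f"dangling-{i}" for i in range(b.tensor.ndim)]
  return result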
@Bonnevie @Thenerdstation @MichaelMarien @amilsted any input on this would be greatly appreciated.
Our tn.copy method supports subgraph copying, so this should largely mitigate this issue, I believe. @Bonnevie does this work for most of your use cases?
tn.replicate_nodes and tn.copy should be good enough for this.
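For reference, a short usage sketch of replicate_nodes (written from memory of the current node-centric API, so the exact signature is worth double-checking against the docs):
import numpy as np
import tensornetwork as tn

a = tn.Node(np.ones((2, 2)))
b = tn.Node(np.ones((2, 2)))
a[1] ^ b[0]

# Copies of the nodes with their connectivity preserved; the originals can
# then be consumed by a contraction while the copies stay usable.
a_copy, b_copy = tn.replicate_nodes([a, b])
result = a @ b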
As a new user, the mix of purely functional and object-oriented approaches is confusing me a bit.
Numpy and Tensorflow (the backends) are mostly functional, but running any sort of contraction in TensorNetwork modifies the network in-place and invalidates references to the nodes being contracted.
I can think of some reasons why it works this way, but keeping track of nodes and edges (i.e. keeping track of contraction trees) is made quite complicated by this, and it also adds quite a bit of overhead to computing tensor environments or multiple subnetworks. Is there any way around this? Or do I just have to live with rerunning the constructor?
Aside from this minor gripe, I am really excited for this package, thank you for the great work.