I'm open to suggestions on improving the library! I really want to make it more functional, but I don't know of any "clean" way of building a graph in a functional way.
Modifying the graph object in place is definitely one of my biggest gripes with it too (and I wrote it!). But again, I'm not sure what approach would be better.
I think I can explain the confusing difference.
So in Numpy/Tensorflow, the base objects (np.array, tf.Tensor) are immutable, whereas our base objects (Node, Edge, TensorNetwork) are fully mutable and can change state depending upon actions taken by neighboring objects.
But again, I'm not sure there is a better way since TensorNetworks are inherently mutable structures. If you can think of a design that allows for the construction/contraction of a TN using immutable objects, please let me know!
I've been mulling it over since posing the question. Bear in mind that I'm not fully aware of the current implementation outside of the user-facing parts.
Contractions currently seem to replace the two nodes being contracted with a new node containing the contracted super-node, so the original nodes are disconnected from the network and the super-node is added in their place. If we consider contractions as operations on tensor network objects, wouldn't it make sense for them to return a new tensor network instance with the modified node/edge list while preserving the original instance and its structure?
Edge and node objects should ideally be shared across the tensor networks, which means their specification should not require references to the network they are part of. The trickiest thing here seems to be that the edges connecting the contracted super-node to the rest of the network are shared with the original pre-contraction network, but they connect to different nodes. Alternatively, the edges could be part of the network specification and thus not be shared, while the nodes are maintained.
The contractions could in theory be made reversible by letting the node introduced through contraction be a member of a new SubNetwork/SuperNode class that works like a node and has a tensor specification, but maintains a reference to a tensor network model that decomposes the tensor (e.g. the network of the two contracted nodes for a regular edge contraction).
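For concreteness, here is a minimal, purely illustrative sketch (plain NumPy; these are hypothetical classes, not the library's Node/TensorNetwork) of a contraction that returns a new network object instead of mutating the old one:
import numpy as np

class SimpleNode:
  def __init__(self, tensor, name):
    self.tensor = tensor
    self.name = name

class SimpleNetwork:
  def __init__(self, nodes):
    self.nodes = dict(nodes)  # name -> SimpleNode

  def contract(self, name_a, name_b, axis_a, axis_b, new_name):
    # Return a NEW network in which name_a and name_b are replaced by the
    # contracted super-node; the original network object is left untouched.
    a, b = self.nodes[name_a], self.nodes[name_b]
    merged = SimpleNode(
        np.tensordot(a.tensor, b.tensor, axes=(axis_a, axis_b)), new_name)
    new_nodes = {k: v for k, v in self.nodes.items()
                 if k not in (name_a, name_b)}
    new_nodes[new_name] = merged
    # Edges touching a or b would have to be re-pointed to the super-node;
    # that bookkeeping is exactly the tricky part discussed above.
    return SimpleNetwork(new_nodes)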
I do agree that letting contractions return a new tensor network instance makes sense. Specifying nodes and edges without references to the network might require big changes to the API; I can't really judge without looking deeper.
The node implementation seems like it could be disconnected from the network in a fairly pain-free manner (based on a cursory look). It does carry a reference to the tensor network, but it's only used for
- accessing the backend. This could be supplied independently.
- validating that the arguments of the @ operator are from the same network. Since contractions only make sense within the context of a tensor network, this should probably be removed. I think another issue suggested restricting it to only work inside a context manager?
- it's also used indirectly within the __xor__ implementation of class Edge. Edge does not itself reference or use the tensor network outside of this.

One use case for a redesign like this could be an implementation of the multienv algorithm (https://arxiv.org/abs/1310.8023), which allows the computation of the environment tensors of all tensors at small cost, but which I think would be difficult to implement at the moment since you need to reuse partial contractions (although it's probably possible with enough bookkeeping; see the sketch just below).

Defining nodes and edges without an underlying network would also imply an API change that solves #188 and #174. The new API would be more explicit about the underlying network.
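To illustrate the kind of reuse I mean, here is a rough, library-independent sketch in plain NumPy (a chain of matrices standing in for a tensor network): all left and right "environments" can be built incrementally at linear cost instead of being recontracted from scratch for every site.
import numpy as np

mats = [np.random.rand(4, 4) for _ in range(6)]

# left[i]  = product of mats[0 .. i-1]
# right[i] = product of mats[i+1 .. end]
left = [np.eye(4)]
for m in mats[:-1]:
  left.append(left[-1] @ m)

right = [np.eye(4)]
for m in reversed(mats[1:]):
  right.append(m @ right[-1])
right = right[::-1]

# The environment of mats[i] is (left[i], right[i]); each partial product
# was computed once and is reused for all sites.
envs = [(left[i], right[i]) for i in range(len(mats))]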
Think about this from the user's perspective. How would doing something simple like
net = TensorNetwork()
a = net.add_node(np.ones((2, 2)))
b = net.add_node(np.ones((2, 2)))
c = net.add_node(np.ones((2, 2)))
a[0] ^ b[1]
b[0] ^ c[1]
c[0] ^ a[1]
d = a @ b
final_result = d @ c
look like when we return a new network object at every contraction?
Of course it is more verbose, but it is clearer. I'm not sure how much difference this makes to the threshold for a new user. We can do something like:
net = TensorNetwork()
a = net.add_node(np.ones((2, 2)))
b = net.add_node(np.ones((2, 2)))
c = net.add_node(np.ones((2, 2)))
net.connect(a[0], b[1], inplace=True)
net.connect(b[0], c[1], inplace=True)
net.connect(c[0], a[1], inplace=True)
d = net.contract(a, b, inplace=True)
final_result = net.contract(d, c)
which is not too bad?
Ok, so this would do what we already do. Do you think you could show an example where we use one of the newly created networks after a contraction? Maybe not with exactly my above example.
I want to think about how a user would use the API first, what advantages it would have and what difficulties we would face. (Implementation details should always come second to usability).
So I talked this over with my team today and we think we have a reasonable solution.
We're going to add a net.copy() method. This method will return a new TensorNetwork, plus two dictionaries mapping the original nodes to the copies and the original edges to the copies.
The new TensorNetwork will be an isomorphic copy of the original. In fact, the actual tensor objects used by the new Nodes will be the same tensor objects as in the original. (We can do this with little consequence since the Tensor objects in all of our supported backends are effectively immutable.)
The API will look something like this.
net = tensornetwork.TensorNetwork()
a = net.add_node(...)
b = net.add_node(...)
c = net.add_node(...)
e = a[0] ^ b[0]
b[1] ^ c[1]
# Here is where the copy happens.
copied_net, node_refs, edge_refs = net.copy()
# We are free to do contractions on the old network.
a @ b
# And we can do contractions on the new network!
# To access the new network's nodes, we can use the
# node ref dictionary.
node_refs[a] @ node_refs[b]
# Same for edges.
copied_net.contract(edge_refs[e])
What's also nice is that since the actual Tensor objects are shared between the original network and the copied network, we can calculate gradients more easily in the JAX, TensorFlow, and PyTorch backends. This makes implementing something like DMRG much simpler.
def dmrg_step(net, nodes, index):
  bigger_node = nodes[index] @ nodes[index+1]
  net_copy, node_refs, _ = net.copy()
  # Contract the rest of the network.
  energy = tensornetwork.contractors.optimal(net).get_final_node()
  # This step will depend on the backend.
  node_refs[bigger_node].tensor -= calculate_gradient(
      energy, node_refs[bigger_node], ...)
  net_copy.split_node(node_refs[bigger_node], ...)
  # This new network is the same as the old network after taking a
  # DMRG step.
  return net_copy
Any comments on this? We welcome constructive criticisms!
Seems like a good approach at first sight. I would need to use it in practice a bit to see how convenient it is and to what extent it covers all needs :).
I guess that any functional version of the tensor network can be reduced to copying plus in-place operations, and this solution certainly looks like it would be almost plug-and-play. The only thing I worry about is that the dictionary-of-references data structure will get a bit cumbersome in implementations?
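As a quick sketch of that reduction (reusing the net.copy() API proposed above; the helper name is made up), a "functional" contraction is just a copy followed by an in-place contraction on the copy:
def functional_contract(net, node_a, node_b):
  new_net, node_refs, _ = net.copy()
  node_refs[node_a] @ node_refs[node_b]  # in-place contraction, but only on the copy
  return new_net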
Could you briefly sketch why it's better to instantiate a new node for each tensor when copying rather than just keeping references to the original nodes in a list or something like that?
To elaborate a bit on the point about nodes from my experience:
As an exercise, I was moving my (very) old uniform MPS Matlab code to TensorNetwork. I feel a single MPS tensor is a Node that can be part of multiple networks (to compute the transfer matrix, expectation values, during an optimisation...).
One of the strong points of the library is the fact that edges, axes, ... can be given names such that no mistakes can be made. However, since my MPS tensor is defined as a node of one particular network, every single time I add it to another network I just use the underlying tensor, defeating the point of having a robust naming convention.
A solution is to define my MPS tensor as a network with a single node and rely on the "add subnetwork" methods. However, it feels very heavy to use the TensorNetwork class for a net with only one Node. For this use case, I'd still be strongly in favour of having standalone Nodes.
I will take up the net.copy() method if no one minds.
net.copy seems useful to have in any case! Just not sure if it covers all use cases.
@Thenerdstation @MichaelMarien @maremun @Bonnevie What's the status on that? I think I'll need this for the MPS stuff, so if no-one is on it, I can do this.
@maremun was going to take it up as my intern, but she is pretty busy with other research at the moment so go ahead and take it up.
@mganahl I did some work, but haven't tested it yet. How soon do you need this method?
@maremun, I just finished the implementation, but still need to code the tests. It's not too urgent.
@maremun @Thenerdstation I can submit a PR and you guys can let me know if you like it or not. If not, I'll keep on using my version until yours is ready.
@mganahl, go ahead with your PR. @maremun, feel free to let Martin take this one.
Hey @Thenerdstation @Bonnevie @MichaelMarien @amilsted I've been thinking about this issue a bit, and I think I would find the following design very convenient.
First, we make the TensorNetwork object immutable.
Actually, what I mean is that the nodes of the network are not consumed upon contraction. They just stay as they are. We are still free to add or remove nodes, connect or disconnect nodes, and so on. We then use a design similar to tensorflow's Graph() approach, where we use a context manager to add nodes to a network.
Operations like contraction, connecting nodes, etc. are then performed using @, ^, and so on. We could have a default TensorNetwork object created at import, or the user can create one themselves.
net = tn.TensorNetwork()
with net.as_default():
  a = tn.Node(np.random.rand(...))
  b = tn.Node(np.random.rand(...))
  c = tn.Node(np.random.rand(...))
  a[0] ^ b[1]
  b[2] ^ c[3]
  result = a @ b @ c    # The network nodes are unaffected by this
  result_2 = a @ b @ c  # Same as the previous line
Backends are used in the same way as before. Additionally, we have functions like conj, random, ... called like this:
conj_A = tn.conj(a)
d = tn.random((a.shape[0], b.shape[1]))
e = tn.ones(tn.shape(a))
d[1] ^ c[0]
e[0] ^ d[0]
Optimal contractions, node replacement, node removal, SVD and QR could be called like this:
result = tn.contractors.optimal([a,b,c,d,e])
tn.replace_node(c, tn.Node(np.random.rand(...))) #replace c with a new random node
tn.remove_node(d)
q, r = tn.qr(c, inplace=True) #replace c with two new nodes q, r
The TensorNetwork object serves as an environment where nodes, edges and their topology are stored, checks are performed, and so on. Things like net.remove_node() would also be available via the TensorNetwork object. It will also be super easy (as it already is) to switch backends.
My typical use cases are such that the overall topology of my network is pretty much fixed from the start and remains as it is during all computations. Computations consist mostly of contracting the same subsets of nodes from the tensor network over and over again until some convergence is reached. It then makes sense to set up the TensorNetwork at the beginning, keep its topology fixed, and not consume its nodes during contraction. This design is relatively similar to Miles' ITensor, but it differs in the way the user specifies connections. I would find this REALLY convenient because it would be simple, short and clear. Let me know what you think!
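To make the use case concrete, here is the pattern in plain NumPy terms (shapes and names are purely illustrative): the same transfer-matrix contraction is applied over and over until a fixed point is reached, while the underlying tensors stay put.
import numpy as np

A = np.random.rand(3, 2, 3)  # an MPS-like tensor: (left, physical, right)
T = np.einsum('ipj,kpl->ikjl', A, A).reshape(9, 9)  # transfer matrix

v = np.random.rand(9)
for _ in range(200):  # power iteration until (approximate) convergence
  w = T @ v
  v = w / np.linalg.norm(w)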
To make code like this
net = tn.TensorNetwork()
with net.as_default():
  a1 = tn.Node(np.random.rand(...))
  a2 = tn.Node(np.random.rand(...))
  b1 = tn.Node(np.random.rand(...))
  b2 = tn.Node(np.random.rand(...))
  a1[0] ^ b1[0]
  a1[2] ^ a2[0]
  b1[2] ^ b2[0]
  a1[1] ^ b1[1]
  a2[1] ^ b2[1]
  result = a1 @ b1
  result = result @ a2 @ b2
work, result has to inherit Edges from a1, b1, a2, b2. This means that the network globally isn't well defined anymore (multiple objects carry the same edge). We would then need to have checks in @ and all contraction operations to ensure all edges are unique. This could become complicated to maintain.
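A hedged sketch of the kind of bookkeeping this would require (hypothetical helper, plain Python): every contraction result would have to remember which original edge objects it absorbed, and @ would have to reject operands whose histories overlap.
def check_disjoint_edge_history(edges_absorbed_by_a, edges_absorbed_by_b):
  # Raise if two contraction results were built from the same original edge.
  shared = set(edges_absorbed_by_a) & set(edges_absorbed_by_b)
  if shared:
    raise ValueError(
        f"operands were built from {len(shared)} common edge(s); "
        "contracting them would double-count those edges")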
We then use a design similar to tensorflow's Graph()
Maybe I just have strong opinions, but trying to replicate what TF1 does (especially since tensorflow is moving away from that style) seems like a bad idea.
Another option is to just totally do away with the TensorNetwork object and have the relationships between nodes be determined by the edge objects they share. This is similar to what iTensor does, but I fear this approach will cause a ton of unexpected bugs to crop up. For example:
a = tn.Node(...)
b = tn.Node(...)
c = tn.Node(...)
a[0] ^ b[0]
a[1] ^ c[0]
b[1] ^ c[1]
d = a @ b
# This is allowed since b is immutable.
e = b @ c
# What do we do in this case? They share an edge!
# How can we easily check that this is an error?
f = d @ e
What about having both? We could keep using TensorNetwork as we do now, but additionally, we give the user the ability to create Node objects that don't belong to any TensorNetwork.
The interface of free Nodes is the same as for Nodes created from a TensorNetwork, but the user takes more responsibility for how he uses it (i.e. some checks are only performed if a Node has an associated .network).
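A minimal sketch of what "some checks are only performed if the Node has an associated .network" could mean (hypothetical class, not the library's Node):
class SketchNode:
  def __init__(self, tensor, network=None):
    self.tensor = tensor
    self.network = network  # None for a "free", networkless node

  def check_contractable_with(self, other):
    if self.network is not None and other.network is not None:
      if self.network is not other.network:
        raise ValueError("nodes belong to different networks")
    # If either node is free, no check is done: the user takes responsibility.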
I was concerned about this, but I think I'm coming around to it. Operations that are not contractions of single networks (like the one @Thenerdstation illustrates) are not necessarily errors - one could just allow this sort of thing for power users.
What about having both? We could keep using TensorNetwork as we do now, but additionally, we give the user the ability to create Node objects that don't belong to any TensorNetwork.
I think this might be the way to go. It means existing code still works, but permits the more powerful (and less safe) networkless style. If we do this, we should of course make mixing the two styles an error, so that if one tries to connect a Node belonging to a network with a networkless Node, this would raise an exception.
The node implementation seems like it could be disconnected from the network in a fairly pain-free manner (based on a cursory look). It does carry a reference to the tensor network, but it's only used for
- accessing the backend. This could be supplied independently.
- validating that the arguments of the @ operator are from the same network. Since contractions only make sense within the context of a tensor network, this should probably be removed. I think another issue suggested restricting it to only work inside a context manager?
- it's also used indirectly within the __xor__ implementation of class Edge. Edge does not itself reference or use the tensor network outside of this.

Defining nodes and edges without an underlying network would also imply an API change that solves #188 and #174. The new API would be more explicit about the underlying network.
This appeals to me quite a bit... So we blind the Nodes to both Network and Edge objects (they become a wrapper for a tensor with axis names and a backend reference) and let the network and edges determine how the nodes are connected. Each Network has a set of Edges and the Edges point to the Nodes, so that a Node can participate in multiple networks.
We use a context manager (somehow) to provide the network context for node1 @ node2, and perhaps edge1 ^ edge2 also.
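For what it's worth, a bare-bones sketch of how such a context manager could supply the "current" network (hypothetical names, plain Python):
import contextlib

_DEFAULT_NETWORK = None

@contextlib.contextmanager
def as_default(net):
  # Make `net` the network that @ and ^ would consult while the block runs.
  global _DEFAULT_NETWORK
  previous, _DEFAULT_NETWORK = _DEFAULT_NETWORK, net
  try:
    yield net
  finally:
    _DEFAULT_NETWORK = previous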
Some things to consider:
- What happens to reorder_axes() (if we keep it around)? Perhaps Edges should point to (Node, axis_name) instead of (Node, axis_number)?
- node1 @ node2, which has no room for a network argument, might be a little ugly, but does not require having a global context that works outside with blocks.
- node[axis_id] returns an Edge via a lookup in the node's edge list. This would also need a network for context with the suggested changes, which might be jarring...

Here's an alternative proposal:
Introduce a NamedTensor wrapper that provides axis naming and can be supplied to net.add_node(). This way we have a backend-agnostic tensor wrapper with the convenience of named axes. A NamedTensor can participate in multiple networks (just like a regular tensor), and Nodes and Edges can be looked up by tensor. The latter could look something like this:
named_tensor = NamedTensor(raw_tensor, ["left", "middle", "right"])
my_network.add_node(named_tensor)
my_other_network.add_node(named_tensor)
my_node = my_network[named_tensor]
my_node["left"] ^ my_node["right"]
my_other_node= my_other_network[named_tensor]
my_other_node["right"] ^ my_other_node["middle"]
Lookup could also work via the "raw" tensor: my_node = my_network[raw_tensor].
Probably the cleanest thing would be to have Nodes always wrap tensors internally in a NamedTensor.
Note: One would still need to be careful about reordering axes, but I think that's more obvious with this model.
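A rough sketch of what the wrapper itself could look like (hypothetical; the proposal above only sketches its usage):
class NamedTensor:
  def __init__(self, tensor, axis_names):
    if len(axis_names) != tensor.ndim:
      raise ValueError("need exactly one name per axis")
    self.tensor = tensor
    self.axis_names = list(axis_names)

  def axis(self, name):
    # Map an axis name back to its position in the underlying tensor.
    return self.axis_names.index(name)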
I've started to experiment with a FreeNode class (the (experimental) code is here: https://github.com/mganahl/TensorNetwork/tree/free_node).
The things I'm experimenting with are:
- A FreeNode object with no network.
- A global_backend set at import, which is used by all methods operating on FreeNodes.
- ^ and @ to connect and contract FreeNode objects.
- Once a FreeNode has participated in a contraction, its uncontracted edges are passed on to the newly created FreeNode (same as with TensorNetwork.contract_between), and it obtains a fresh set of dangling edges, thus effectively wiping it clean of any connections.
Things like reorder_edges still work in the expected way. What we gain is that we can essentially get rid of TensorNetwork this way, thus removing one layer of complexity between the underlying tensors and the contraction API.
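For readers who don't want to dig through the branch, here is a plain-Python/NumPy caricature of the "wipe clean after contraction" behaviour described above (this is not the branch code; the names are made up):
import numpy as np

class SketchFreeNode:
  def __init__(self, tensor):
    self.tensor = tensor
    self.edges = [f"dangling-{i}" for i in range(tensor.ndim)]

def contract(a, b, axis_a, axis_b):
  # The result picks up the uncontracted axes of both operands ...
  result = SketchFreeNode(
      np.tensordot(a.tensor, b.tensor, axes=(axis_a, axis_b)))
  # ... while the operands get a fresh set of dangling edges, i.e. they are
  # wiped clean of any previous connections and can be reused immediately.
  a.edges = [f"dangling-{i}" for i in range(a.tensor.ndim)]
  b.edges = [f"dangling-{i}" for i in range(b.tensor.ndim)]
  return result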
@Bonnevie @Thenerdstation @MichaelMarien @amilsted any input on this would be greatly appreciated.
Our tn.copy method supports subgraph copying, so this should largely mitigate this issue, I believe. @Bonnevie does this work for most of your use cases?
tn.replicate_nodes and tn.copy should be good enough for this.
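For reference, a short usage sketch of replicate_nodes (written from memory of the current node-centric API, so the exact signature is worth double-checking against the docs):
import numpy as np
import tensornetwork as tn

a = tn.Node(np.ones((2, 2)))
b = tn.Node(np.ones((2, 2)))
a[1] ^ b[0]

# Copies of the nodes with their connectivity preserved; the originals can
# then be consumed by a contraction while the copies stay usable.
a_copy, b_copy = tn.replicate_nodes([a, b])
result = a @ b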
As a new user, the mix of purely functional and object-oriented approaches is confusing me a bit.
Numpy and Tensorflow (the backends) are mostly functional, but running any sort of contraction in TensorNetwork modifies the network in-place and invalidates references to the nodes being contracted.
I can think of some reasons why it works this way, but keeping track of nodes and edges (i.e. keeping track of contraction trees) is made quite complicated by this, and it also adds quite a bit of overhead to computing tensor environments or multiple subnetworks. Is there any way around this? Or do I just have to live with rerunning the constructor?
Aside from this minor gripe, I am really excited for this package, thank you for the great work.