saeedjahromi opened this issue 3 years ago
Hi @saeedjahromi, thank you for the message! Great to hear that you are using the library for PEPS.
You are saying that the code is very slow. What is the baseline you are comparing it to (plain numpy, other symmetric tensor code)? Also, are you doing variational optimization, CTM, or something else? How large are your bond dimensions?
The functions you mention compute the symmetry blocks of a given tensor (when reshaped into a matrix). This step can indeed become a bottleneck if your bond dimensions are small and your tensors have only a few (roughly <= 3) legs.
To give some context: our code uses an approach that is somewhat different from many other publicly available packages. We don't store the individual symmetry blocks of a tensor separately. Instead, all non-zero elements are stored in a single 1d data array (a numpy array). When we perform contractions, decompositions, and so on, we use the charge information of each tensor leg to work out which elements of this data array go into which symmetry block. Working out this mapping can become the bottleneck for small bond dimensions. That said, this approach can have significant advantages if the tensors have higher ranks (roughly >= 4) and/or more than one simultaneous symmetry (e.g. two or more fermion species, or charge conservation plus Z2 conservation).
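For illustration, here is a minimal toy example of that layout (made-up charges; the exact numbers don't matter), using `U1Charge`, `Index`, and `BlockSparseTensor` from `tensornetwork.block_sparse`:

```python
import numpy as np
from tensornetwork.block_sparse import U1Charge, Index, BlockSparseTensor

# Two legs carrying U(1) charges; `flow` marks whether a leg is in- or outgoing.
charges = U1Charge(np.array([0, 1, 1, 2]))
i1 = Index(charges, flow=False)
i2 = Index(charges, flow=True)

# All charge-conserving (non-zero) elements live in one flat array, `tensor.data`.
tensor = BlockSparseTensor.random([i1, i2])
print(tensor.shape)       # dense shape (4, 4)
print(tensor.data.shape)  # number of actually stored elements
```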
One thing you can do to increase speed is to turn on a "caching" option within the block-sparse code via `tensornetwork.block_sparse.enable_caching()`. With caching turned on, the functions you mentioned above are cached on their inputs. This works very well if there is no truncation step in your algorithm, e.g. via an SVD. If there is an SVD you can still use caching, but the chances of getting a cache hit are significantly reduced: SVDs/truncations often involve more than one tensor in your network and lead to a redistribution of charges across the involved tensors, which makes caching less efficient. Furthermore, due to the constant charge redistribution, the cache may fill up and use a lot of memory. If that happens you can clear the cache with `tensornetwork.block_sparse.clear_cache()`. You can also disable caching just for the SVD with `tensornetwork.block_sparse.disable_caching()`.
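In code, the pattern would look roughly like the sketch below (the loop length and the placement of the SVD are placeholders for your algorithm):

```python
from tensornetwork import block_sparse

block_sparse.enable_caching()      # cache the block-structure lookups on their inputs

num_iterations = 10                # placeholder loop length
for step in range(num_iterations):
    # ... contractions of the algorithm go here and benefit from cache hits ...

    # Around a truncating SVD the cache is much less useful, so switch it off:
    block_sparse.disable_caching()
    # ... truncated SVD / charge redistribution would go here ...
    block_sparse.enable_caching()

    # Clear the cache periodically if it grows too large:
    if step % 5 == 0:
        block_sparse.clear_cache()
```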
Let me know if this helps!
Hi @mganahl, thanks for your response and helpful remarks. I have designed a fermionic (symmetric) iPEPS code which currently uses a simple update based on local SVDs to update the PEPS tensors, followed by a CTMRG algorithm to approximate the contraction of the whole network and compute expectation values variationally.
The bottleneck in the simple update is that it is an iterative process in which the local PEPS tensors, the Hamiltonian gate, and the surrounding lambdas are repeatedly joined and then split by a local SVD. In this process the order of the quantum numbers changes, and to get a consistent update and good convergence one has to call the `contiguous()` method to reorder the charges. This method in turn calls the `_find_diagonal_sparse_blocks()` and `_find_transposed_diagonal_sparse_blocks()` functions.
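For concreteness, a toy example of the kind of operation I mean (the charges and dimensions here are made up, not taken from my actual code):

```python
import numpy as np
from tensornetwork.block_sparse import U1Charge, Index, BlockSparseTensor

# A stand-in for a rank-5 PEPS tensor: four virtual legs of dimension D and one
# physical leg of dimension d, with randomly chosen U(1) charges.
D, d = 6, 2
virtual = [Index(U1Charge(np.random.randint(-1, 2, D)), flow=False) for _ in range(4)]
physical = Index(U1Charge(np.random.randint(0, 2, d)), flow=True)
A = BlockSparseTensor.random(virtual + [physical])

# As I understand it, transpose() only relabels the legs lazily, while
# contiguous() actually reorders the flat data array; the latter is where
# _find_transposed_diagonal_sparse_blocks() gets called.
B = A.transpose((4, 0, 1, 2, 3)).contiguous()
print(B.shape)
```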
On top of that, in CTMRG one needs to build the reduced tensors by joining a PEPS tensor and its conjugate along the physical leg and then fusing the corresponding virtual legs to perform a double-layer CTM. Building the reduced tensors is also a bottleneck for the CTMRG, since the code again has to call `_find_diagonal_sparse_blocks()` and `_find_transposed_diagonal_sparse_blocks()` to find the correct charges for the fused legs.
I tried running the code for bond dimensions up to D=8. It seems that, while `ncon()` can be faster than the plain numpy backend for contracting single-layer tensors, it is much slower for contracting tensors whose individual legs have been constructed by joining other legs, e.g. the reduced tensors of the CTMRG.
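A stripped-down version of the kind of comparison I ran (the dimensions, charges, and contraction are placeholders, not my actual CTMRG tensors):

```python
import time
import numpy as np
import tensornetwork as tn
from tensornetwork.block_sparse import U1Charge, Index, BlockSparseTensor

# Build a rank-3 tensor whose legs are fused pairs of dimension-D legs,
# mimicking the fused legs of a double-layer CTM tensor.
D = 8
small = [Index(U1Charge(np.random.randint(-1, 2, D)), flow=False) for _ in range(6)]
A_sym = BlockSparseTensor.random(small).reshape((D**2, D**2, D**2))
B_sym = A_sym.conj()
A_np = np.random.rand(D**2, D**2, D**2)
B_np = np.random.rand(D**2, D**2, D**2)

# Time the same ncon contraction with the symmetric and the numpy backend.
for name, tensors, backend in [("symmetric", [A_sym, B_sym], "symmetric"),
                               ("numpy", [A_np, B_np], "numpy")]:
    t0 = time.time()
    tn.ncon(tensors, [[-1, -2, 1], [-3, -4, 1]], backend=backend)
    print(f"{name:9s}: {time.time() - t0:.4f} s")
```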
I have another implementation of fermionic iPEPS based on https://github.com/mhauru/abeliantensors, which has a block-wise implementation of symmetric tensors, and your symmetric backend is slower than that implementation for joining and splitting tensor legs. However, I like your code much better, especially since it allows mixed symmetries such as U1xZ2.
Hi Saeed, thanks for the explanations. Would it be possible for you to prepare a small script which runs benchmarks for the cases you mentioned? You can include mhauru's code as well, I am familiar with it. This would help us pin down the issue and find a solution.
Thanks!
Sure, I will prepare a script to benchmark both codes against each other, say for the CTMRG algorithm.
Hey dev team. I have been working on a PEPS algorithm for simulating 2D fermionic systems with the symmetric backend, and it turned out the symmetric backend is very slow for PEPS tensors. I did some profiling and found that the bottleneck is the slow performance of the functions in the `blocksparse_utils.py` file, mainly the `_find_diagonal_sparse_blocks()` and `_find_transposed_diagonal_sparse_blocks()` functions.
I was wondering whether there is any way to speed up the symmetric backend, or whether you have any advice or suggestions?
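For reference, the profiling was done roughly like this (`one_iteration` is a placeholder for one step of my algorithm, not a function in the library):

```python
import cProfile
import pstats

def one_iteration():
    """Placeholder for one simple-update or CTMRG step of the actual code."""
    pass

# Profile a single iteration and show where the time goes, restricted to the
# functions defined in blocksparse_utils.py.
cProfile.run("one_iteration()", "profile.out")
stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats("blocksparse_utils", 20)
```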