jturney / ambit

C++ library for the implementation of tensor product calculations through a clean, concise user interface.
GNU Lesser General Public License v3.0

Implemented batched tensor contraction algorithm for core tensors. #23

Closed tyzhang1993 closed 7 years ago

tyzhang1993 commented 7 years ago

Description

This PR implements a batching algorithm to reduce the memory footprint of tensor contraction intermediates.

The contraction C["ijrs"] += 0.5 * B["gar"] * B["gbs"] * T["ijab"] generates an A["abrs"] intermediate tensor, which can be too large to hold in memory. The new batched syntax, C["ijrs"] += batched("s", 0.5 * B["gar"] * B["gbs"] * T["ijab"]), performs the contraction by batching over the index s, equivalent to:

Loop over s:
    C["ijr"] += 0.5 * B["gar"] * B["gb"] * T["ijab"]

where only a small A["abr"] intermediate tensor needs to be generated. This syntax can also loop over multiple batching indices; for example, C["ijrs"] += batched("rs", 0.5 * B["gar"] * B["gbs"] * T["ijab"]) only needs an intermediate tensor of size A["ab"].
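The batching idea above can be sketched in NumPy (this is an illustrative model of the technique, not the ambit API; all array names and sizes here are made up for the demonstration):

```python
# Sketch of batching the contraction
#   C["ijrs"] += 0.5 * B["gar"] * B["gbs"] * T["ijab"]
# over the index s, so only an A["abr"] slice of the A["abrs"]
# intermediate is ever held in memory.
import numpy as np

rng = np.random.default_rng(0)
ng, na, nr, ns, ni = 4, 3, 3, 3, 2          # hypothetical dimensions
B1 = rng.random((ng, na, nr))               # plays the role of B["gar"]
B2 = rng.random((ng, na, ns))               # plays the role of B["gbs"]
T = rng.random((ni, ni, na, na))            # plays the role of T["ijab"]

# Unbatched: the full A["abrs"] intermediate is materialized at once.
A = np.einsum('gar,gbs->abrs', B1, B2)      # na*na*nr*ns elements
C_full = 0.5 * np.einsum('abrs,ijab->ijrs', A, T)

# Batched over s: only an A["abr"] slice exists at any one time.
C_batched = np.zeros_like(C_full)
for s in range(ns):
    A_s = np.einsum('gar,gb->abr', B1, B2[:, :, s])   # na*na*nr elements
    C_batched[:, :, :, s] = 0.5 * np.einsum('abr,ijab->ijr', A_s, T)

assert np.allclose(C_full, C_batched)
```

The two paths produce the same C["ijrs"], but the batched loop shrinks the intermediate by a factor of ns.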


fevangelista commented 7 years ago

Thanks Sam. This is a neat addition to Ambit. I have not had a chance to review the code, but I noticed that some of the tests fail because of issues with pyenv on Travis CI. @jturney, do you have a patch for that?

lcyyork commented 7 years ago

Would it make sense to parallelize the batching loop?

tyzhang1993 commented 7 years ago

@lcyyork Good question; I actually thought about it. There are two points here: 1. Ambit is not intended to handle parallelization for core tensors; that can be done by the LAPACK/BLAS libraries or by the code calling Ambit. 2. Parallelization requires more memory, which conflicts with the current goal of reducing the memory footprint.

fevangelista commented 7 years ago

I think we should go ahead and introduce this functionality first, making sure it is well tested. Then we can certainly talk about optimization. For example, the loop could batch over a range of s values rather than one at a time.

jturney commented 7 years ago

This is great work. Thanks for adding it.

Let me see if I can get the Travis Python errors figured out.

loriab commented 7 years ago

Looks like you're getting gcc from both precise and brew? You can get cmake and python from conda if you want to switch away from pyenv. Or you can get gcc 4.8.5 and 5.2 for mac from conda, if that would do.

tyzhang1993 commented 7 years ago

This PR should now be ready to go. The Travis CI Python error has been fixed. @jturney

jturney commented 7 years ago

Great work! Thanks for it!