Memory Pool - Githubissues

kaushikcfd commented 5 years ago

This PR implements a memory pool for ViennaCL's OpenCL backend.

Brief overview of the implementation details:

The code for defining the memory pool is mostly pulled from PyOpenCL's memory pool implementation with appropriate licences.
Classes that allocate a memory like vector_base, now take in an additional template parameter that determines the handle type of the memory allocated. The handle type can be either viennacl::ocl::mem_handle<cl_mem> or viennacl::ocl::pooled_clmem_handle . This breaks backward compatibility, but I expect there wouldn't be many instances in which a user-facing code would deal with memory handles.
The few changes that are needed to be propagated across PETSc can be seen in this commit.
The temporaries involved in linalg/vector_operations.hpp are now allocated through a pooled handle.

These allocation calls for the temporaries were substantial when vector operations were called in PETSc. The following table shows the timings(in ms) before and after implementation of the pooled memory handle.

	Before	After
`VecNorm`	0.294	0.036
`VecMDot`	0.741	0.506

Details of the test:

VecNorm on vector of length (2^{18}),
VecMDot on a vectors of length (2^{18}) involving 30 inner products.
Tests run on a machine with Nvidia Titan V.

I have attached the files for the tests, along with their makefiles.

Attachments: timings.tar.gz

kaushikcfd commented 5 years ago

cc @inducer

karlrupp commented 5 years ago

Thanks, @kaushikcfd !

While a memory pool is one way to solve the issue you have encountered with PETSc, I'm afraid this PR is way too intrusive (yet backend-specific) for achieving the actual goal of eliminating the impact of temporaries you've seen in PETSc.

Please let me spend some more thoughts on a more concise (yet equally powerful) fix that immediately carries over to the CUDA backend as well.

kaushikcfd commented 5 years ago

@karlrupp: Thanks for taking a look. This can be extended for CUDA backend as well. PyCUDA and PyOpenCL share the same memory pool implementation and hence we would also need to just add another memory pool allocator class, which would involve minimal changes.

viennacl / viennacl-dev

Memory Pool #270