mosheduminer opened this issue 9 months ago (status: Open)
@mosheduminer I was working on a feature like that myself! My goal is to implement graph convolution.
My idea is to use the PyTorch sparse tensor implementation as a new backend with its own specific implementation.
@antimora I was thinking about this issue, and I have a use case that I think is common when working with sparse tensors: performing many operations together with dense tensors (e.g. matrix multiplications).
A separate backend would be impractical in those cases. I've come up with the idea of somehow making the PyTorch backend a trait and then implementing both the sparse and non-sparse variants (with specific implementations for the sparse × non-sparse case).
Do you have any ideas/feelings on that?
If you weren't able to operate on dense and sparse tensors together, would that mean that a module would need to use the sparse tensor implementation for its weights and such? That seems like it might cause problems, no?
BTW, when I opened this PR, I assumed that the sparse tensor would somehow be a layer abstracting over a dense tensor (one which would only hold the "filled" spots of the sparse tensor).
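As a rough sketch of that "layer over a dense tensor" idea (all names here are hypothetical, not part of burn): a COO-style structure that only stores the filled positions of a 2-D tensor, with a conversion back to a dense row-major buffer.

```rust
/// Hypothetical COO-style sparse tensor: only the "filled" (non-zero)
/// positions of a 2-D tensor are stored, as (row, col, value) triples.
#[derive(Debug, Clone)]
pub struct SparseCoo2d {
    pub shape: (usize, usize),
    pub entries: Vec<(usize, usize, f32)>,
}

impl SparseCoo2d {
    /// Materialize the dense row-major buffer, filling unset positions with 0.
    pub fn to_dense(&self) -> Vec<f32> {
        let (rows, cols) = self.shape;
        let mut dense = vec![0.0; rows * cols];
        for &(r, c, v) in &self.entries {
            dense[r * cols + c] = v;
        }
        dense
    }
}
```

For a tensor that is mostly empty, `entries` stays tiny while the dense buffer would scale with `rows * cols`.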
I'm not sure if the best approach for sparse tensors is to introduce a new kind of tensor in the backend. We now have `IntTensor`, `FloatTensor` and `BoolTensor`, but we could have `SparseFloatTensor` as well. This way we could have a module function that accepts a mix of sparse and dense tensors. Of course, we should make it easy to convert between sparse and dense tensors (`Into`, `From`). The main problem is that it creates a lot more work for backends.
Another option is to have a backend associated type that points to a backend with sparse tensors. This is the strategy we use today with `FullPrecisionBackend`, when you want to be sure the precision is sufficient for sensible operations.
pub struct LibTorch<Float, const SPARSE: bool = false>;
impl<Float, const SPARSE: bool> Backend for LibTorch<Float, SPARSE> {
type SparseBackend = LibTorch<Float, true>;
}
Something like that!
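Expanding that snippet into something compilable (with a dummy `Backend` trait and an added `SPARSE` associated const just for inspection; these names are illustrative, not burn's actual API):

```rust
use std::marker::PhantomData;

// Dummy stand-in for the real Backend trait, just to show the pattern.
trait Backend: Sized + 'static {
    /// Whether this backend instance is the sparse variant.
    const SPARSE: bool;
    /// Associated type pointing at the sparse variant of the same backend.
    type SparseBackend: Backend;
}

// A const generic parameter selects the sparse variant of the backend.
struct LibTorch<Float, const SPARSE: bool = false> {
    _float: PhantomData<Float>,
}

impl<Float: 'static, const SPARSE: bool> Backend for LibTorch<Float, SPARSE> {
    const SPARSE: bool = SPARSE;
    type SparseBackend = LibTorch<Float, true>;
}
```

`LibTorch<f32>` defaults to the dense variant, and `<LibTorch<f32> as Backend>::SparseBackend` resolves to `LibTorch<f32, true>` at compile time.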
I've been doing a bunch of poking and prodding at this feature for the last couple of weeks. There are three big approaches that I have looked into, but they all seem to have big limitations as far as I can tell.
1. A new tensor kind, with the required operations in a `SparseTensorOps` trait. Even worse, if we specify the format at the tensor kind instead of the backend, every backend has to implement all required representations. I can imagine this would get verbose quite quickly.
2. A backend bridge. As with the `FullPrecision` bridge, the amount of support for sparse tensors would be up to each backend, and they would have the highest amount of freedom. There are some troubling issues I ran into when I started actually trying to implement a backend bridge, though: `FloatTensorOps` and the other traits like it lack functions unique to a sparse matrix. For example, sparse-dense matrix multiplication is not a function in `FloatTensorOps` for good reason, but it would be needed for the sparse backend we are bridging to.
3. Bridging to a `SparseBackend` instead of a `Backend`. This might be achievable with a lot of work, but at that point we would be better off just adding a new tensor kind.

After having run into a bunch of walls with the backend bridges, I think my next approach will be to see if I can take the best of the first two approaches and combine them. I suspect it may be possible to have `SparseXTensorOps` traits and associated primitives with sensible default implementations that abstract over the corresponding `FloatTensor`, `IntTensor` and `BoolTensor`, but which can be overridden by backends such as libtorch which have their own sparse tensor implementations. Of course, only time and the type system will tell me if this is achievable...
I think a mix of 1 and 2 is the best approach, providing default implementations using the existing ops, but making it possible for backends to override those new sparse ops to improve performance.
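To illustrate that idea in miniature (hypothetical traits, not burn's real ops): a sparse op gets a default implementation written in terms of an existing dense op, and a backend can override it with a representation-aware version.

```rust
// Dense op with a default (naive) implementation.
trait DenseOps {
    fn add(a: &[f32], b: &[f32]) -> Vec<f32> {
        a.iter().zip(b).map(|(x, y)| x + y).collect()
    }
}

// Sparse values given as (index, value) pairs into a dense vector of length n.
trait SparseOps: DenseOps {
    // Default: densify the sparse operand, then reuse the dense op.
    fn sp_add(n: usize, sparse: &[(usize, f32)], dense: &[f32]) -> Vec<f32> {
        let mut densified = vec![0.0; n];
        for &(i, v) in sparse {
            densified[i] = v;
        }
        Self::add(&densified, dense)
    }
}

struct NaiveBackend;
impl DenseOps for NaiveBackend {}
impl SparseOps for NaiveBackend {} // uses the default sp_add

struct FastBackend;
impl DenseOps for FastBackend {}
impl SparseOps for FastBackend {
    // Override: touch only the stored entries instead of densifying.
    fn sp_add(_n: usize, sparse: &[(usize, f32)], dense: &[f32]) -> Vec<f32> {
        let mut out = dense.to_vec();
        for &(i, v) in sparse {
            out[i] += v;
        }
        out
    }
}
```

Both backends produce the same result; only the cost differs, which is exactly the "default now, optimize later" property being discussed.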
It has been a great challenge figuring out how to make this work, but I've finally landed on something that works. I threw together a quick mock version to verify it, and it all seems sound, so I plan to start actually implementing this for the real backends soon.
Here is my small mockup (names of things are very much not final). Feedback or thoughts are welcome:
// ===== The trait definitions for sparse backend stuff + representations
trait ReformatSparseTensor<FROM: SparseRepr, TO: SparseRepr> {
fn reformat(from: FROM) -> TO;
}
trait SparseTensorOps<B: SparseBackend, REPR: SparseRepr> {
fn spdmm<const D: usize>(
s: B::SparseTensorPrimitive<REPR, D>,
d: <B::DenseBridge as Backend>::TensorPrimitive<D>,
);
}
trait SparseBackend:
SparseTensorOps<Self, Self::CSRRepr>
+ SparseTensorOps<Self, Self::COORepr>
+ Sized
+ 'static
+ ReformatSparseTensor<Self::COORepr, Self::CSRRepr>
+ ReformatSparseTensor<Self::CSRRepr, Self::COORepr>
{
// Bridge back to the dense backend, so we can use its functionality
type DenseBridge: Backend;
// The different representations of our sparse tensors
// SparseRepr is just a supertrait to guarantee some things less verbosely
type CSRRepr: SparseRepr;
type COORepr: SparseRepr;
// The sparse tensor primitive type, generic over the Repr
type SparseTensorPrimitive<REPR: SparseRepr, const D: usize>: Clone
+ Send
+ 'static
+ core::fmt::Debug;
}
trait Backend: Sized + 'static {
type SparseBridge: SparseBackend<DenseBridge = Self>;
type TensorPrimitive<const D: usize>: Clone + Send + 'static + core::fmt::Debug;
// All the typical backend stuff here...
}
trait SparseRepr: core::fmt::Debug + Clone + Send + Sync + 'static {}
// ==== The *default* "decorator" sparse backend that abstracts over B's tensor primitives
#[derive(Debug, Clone)]
struct DefaultCOORepr {
// This would ideally utilize the backend's tensor primitives, but that's too much for the mockup
contents: u32,
}
#[derive(Debug, Clone)]
struct DefaultCSRRepr {
// This would ideally utilize the backend's tensor primitives, but that's too much for the mockup
contents: i32,
}
impl SparseRepr for DefaultCOORepr {}
impl SparseRepr for DefaultCSRRepr {}
#[derive(Debug, Clone)]
struct DefaultSparseTensorPrimitive<REPR: SparseRepr, const D: usize> {
contents: REPR,
}
struct SparseDecorator<B: Backend> {
b: core::marker::PhantomData<B>,
}
impl<B: Backend> ReformatSparseTensor<DefaultCOORepr, DefaultCSRRepr> for SparseDecorator<B> {
fn reformat(from: DefaultCOORepr) -> DefaultCSRRepr {
DefaultCSRRepr {
contents: from.contents as i32,
}
}
}
impl<B: Backend> ReformatSparseTensor<DefaultCSRRepr, DefaultCOORepr> for SparseDecorator<B> {
fn reformat(from: DefaultCSRRepr) -> DefaultCOORepr {
DefaultCOORepr {
contents: from.contents as u32,
}
}
}
impl<B: Backend<SparseBridge = SparseDecorator<B>>, REPR: SparseRepr>
SparseTensorOps<B::SparseBridge, REPR> for SparseDecorator<B>
{
fn spdmm<const D: usize>(
s: <B::SparseBridge as SparseBackend>::SparseTensorPrimitive<REPR, D>,
d: <<Self as SparseBackend>::DenseBridge as Backend>::TensorPrimitive<D>,
) {
println!("{:?} x {:?}", s, d);
}
}
impl<B: Backend<SparseBridge = SparseDecorator<B>>> SparseBackend for SparseDecorator<B> {
type DenseBridge = B;
type CSRRepr = DefaultCSRRepr;
type COORepr = DefaultCOORepr;
type SparseTensorPrimitive<REPR: SparseRepr, const D: usize> =
DefaultSparseTensorPrimitive<REPR, D>;
}
// ===== And testing it out:
struct TestBackend;
impl Backend for TestBackend {
type SparseBridge = SparseDecorator<TestBackend>;
type TensorPrimitive<const D: usize> = String;
}
fn main() {
let dense_primitive: <TestBackend as Backend>::TensorPrimitive<1> =
"Im a dense matrix!".to_string();
let sparse_primitive: <<TestBackend as Backend>::SparseBridge as SparseBackend>::SparseTensorPrimitive<<<TestBackend as Backend>::SparseBridge as SparseBackend>::CSRRepr, 1> = DefaultSparseTensorPrimitive { contents: DefaultCSRRepr { contents: 12 } };
dbg!(&dense_primitive);
dbg!(&sparse_primitive);
<<TestBackend as Backend>::SparseBridge as SparseTensorOps<
<TestBackend as Backend>::SparseBridge,
DefaultCSRRepr,
>>::spdmm(sparse_primitive, dense_primitive);
}
I tried really hard to make the `Backend` supertrait require an implementation of `SparseBackend`, or something like it, but it ended up being impossible due to type-system constraints (without becoming very verbose with generics). The way I'm doing it now seems plenty extensible: any backend can define a custom `SparseBridge`, but we also have a handy decorator for backends that can implement sparse tensors without assuming anything about backend details.
The only thing I'm still a little concerned about is having the `SparseBackend` trait explicitly define the COO, CSR, etc. associated types. There isn't an easy way for a backend to have a custom representation other than misusing one of the existing associated types, or to implement only a subset of them (without leaving one of the associated types meaningless).
@McArthur-Alford that's interesting. I think it would be better to define a single representation-agnostic sparse backend, and use the backend bridge to change between representations. The traits would work almost the same way you presented, but without any forced representation.
@nathanielsimard I would like for it to work this way, but I'm struggling to figure out how to do it satisfyingly. The reason I have multiple representations enforced in the trait right now is that there are meaningful differences between the costs/benefits of each representation. Having the backend specify its own representation sounds great, but ideally a backend allows multiple representations so users can choose the best one for the operation. I can't think of how to let each backend specify its own set of allowed representations while also providing easy functionality in the API to swap representations.
edit: I haven't really thought about it enough, but it is possible to extend the tensor API on a per-backend basis. I'll see if it isn't possible to do something like the following: conversion methods on `Tensor<SpecificBackend, D>` which convert the internal representation but don't change anything about the tensor's type. Even better, a trait with `to_repr`-style methods would fit `SparseDecorator<B: Backend>` nicely.
@McArthur-Alford I wouldn't force backends to actually implement the sparse backend trait; I would consider it to be an optional backend extension that should be opt-in. The same backend could still implement multiple representations by having a generic argument for that representation:
impl<F: FloatElement> SparseBackend for NdArray<F, SparseCsr> {
...
}
impl<F: FloatElement> SparseBackend for NdArray<F, SparseCoo> {
...
}
In the modeling code:
type MyBackend = NdArray<f32, SparseCsr>;
use burn::tensor::backend::SparseBackend as Backend;
fn inference<B: Backend>(input: Tensor<B, 2>) {
let input: SparseTensor<B, 2> = input.into_sparse(); // Could be that
let input: Tensor<B, 2, SparseFloat> = input.into_sparse(); // Or that
...
}
In addition, using backend bridges, one should be able to switch between backends, therefore representation, at runtime!
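As a concrete standalone example of such a representation switch (these structs are illustrative, not burn types), here is what a COO-to-CSR conversion does under the hood, assuming the COO entries are sorted by row:

```rust
struct CooMatrix {
    rows: usize,
    coords: Vec<(usize, usize)>, // (row, col), sorted by row
    values: Vec<f32>,
}

struct CsrMatrix {
    row_ptr: Vec<usize>, // len = rows + 1
    col_idx: Vec<usize>,
    values: Vec<f32>,
}

fn coo_to_csr(coo: &CooMatrix) -> CsrMatrix {
    // Count the entries per row, then prefix-sum into row pointers.
    let mut row_ptr = vec![0usize; coo.rows + 1];
    for &(r, _) in &coo.coords {
        row_ptr[r + 1] += 1;
    }
    for i in 0..coo.rows {
        row_ptr[i + 1] += row_ptr[i];
    }
    // Because the input is sorted by row, columns and values carry over directly.
    CsrMatrix {
        row_ptr,
        col_idx: coo.coords.iter().map(|&(_, c)| c).collect(),
        values: coo.values.clone(),
    }
}
```

A bridge between a COO-flavored and a CSR-flavored backend would essentially perform this kind of conversion on the tensor's primitive.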
Ok, so I'm making pretty decent progress on this. For the most part it doesn't require any substantial changes to the API because of the approach taken, though there is one change I figured would be convenient for reducing duplicated code (though it is potentially quite a significant change).
Right now the tensor API has `BasicOps<B>`. Sparse tensors can only implement some of these basic ops, so for the moment I've broken it up into `BasicOps` (shared by all), `BasicDenseOps` (unique to dense) and `BasicSparseOps` (unique to sparse), just so that we have some more control there. I could realistically do it with far fewer changes by just having `BasicOps` and `BasicSparseOps`, but that would lead to a fair few functions in `BasicOps` being repeated in `BasicSparseOps`. I got everything compiling after splitting them up, but there is still quite a bit of reorganizing for me to do as I add sparse ops, so I figured I'd check in on what is preferred.
Thoughts?
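The split described above might look roughly like this (the trait contents here are placeholders, not burn's actual functions):

```rust
// Ops every tensor kind can support.
trait BasicOps {
    fn shape(&self) -> Vec<usize>;
}

// Ops that assume every element is materialized.
trait BasicDenseOps: BasicOps {
    fn fill(&mut self, value: f32);
}

// Ops that only make sense for sparse storage.
trait BasicSparseOps: BasicOps {
    /// Number of explicitly stored (non-zero) entries.
    fn nnz(&self) -> usize;
}

// A toy sparse type implements the shared and sparse-only traits,
// but is never forced to implement BasicDenseOps.
struct ToySparse {
    shape: Vec<usize>,
    entries: Vec<(usize, f32)>,
}

impl BasicOps for ToySparse {
    fn shape(&self) -> Vec<usize> {
        self.shape.clone()
    }
}

impl BasicSparseOps for ToySparse {
    fn nnz(&self) -> usize {
        self.entries.len()
    }
}
```

The point of the three-way split is exactly this: sparse types opt out of dense-only ops at the trait level instead of having to stub them.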
@McArthur-Alford I think it makes sense to create `BasicOps` and `BasicDenseOps`; however, it's going to be a bit weird with the other traits not having `Dense` or `Sparse` in their names implying that they are used by all! If we can refine the names a bit, that would be better, but I get the separation.
Feature description
A tensor representation that only stores data for positions where the value is non-zero. Operations on sparse tensors could then be optimized based on the fact that most of the tensor is empty and only a few points need to be computed.
Feature motivation
Sometimes I (and others) deal with cases where tensors are largely empty (often less than 1% full). In such cases, it would be much more efficient (memory-wise and computationally) to only store data for the indexes that are used. This is especially important when the tensor would be very large in a dense representation.
(Optional) Suggest a Solution
I don't have a concrete idea of how this might be implemented; I imagine existing libraries (such as `sprs` in Rust, or those in other ecosystems) can serve as inspiration.