tracel-ai / burn

Burn is a new comprehensive dynamic Deep Learning Framework built using Rust with extreme flexibility, compute efficiency and portability as its primary goals.
https://burn.dev
Apache License 2.0

Support Sparse Tensors #846

Open mosheduminer opened 9 months ago

mosheduminer commented 9 months ago

Feature description

A tensor representation that only stores data for positions where the value is non-zero. Operations on sparse tensors could then be optimized based on the fact that most of the tensor is empty and only a few points need to be computed.
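For concreteness, here is a minimal sketch of one such layout (COO), independent of any Burn API; the struct and field names are purely illustrative:

struct CooMatrix {
    shape: (usize, usize),
    rows: Vec<usize>, // row index of each stored entry
    cols: Vec<usize>, // column index of each stored entry
    vals: Vec<f32>,   // the non-zero values themselves
}

impl CooMatrix {
    // Memory and most operation costs scale with the number of non-zeros,
    // not with shape.0 * shape.1.
    fn nnz(&self) -> usize {
        self.vals.len()
    }
}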

Feature motivation

Sometimes I (and others) deal with cases where tensors are always largely empty (often less than 1% full). In such cases, it would be much more efficient (memory-wise and computationally) to only store data for the indexes that are used. This is especially important for scenarios where the tensor would be very large in a dense representation.

(Optional) Suggest a Solution

I don't have a concrete idea of how this might be implemented, but I imagine existing libraries (such as sprs in Rust, or in other ecosystems) can serve as inspiration.

gonzalesMK commented 9 months ago

@mosheduminer I was working on a feature like that myself! My goal is to implement graph convolution.

My idea is to use the PyTorch sparse tensor implementation as a new backend with its own specific implementation.

gonzalesMK commented 8 months ago

@antimora I was thinking about this issue, and I have a specific use case that is common when working with sparse tensors: performing many operations together with dense tensors (e.g. matrix multiplications).

So having a separate backend would be impractical in those cases. I've come up with the idea of somehow making the PyTorch backend a trait and then implementing both the sparse and non-sparse versions (with specific implementations for the sparse x non-sparse case).

Do you have any ideas/feelings on that?

mosheduminer commented 8 months ago

If you weren't able to operate on dense and sparse tensors together, would that mean that a module would need to use the sparse tensor implementation for weights and such? That seems like it might cause problems, no?

BTW, when I opened this issue, I assumed that the sparse tensor would somehow be a layer abstracting over a dense tensor (which would only hold the "filled" spots of the sparse tensor).

nathanielsimard commented 8 months ago

I'm not sure if the best approach for sparse tensors is to introduce a new kind of tensor in the backend. We now have IntTensor, FloatTensor and BoolTensor, but we could have SparseFloatTensor as well. This way we could have a module function that accepts a mix of sparse tensors and dense tensors. Of course, we should make it possible to convert between sparse and dense tensors easily (Into, From). The main problem is that it creates a lot more work to be done by backends.
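As a hedged illustration of what that tensor-kind option could look like (SparseTensorExt, SparseFloatTensorPrimitive, to_sparse and to_dense are hypothetical names, not existing Burn APIs):

// Hypothetical sketch only: a backend extension exposing a sparse float
// primitive next to the dense one, plus easy conversions between them.
trait SparseTensorExt: Sized {
    type FloatTensorPrimitive<const D: usize>;
    type SparseFloatTensorPrimitive<const D: usize>;

    fn to_sparse<const D: usize>(
        dense: Self::FloatTensorPrimitive<D>,
    ) -> Self::SparseFloatTensorPrimitive<D>;

    fn to_dense<const D: usize>(
        sparse: Self::SparseFloatTensorPrimitive<D>,
    ) -> Self::FloatTensorPrimitive<D>;
}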

Another option is to have a backend associated type that points to a backend with sparse tensors. This is the strategy we are using today with the FullPrecisionBackend when you want to be sure the precision is sufficient for sensitive operations.

pub struct LibTorch<Float, const SPARSE: bool = false>;

impl<Float, const SPARSE: bool> Backend for LibTorch<Float, SPARSE> {
    type SparseBackend = LibTorch<Float, true>;
}

Something like that!

McArthur-Alford commented 2 months ago

I've been doing a bunch of poking and prodding at this feature for the last couple of weeks. There are three main approaches that I have looked into, but as far as I can tell they all have significant limitations.

  1. A new sparse tensor type. Alternatively, a backend decorator that utilizes the underlying backend's tensors to build a higher-level sparse matrix representation; for example, a sparse tensor in CSR form can be represented with three actual tensors (see the sketch after this list). This seems like the easiest approach; realistically it could even be done as a new crate that extends burn. That said, it somewhat goes against the design philosophy of burn by taking away the backends' freedom to decide how sparse tensors are supported (which might even limit performance?).
  2. A new tensor kind. This is the obvious one, and from what I can tell also straightforward to represent. There are, however, some challenges:
    • Every backend would need to support the new tensor kind(s) immediately. This seems like a painfully large change. Throw on top of that all the extra functions for existing tensors to convert to/from and it becomes a lot of work.
    • The backends are still lacking freedom. It seems from my exploration of the codebase that tensors have one representation in burn (for a single backend), but sparse tensors can be represented in many ways: think COO, CSR, etc. It seems difficult to let the backend decide whether one or multiple representations are used if they all have to conform to one SparseTensorOps trait. Even worse, if we specify the format at the tensor-kind level instead of the backend level, every backend has to implement all required representations. I can imagine this would get verbose quite quickly.
  3. Backend Bridges. These seem on the surface to be the best approach because, much like the FullPrecision bridge, the amount of support for sparse tensors would be up to each backend, and they would have the highest amount of freedom. There are some troubling issues I ran into when I started actually trying to implement a backend bridge though:
    • A bridged backend still has to implement the normal tensor ops. This should be fine, because sparse tensors are merely meant to be a different under-the-hood representation of the same user-facing tensors; however, certain operations like cosine are impractical on sparse tensors because they do not preserve zeros.
    • FloatTensorOps and the other traits like it lack functions unique to sparse matrices. For example, sparse-dense matrix multiplication is not a function in FloatTensorOps, for good reason, but it would be needed for the sparse backend we are bridging to.
    • I would really like Backend Bridges to work for this, but the limitations seem quite large unless I'm missing something. A bridge to some kind of SparseBackend instead of Backend might be achievable with a lot of work, but at that point we would be better off just adding a new tensor kind.
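A hedged sketch of approach 1 from the list above: a CSR tensor assembled from three tensors of the wrapped backend. The type parameters are placeholders for whatever 1-D integer/float tensor types the underlying backend provides; nothing here is an existing Burn type. A decorator backend could keep such a struct as its sparse primitive while delegating all actual compute to the wrapped backend's dense ops.

struct CsrTensor<DenseInt1, DenseFloat1> {
    row_ptrs: DenseInt1,    // length = rows + 1; offset where each row starts
    col_indices: DenseInt1, // column index of every stored value
    values: DenseFloat1,    // the stored (non-zero) values themselves
}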

After having run into a bunch of walls with the backend bridges, I think my next approach will be to see if I can take the best of the first two approaches and combine them. I suspect it may be possible to have the SparseXTensorOps traits and associated primitives with sensible default implementations that abstract over the corresponding FloatTensor, IntTensor and BoolTensor, but which can be overridden by backends such as libtorch that have their own sparse tensor implementations. Of course, only time and the type system will tell me if this is achievable...
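As a rough illustration of that combination (all trait and method names below are hypothetical, not Burn's): the sparse ops trait could ship a generic fallback written in terms of existing dense ops, which a backend with native sparse kernels is free to override.

trait SparseOpsSketch: Sized {
    type Dense;
    type Sparse;

    fn to_dense(sparse: Self::Sparse) -> Self::Dense;
    fn dense_matmul(lhs: Self::Dense, rhs: Self::Dense) -> Self::Dense;

    // Default sparse-dense matmul: densify, then reuse the dense kernel.
    // A backend like libtorch could override this with a real sparse kernel.
    fn spdmm(lhs: Self::Sparse, rhs: Self::Dense) -> Self::Dense {
        Self::dense_matmul(Self::to_dense(lhs), rhs)
    }
}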

nathanielsimard commented 2 months ago

I think a mix of 1 and 2 is the best approach, providing default implementations using the existing ops, but making it possible for backends to override those new sparse ops to improve performance.

McArthur-Alford commented 2 months ago

It has been quite a challenge to figure out how to make this work, but I've finally landed on something that works. I threw together a quick mock version to verify it, and it all seems sound, so I plan to start actually implementing this for the real backends soon.

Here is my small mockup (the names of things are very much not final). Feedback or thoughts are welcome:

// ===== The trait definitions for sparse backend stuff + representations

trait ReformatSparseTensor<FROM: SparseRepr, TO: SparseRepr> {
    fn reformat(from: FROM) -> TO;
}

trait SparseTensorOps<B: SparseBackend, REPR: SparseRepr> {
    fn spdmm<const D: usize>(
        s: B::SparseTensorPrimitive<REPR, D>,
        d: <B::DenseBridge as Backend>::TensorPrimitive<D>,
    );
}

trait SparseBackend:
    SparseTensorOps<Self, Self::CSRRepr>
    + SparseTensorOps<Self, Self::COORepr>
    + Sized
    + 'static
    + ReformatSparseTensor<Self::COORepr, Self::CSRRepr>
    + ReformatSparseTensor<Self::CSRRepr, Self::COORepr>
{
    // Bridge back to the dense backend, so we can use its functionality
    type DenseBridge: Backend;

    // The different representations of our sparse tensors
    // SparseRepr is just a supertrait to guarantee some things less verbosely
    type CSRRepr: SparseRepr;
    type COORepr: SparseRepr;

    // The sparse tensor primitive type, generic over the Repr
    type SparseTensorPrimitive<REPR: SparseRepr, const D: usize>: Clone
        + Send
        + 'static
        + core::fmt::Debug;
}

trait Backend: Sized + 'static {
    type SparseBridge: SparseBackend<DenseBridge = Self>;
    type TensorPrimitive<const D: usize>: Clone + Send + 'static + core::fmt::Debug;
    // All the typical backend stuff here...
}

trait SparseRepr: core::fmt::Debug + Clone + Send + Sync + 'static {}

// ===== The *default* "decorator" sparse backend that abstracts over B's tensor primitives

#[derive(Debug, Clone)]
struct DefaultCOORepr {
    // This would ideally utilize the backend's tensor primitives, but that's too much for the mockup
    contents: u32,
}

#[derive(Debug, Clone)]
struct DefaultCSRRepr {
    // This would ideally utilize the backend's tensor primitives, but that's too much for the mockup
    contents: i32,
}

impl SparseRepr for DefaultCOORepr {}
impl SparseRepr for DefaultCSRRepr {}

#[derive(Debug, Clone)]
struct DefaultSparseTensorPrimitive<REPR: SparseRepr, const D: usize> {
    contents: REPR,
}

struct SparseDecorator<B: Backend> {
    b: core::marker::PhantomData<B>,
}

impl<B: Backend> ReformatSparseTensor<DefaultCOORepr, DefaultCSRRepr> for SparseDecorator<B> {
    fn reformat(from: DefaultCOORepr) -> DefaultCSRRepr {
        DefaultCSRRepr {
            contents: from.contents as i32,
        }
    }
}

impl<B: Backend> ReformatSparseTensor<DefaultCSRRepr, DefaultCOORepr> for SparseDecorator<B> {
    fn reformat(from: DefaultCSRRepr) -> DefaultCOORepr {
        DefaultCOORepr {
            contents: from.contents as u32,
        }
    }
}

impl<B: Backend<SparseBridge = SparseDecorator<B>>, REPR: SparseRepr>
    SparseTensorOps<B::SparseBridge, REPR> for SparseDecorator<B>
{
    fn spdmm<const D: usize>(
        s: <B::SparseBridge as SparseBackend>::SparseTensorPrimitive<REPR, D>,
        d: <<Self as SparseBackend>::DenseBridge as Backend>::TensorPrimitive<D>,
    ) {
        println!("{:?} x {:?}", s, d);
    }
}

impl<B: Backend<SparseBridge = SparseDecorator<B>>> SparseBackend for SparseDecorator<B> {
    type DenseBridge = B;

    type CSRRepr = DefaultCSRRepr;

    type COORepr = DefaultCOORepr;

    type SparseTensorPrimitive<REPR: SparseRepr, const D: usize> =
        DefaultSparseTensorPrimitive<REPR, D>;
}

// ===== And testing it out:

struct TestBackend;
impl Backend for TestBackend {
    type SparseBridge = SparseDecorator<TestBackend>;

    type TensorPrimitive<const D: usize> = String;
}

fn main() {
    let dense_primitive: <TestBackend as Backend>::TensorPrimitive<1> =
        "Im a dense matrix!".to_string();
    let sparse_primitive: <<TestBackend as Backend>::SparseBridge as SparseBackend>::SparseTensorPrimitive<
        <<TestBackend as Backend>::SparseBridge as SparseBackend>::CSRRepr,
        1,
    > = DefaultSparseTensorPrimitive {
        contents: DefaultCSRRepr { contents: 12 },
    };

    dbg!(&dense_primitive);
    dbg!(&sparse_primitive);
    <<TestBackend as Backend>::SparseBridge as SparseTensorOps<
        <TestBackend as Backend>::SparseBridge,
        DefaultCSRRepr,
    >>::spdmm(sparse_primitive, dense_primitive);
}

I tried really hard to make the Backend supertrait require an implementation of SparseBackend, or something like it, but it ended up being impossible due to type system constraints (without becoming very verbose on the generics). The way I'm doing it now seems plenty extensible: any backend can define a custom SparseBridge, but we also have a handy decorator that implements sparse tensors for any backend without assuming anything about its details.

The only thing I'm still a little concerned about is having the SparseBackend trait explicitly define the COO, CSR, etc. associated types. There isn't an easy way for a backend to have a custom representation other than misusing one of the existing associated types, or to only implement a subset of them (without one of the associated types being meaningless).

nathanielsimard commented 2 months ago

@McArthur-Alford that's interesting. I think it would be better to define a single representation-agnostic sparse backend, and use the backend bridge to change between the two. The traits would work almost the same way you presented, but without any forced representation.

McArthur-Alford commented 2 months ago

@nathanielsimard I would like for it to work this way, but I'm struggling to figure out how to do it in a satisfying way. The reason I have multiple representations enforced in the trait right now is that there are meaningful differences between the costs/benefits of each representation. Having the backend specify its own representation sounds great, but ideally a backend allows multiple representations so users can choose the best one for the operation. I can't think of how to let each backend specify its own set of allowed representations whilst also providing easy functionality in the API to swap representations.

edit: I haven't really thought about it enough, but it is possible to extend the tensor API on a per-backend basis. I'll see if it's possible to do something like the below:

nathanielsimard commented 2 months ago

@McArthur-Alford I wouldn't force backends to actually implement the sparse backend trait; I would consider it to be an optional backend extension that should be opt-in. The same backend could still implement multiple representations by having a generic argument for that representation:

impl<F: FloatElement> SparseBackend for NdArray<F, SparseCsr> {
   ...
}
impl<F: FloatElement> SparseBackend for NdArray<F, SparseCoo> {
   ...
}

In the modeling code:

type MyBackend = NdArray<f32, SparseCsr>;
use burn::tensor::backend::SparseBackend as Backend;

fn inference<B: Backend>(input: Tensor<B, 2>) {
    let input: SparseTensor<B, 2> = input.into_sparse(); // Could be that
    let input: Tensor<B, 2, SparseFloat> = input.into_sparse(); // Or that
    ...
}

In addition, using backend bridges, one should be able to switch between backends, and therefore representations, at runtime!

McArthur-Alford commented 1 month ago

Ok, so I'm making pretty decent progress on this. For the most part it doesn't require any substantial changes to the API because of the approach taken, though there is one change I figured would be convenient for reducing duplicated code (and it is potentially quite a significant change).

Right now the tensor API has BasicOps<B>. Sparse tensors can only implement some of these basic ops, so for the moment I've broken it up into BasicOps (shared by all), BasicDenseOps (unique to dense) and BasicSparseOps (unique to sparse), just so that we have some more control there. I could realistically do it with far fewer changes by just having BasicOps and BasicSparseOps, but that would lead to a fair few functions in BasicOps being repeated in BasicSparseOps. I got everything compiling after splitting them up, but there is still quite a bit of reorganizing for me to do as I add sparse ops, so I figured I'd check in on what is preferred.
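For reference, a rough sketch of the split being described; the trait contents are illustrative only, not the actual Burn definitions:

trait BasicOps<B> {
    // ops valid for every tensor, dense or sparse: shape, reshape, slice, ...
}

trait BasicDenseOps<B>: BasicOps<B> {
    // ops that assume a dense layout (e.g. element-wise ops that do not preserve zeros)
}

trait BasicSparseOps<B>: BasicOps<B> {
    // ops specific to sparse tensors (e.g. dense/sparse conversion, number of non-zeros)
}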

Thoughts?

nathanielsimard commented 1 month ago

@McArthur-Alford I think it makes sense to create BasicOps and BasicDenseOps; however, it's going to be a bit weird with the other traits: not having Dense or Sparse in the name implies that they are used by all! If we can refine the names a bit, that would be better, but I get the separation.