lumol-org / soa-derive

Array of Struct to Struct of Array helpers in Rust
Apache License 2.0
417 stars 28 forks source link

Use one length and capacity variable for whole struct #19

Open boomshroom opened 5 years ago

boomshroom commented 5 years ago

As it stands, this crate appears to make separate fields each with their own Vec. This would duplicate the length and capacity values for each field. This may not be a huge problem, but it would triple the size of the struct as it grows and can lead to the different Vecs falling out of sync.

The main alternative would be to use unsafe and raw pointers (or NonNull pointers). That said, managing the unsafe would almost certainly be more effort than keeping the Vecs in sync. Regardless, I think this could be a useful discussion to be had. Is cutting the struct to a third worth having unsafe code to vet? (probably not)

Luthaf commented 5 years ago

Yes, this is a known limitation of this crate. I think it would be worthwhile to have a single pointer for each member of the struct, plus two usize (len & capacity). I hope implementing SoA with pointer would help in removing some bound checks.

having unsafe code to vet?

We could re-use part of std::vec::Vec implementation here, adapating it as needed. But switching to unsafe code would mean that we would need way more testing on this crate.

newpavlov commented 5 years ago

Additional advantage will be that compiler will know that length of each field is always equal to each other, which may significantly help with optimizations.

mangelats commented 4 years ago

Hey, I've been trying to make something similar for a few days.

Using the experimental RawVec instead of Vec and having a length field shouldn't be too difficult and would allow to have a single len. You are already mimicking Vec interface and most (if not all) unsafe code in Vec seems to be related with using a raw pointers to a buffer as a parameter; which I don't think is relevant enough when you are using it for a SoA since you have multiple buffers anyway.

The other option would be to make some kind of RawVec but having to multiple raw buffers simultaneously allowing to have a single cap. It would be more difficulty since it's very low level (working directly with allocators) and would require extensive use of unsafe and experimental features. Also need to be aware of corner cases like zero sized objects. On top of that they seem to be improving RawVec on the main Rust crate as well as allocators.

The obvious downside of both is that they are based on experimental API. Maybe it could be enabled with a feature. If you are fine with it, I'll try to make the a version using RawVec instead of Vec and make a PR :smile:

Luthaf commented 4 years ago

By RawVec you mean using the one in alloc, right? It looks like it is unstable with no path for stabilization, meaning using this would lock soa-derive on nightly.

I think I would rather keep this crate compatible with stable Rust, since one of my goals with it is writing software that's distributed as source and compiled by non programmers. Having to explain the differences between stable & nightly would be more work on top of explaining them they need a Rust compiler.

One possible way forward would be to use an alternative RawVec from a crate, potentially copy-pasted from rust's alloc. If such crate do not yet exists, creating it should be easy, and would allow us to use it with stable Rust.

If the differences between standard Vec and RawVec versions of the code can be kept minimal, I would also be fine with having RawVec enabled with a feature. This would also allow testing the initial assumption that having a single len field for all buffers gives us some performance gains.

mangelats commented 4 years ago

That's the one. RawVec uses parts that are even more unstable than itself... so I don't think copy-pasting would be of any help.

I'm intrigued to see if the compiler is smart enough to leverage the fact to have the same length for all the vectors. I'll make a feature to enable it and make a PR.

pickfire commented 4 years ago

Oh, I thought this is the default but looks like it is not, I wonder why people would use this rather typing out the Vec themselves?

It could be done on stable by using NonNull probably, keeeping the pointers seperately.

Luthaf commented 4 years ago

The advantage of soa_derive is on the API side: being able to consider a XXXVec exactly like a Vec, and use vec.push(e: XXX) and all other Vec methods directly.

One could type the Vec and forward to the methods directly, but this would be high boilerplate code.

Le 3 août 2020 à 04:14, Ivan Tham notifications@github.com a écrit :

Oh, I thought this is the default but looks like it is not, I wonder why people would use this rather typing out the Vec themselves?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

mangelats commented 4 years ago

A bit of an update now that I've spend a few hours implementing a single length version.

An advantage that we didn't discuss is that having a single length makes it impossible to have the underlying arrays unsynchronized. With the current version it is as simple as vec.field.push(value) to desynchronise them. Which leads me the the next point: it'd be better if the user would not have access to the underlying structure (currently Vec). This would ensure that the user is not calling things it should not as well as allowing us to change the underlying logic without any change by the users.

Right now, the tests rely on having access the underlying Vec. This makes it not work with any other implementation. The workaround I found is to call as_slice every time but it's quite verbose and I feel like we should test it more in the way that people will use it. Also the current version have a lot of functions. I'd split them into "core" and "extra" so we can make a first version of the new implementation without implementing every single function (most of them need to be implemented by hand).

So, what I would do is:

  1. Make another issue to define and improve the interface (like #24).
  2. Remake tests so they don't depend on the fact that you can access the underlying Vecs.
  3. Make and try the new implementation.

This is quite some work but I feel it would be worth it. How do you feel about it?

Luthaf commented 4 years ago

I agree that preventing desynchronization is good, as long as we can still provide access to individual slices with or without soa_zip!.

Right now, the tests rely on having access the underlying Vec. This makes it not work with any other implementation.

Do you have a specific example of this? They should work fine with slice access, not full Vec. If they don't, I agree they need fixing =)

Also the current version have a lot of functions. I'd split them into "core" and "extra" so we can make a first version of the new implementation without implementing every single function (most of them need to be implemented by hand).

Part of the appeal of soa_derive is to get access to all Vec functions without writing to code manually. I fear that splitting between core/extra would break this. At the same time, I see that re-implementing everything would be quite a lot of work, so how about creating a branch out of master to work on this new interface? This allow piecewise changes and integration in other projects with git dependencies, while still landing the full interface in master.

Make another issue to define and improve the interface (like #24).

I'm not sure I see the link between #24 and this, could you elaborate?


let me know if I'm unclear!

pickfire commented 4 years ago

@copying Can I see the rough plan of the new expanded structure? Using one length should be easier compared to using one capacity since the allocator should align it according to the size.

mangelats commented 4 years ago

Sorry @Luthaf, I think I explained myself poorly.

The tests are using the underlying Vec. Getting slices (using as_slice) works just fine but makes the tests more verbose.

Also the current version have a lot of functions. I'd split them into "core" and "extra" so we can make a first version of the new implementation without implementing every single function (most of them need to be implemented by hand).

Here I was referring to the tests. Dumb me didn't realise that I didn't write it. Right now most tests are in the same file which means that to test only the "core" parts you have to specify all the test functions manually. Splitting in 2 makes testing the core much simpler. I also like the idea of having a branch with less tests. And I absolutely agree that having all the other functions is good and helps programmers.

Make another issue to define and improve the interface (like #24).

The point here is that there can be improvements to the interface and might be better to do some before finishing the other implementation.

Also, reading everything again it feels like I've not implemented almost anything. Actually most code is done in a branch called single-length on my fork. I'm currently making it so expect bugs, tests not working, and dirty git history. That version allows you to swap between single length with a feature: add --features "use_raw_vec" when calling cargo to swap it =)

mangelats commented 4 years ago

@pickfire Right now the structure I'm working with is:

struct ExampleVec {
  field1: RawVec<T1>,
  field2: RawVec<T2>,
  len: usize
}

This structure doesn't work with a structure that has a field named len. So I'm actually thinking about switching it to:


struct ExampleVecFields {
  pub field1: RawVec<T1>,
  pub field2: RawVec<T2>
}

struct ExampleVec {
  fields: ExampleVecFields,
  len: usize
}
pickfire commented 4 years ago

Can derive even change it into this?

struct ExampleVecFields {
  pub field1: RawVec<T1>,
  pub field2: RawVec<T2>
}

struct ExampleVec {
  fields: ExampleVecFields,
  len: usize
}
mangelats commented 4 years ago

We use procedural macros so you can inject as much code as you want :)

PD: This package don't change the original structure, it only adds more code

Luthaf commented 3 years ago

@iMplode-nZ moving the discussion on single length here.


Hm for the single length thing perhaps just make the storage generic and use a HList?

Originally posted by @iMplode-nZ in https://github.com/lumol-org/soa-derive/issues/35#issuecomment-850070340


I really don't see how a HList would help here, could you clarify?

Originally posted by @Luthaf in https://github.com/lumol-org/soa-derive/issues/35#issuecomment-850257345


So for using a HList, your #[derive(StructOfArray)] would effectively create this:

struct Foo {
  x: i32,
  y: String,
}

impl From<HList<String, HList<i32, HNil>> for Foo

impl From<Foo> for HList<String, HList<i32, HNil>>

then the vector type would simply be something like this:

struct SOAStorage<T: Into<HList...> + From<HList...>> {
  vec: Vec<T::Head>
  next: SOAStorage<T::Tail>
}

(Or something like this, this doesn't compile but you probably get the point right?)

But this way you only need to implement a single trait and then just use SOAStorage<Foo> for all other cases. Also supports having custom storages like one that stores length once.

Originally posted by @iMplode-nZ in https://github.com/lumol-org/soa-derive/issues/35#issuecomment-850532126

Luthaf commented 3 years ago

Except that in your example, since SOAStorage contains a Vec, any SOAStorage containing more than one field will contain the length multiple time.

We could have a FirstFieldSoaStorage and OtherFieldsSoaStorage for the other fields, with OtherFieldsSoaStorage containing only pointer + capacity, but I fail to see the improvement compared to something like what @copying is working on:

struct Foo {
  x: i32,
  y: String,
}

struct RawVec<T> {
    ptr: mut* T,
    capacity: usize,
}

struct FooVec {
    len: usize,
    x: RawVec<i32>,
    y: RawVec<String>,
}

In both cases, we need to manually deal with unsafe code. I also personally find the RawVec version easier to read/understand/generate in a proc macro.

pickfire commented 3 years ago

I believe https://docs.rs/soak is doing this already since it have a different mechanism.

entropylost commented 3 years ago

Well or make your SOAStorage contain a SOAElementStorage hierachy, where the SOAStorage has the head and the size and each of the SOAElementStorages contain the raw pointer and the next one.

mangelats commented 2 years ago

Hello again! I'm sorry that I've been off for such a long time, but I finally have the time to be able to do stuff!

I didn't have the time to sit down and write what I've been thinking until now, but I've made this proof of concept that shows how I think would be a nice way of handling things. It doesn't cantain any macro and some code has been generated by hand, but I think it's easy to see how it could be done. Here is a fast explanation of what I've done:

Instead of making a vector structure each time, I've made a generic vector where you can set any buffer that implements the required traits:

  1. AssociatedMemoryTypes<T> ensures that you can use any type for pointers, references, and slices as long as you define the necessary methods. The idea is to have a uniform interface for manipulating this kind of types.
  2. Buffer<T> has the actual methods of the buffer, where the actual logic resides. 3.. Default ensures that it's possible to make a simple new. This is a way to get a PoC faster.

AssociatedMemoryTypes<T> makes use of generic associated types for lifetimes, which still is unstable (see issue) but seems to work fine.

After that I made 2 test files:

  1. Makes a simple non-exaustive (and probably bad) RawVec and implemented the necessary traits. I made a unit test where it pushes and pops some values.
  2. Makes an SoA structure based on a 3D point. It uses MyRawVec from the previous test, as it has very limited logic (mostly calling function with the same name trice and wrapping the results when necessary).

Note that the SoA implementation relies on the fact that its composed of buffers. This buffers could actually be another SoA, giving nested vectors for "free". Also note that there is no restriction stating that the buffer should be able to resize: it should be possible to allocate a large slap once and fail if ever try to resize.

One thing I find nice about this approach is that the code that would be generated contains very little logic: most code would be in the traits or base buffers.

Let me know what you think!

mangelats commented 2 years ago

A small update: I've updated the PoC. I added support for fixed size array as buffers which was easy enough (so that would also solve #46 ). Also I've moved and renamed the code a bit.

I also tried to use the built-in RawVec, but it has been disabled and it doesn't seem like it will come in the foreseeable future. I could use a library that implements something similar, but I'm not sure if that's a good idea.

I'm not sure what the best next step would be. I'm thinking either doing the actual macro or add more methods to the vector, but I'm not too sure.

Luthaf commented 2 years ago

Hey @copying! Thanks for working on this. Unfortunately, I don't have a lot of time to spend on this crate these days. I'll try to give a look at your prototype ASAP, hopefully within a week or two!

mangelats commented 2 years ago

Hey @Luthaf, it took me 2 years to come back, so don't feel too much pressure to read this or to look into the PoC quickly. Also, I'll have some vacation days this week, so I'll probably invest some time to this project! :)

I realized that this thread is more about a collection of ideas but I haven't actually explained what I've done in the proof of concept, so here it is.

The PoC parts

Note: this PoC is very minimal, and we may find some case that breaks this scheme. Also, naming is hard; I would really appreciate any suggestions to improve it :)

  1. The Vector struct: a buffer agnostic minimal vector. Uses a Buffer<T> to save its data.
  2. The Buffer<T> trait: uniform interface for a raw-vec/buffer allocation, de-allocation, and moving memory. All raw-vecs need to implement it (including the SoA ones).
  3. The MemoryPrimitives<T> trait: it defines the base structures used as primitives, as well as all the required operations that are used.
  4. Basic raw-vecs (CustomRawVec and ArrayRawVec) which are for a single value. This can be used to compose more complex ones.
  5. The markers: allow to change the underlying buffer in complex structures (new since my last comment).

Composition

One of the strengths of this PoC is it's ability to compose the different parts.

// Tries to be equivalent to Vec<Point>
type RegularPointVec = Vector<Point, CustomRawVec<Point>>;

// Using a fixed-size array
type ArrayPointVec = Vector<Point, ArrayRawVec<Point, 1000>>;

// Using a single, non-configurable raw-vec
type SoaPointVec = Vector<Point, SoaPointRawVec>;

// Using a generic raw-vec with different base raw-vecs (defined with markers)
type RegularSoaPointVec = Vector<Point, GenericSoaPointRawVec<CustomRawVecMarker>>;
type RegularSoaPointVec = Vector<Point, GenericSoaPointRawVec<ArrayRawVecMarker<1000>>>;
type RegularSoaPointVec = Vector<Point, GenericSoaPointRawVec<SomeThirdPartyVecMarker>>;

This composition allows to recreate the Vec code, make weirder layouts like SoA, or AoSoA. Even fancier layouts may be possible (it look awfully close to an allocation system).

Vector

One of the core concepts is that a vector code only changed due to the changes in the underlying raw vector (buffer), and not the logic itself. This means that all the Vector's code and functionality can be expressed for a generic buffer type. This also means that the vector functions don't need to be generated. The resutilg structure is quite simple:

pub struct Vector<T, Buf: Buffer<T>> {
  buf: Buf,
  len: usize,
  _marker: PhantomData<T>,
}

Buffer

The trait that defines the interface used by Vector to manage memory: reserve, shrink (to be implemented), and read data like getting the initial pointer.

MemoryPrimitives

This trait defines the underlying types and functions that the raw-vec will use to manage memory. This intends to unify the interface so a regular pointer and a SoA pointer can be called using the same code. This is simple but quite tedious to implement, as there are a lot of functions. Because of that, I created the trait UsesStandardMemoryPrimitives, which automatically implements MemoryPrimitives<T> by using the standard set of types (regular pointers, references, and slices).

Basic raw-vecs

CustomRawVec<T> is a worse version of RawVec used in the standard which is no longer accessible. ArrayRawVec<T, CAP> is a fixed size raw-vec which cannot change its capacity. Others could be implements by, for example, using a library.

Markers

I wanted a way to define what the underlying raw-vec for a possibly-nested SoA was. I though about using generic types but the type as argument needs to be fully qualified. When dealing with nested SoA raw-vecs, we cannot pass it fully qualified, as we do not now the underlying types. This is what I would like to do:

Vector<Point, GenericSoaPointRawVec<SomeThirdPartyVec<???>>

This is not directly possible, so this is the trick I found which allows to bypass it by using generic associated types together with zero-sized structures:

// Trait implementation
pub trait BaseRawVecMarker: 'static {
  type RawVec<T>: Buffer<T>;
}

// Use example
pub struct ArrayRawVecMarker<const CAP: usize>;
impl<const CAP: usize> BaseRawVecMarker for ArrayRawVecMarker<CAP> {
  type RawVec<T> = ArrayRawVec<T, CAP>;
}

// Usage
type MyVec = GenericSoaPointRawVec<ArrayRawVecMarker<1000>>;

Notice how we no longer need to define T in the marker itself.

"Generated" code

In this PoC I've not created any macro yet. So I've basically generated the code by hand. That being said, it's easy to see the the amount of code will be reduced, as the raw vectors have much less necessary functions. The complexity of such functions is the same if not less than with the original code :)

PS: If you have any questions or you feel that something is not clear let me know.

mangelats commented 1 year ago

Hey! It's been a year since my last time writting here so here's what I've done since then.

I've made a new repository with the PoC without the SoA code. The Vector doesn't have much methods yet (the focus was finding the right abstraction for buffer) but the potential is awesome. I've been able to make the buffer a compoiste, which allows to add optimization when you want them.

I've also made a small vector optimization (SVO) which is not in the standard. Haven't mesured anything yet since it's not finished nor optimized, but the hability to even allowing something like this gives me a lot of hope!

@Luthaf I'm guessing that you are still very occupied, but I'd love to have your input at this point. Is there something I could do to make it simple/fast to take a look at it? Maybe some examples or something?