Alternate clause layouts

xldenis commented 2 years ago

I feel like it might be a good idea to experiment with using smallvec in the definition of Clause.

Alternately, I wonder if it is possible define a custom unsized type in Rust so that we can layout the clause literals inline with a header.

sarsko commented 2 years ago

Hmm I don't see what problem would be solved by using smallvec. Also in general not a fan of using external crates as I feel the trust/gain tradeoff is usually not worth it.

A better clause/clause db layout is something I would like to experiment with. I have not fully decided on the overall clause database organisation, so I do to some degree want to do that first. Now all clauses are in one formula, and separated on s.inital_len. I think a split into learnts and original or irredundant and redundant should be better.

That being said, alternative clause representations can be experimented with separately of the overall clause db representations. Also I think proof-wise "deep-model" would help in this, as it would allow to write specs on Seq<Seq<Lit>> (which I believe is a correct model for a clause DB no matter its representation/the representation for a Clause), rather than Seq<Vec<Lit>>

Also some other Rust clause/clause db implementations:

xldenis commented 2 years ago

Hmm I don't see what problem would be solved by using smallvec.

It will store the data inline under a certain size rather than performing a seperate allocation.

Also in general not a fan of using external crates as I feel the trust/gain tradeoff is usually not worth it.

Eh, its no worse than using Vec, we've even proved the model of SmallVec using RustHornBelt.

sarsko commented 2 years ago

Hmm I don't see what problem would be solved by using smallvec.

It will store the data inline under a certain size rather than performing a seperate allocation.

Yes, I know. I don't see when that is useful? I think the good design is to have the clause db be a 1d vec with manual handling of clauses as a header and lits. I dunno, might be useful when operating with temporary clauses, i.e. in clause learning, but I don't think the good solution will have many temporary clauses.

Also in general not a fan of using external crates as I feel the trust/gain tradeoff is usually not worth it.

Eh, its no worse than using Vec, we've even proved the model of SmallVec using RustHornBelt.

Well if so it is fine.

xldenis commented 2 years ago

I think the good design is to have the clause db be a 1d vec with manual handling of clauses as a header and lits.

This is the part i don't know how to properly do in Rust, that would require defining an unsized type since clauses wouldn't have fixed size. It will also 100% require unsafe code somewhere as a result, and not the kind Creusot can handle.

On the other hand smallvec can get us 80% of the benefit for 10% of the effort. If in 90% of the case we avoid having a pointer indirection and that leads to faster loads / lookups, that's a win.

sarsko commented 2 years ago

I think the good design is to have the clause db be a 1d vec with manual handling of clauses as a header and lits.

This is the part i don't know how to properly do in Rust, that would require defining an unsized type since clauses wouldn't have fixed size. It will also 100% require unsafe code somewhere as a result, and not the kind Creusot can handle.

Well the clause db would become either a Vec<usize> or a Vec<u32> where each usize/u32 are either a lit representation, or a part of the clause header or the clause metadata (number of lits). This requires unsafe of the messing with bits kind.

On the other hand smallvec can get us 80% of the benefit for 10% of the effort. If in 90% of the case we avoid having a pointer indirection and that leads to faster loads / lookups, that's a win.

Will smallvec store them inline on the heap though? If it does it, I am all for it, but as far as I understand, the formula would be a vector of smallvecs, where each smallvec is either alloced on the stack, thus equivalent with an array, or alloced on the heap, and thus equivalent with a regular vector. What is desired is to have them inline in the formula vector, but I don't know if that would ever happen ?

xldenis commented 2 years ago

Right now things look something like this


                                ┌─────┐
                                │ l_0 │
                                ├─────┤
                        ┌──────►│     │
        ┌────────┐      │       ├─────┤
        │ c_0    │ ─────┘       │ l_i │
        ├────────┤              └─────┘
        │        │
        ├────────┤
        │        │
        ├────────┤
        │        │
        ├────────┤
        │        ├───────┐      ┌─────┐
        ├────────┤       │      │ l_0 │
        │        │       │      ├─────┤
        ├────────┤       │      │     │
        │        │       └─────►├─────┤
        ├────────┤              │     │
        │  c_n   │              ├─────┤
        └────────┘              │ l_j │
                                └─────┘

simply adding small vec may store entire clauses inline in the formula which can potentially speed things up by removing the need to follow pointers (loading contiguous memory is much faster than seperate loads).

If we just add a ClauseDB but keep the same layout then we get soemthing like this instead:


                                                    ┌──────┐
                                                    │      │
                                 ┌──────┐           ├──────┤
                                 │      ├──────────►│      │
                                 ├──────┤           ├──────┤
                     ┌──────────►│      │           │      │
                     │           ├──────┤           └──────┘
     ┌────────┐      │           │      │
     │ c_0    ├──────┘           │      │
     ├────────┤                  │      │
     │        ├────────┐         │      │
     ├────────┤        │         │      │
     │        │        │   ┌────►│   .  │           ┌──────┐
     ├────────┤        │   │     │   .  │           │      │
     │        ├─────┐  │   │     │   .  │           ├──────┤
     ├────────┤     │  │   │     │      ├──────────►│      │
     │        │     │  └───┼────►│      │           ├──────┤
     ├────────┤     │      │     │      │           │      │
     │        │     │      │     │      │           └──────┘
     ├────────┤     └──────┼────►│      │
     │        │            │     │      │
     ├────────┤            │     │      │
     │  c_n   ├────────────┘     │      │
     └────────┘                  │      │
                                 │      │
                                 │      │
                                 │      │           ┌──────┐
                                 │      │           │      │
                                 │      │           ├──────┤
                                 ├──────┤      ┌───►│      │
                                 │      ├──────┘    ├──────┤
                                 ├──────┤           │      │
                                 │      │           └──────┘
                                 └──────┘

Which seems like a loss unless we can store clauses flat with their literals (which will not be easy in rust i believe). In any case, the nice thing about smallvec is that it's drop-in for Vec. We can just do the test and see if we gain any perf and move on from there. Do you have a command to perform performance benchmarks? If so I can do it myself.

sarsko commented 2 years ago

If we just add a ClauseDB but keep the same layout then we get soemthing like this instead:

It seems like we might be talking past each other. ClauseDB = formula, and does thus not add another level of indirection.

Which seems like a loss unless we can store clauses flat with their literals (which will not be easy in rust i believe).

Starlit, Varisat and Minisat-rust do this (links above). See for instance this comment in Starlit

In any case, the nice thing about smallvec is that it's drop-in for Vec. We can just do the test and see if we gain any perf and move on from there. Do you have a command to perform performance benchmarks? If so I can do it myself.

I use the tests (yeah, I should move to using cargo bench). I would try running against (u)uf200 or (u)uf225 for random 3SAT of 200/225 clauses.

xldenis commented 2 years ago

It seems like we might be talking past each other. ClauseDB = formula, and does thus not add another level of indirection.

I was using ClauseDB to refer specifically to an arena allocated setup but you're right that we wouldn't need a seperate formula vector then.

Starlit, Varisat and Minisat-rust do this (links above). See for instance this comment in Starlit

I would like to avoid doing this: I would much, much perfer to compose safe abstractions than have to write so much unsafe code. You were worried about using dependencies, this is what worries me. Finding some way to express this using existing tools and techniques would be ideal.

sarsko / CreuSAT

Alternate clause layouts #12