mikepound / opencubes

A community-improved version of the polycubes project!

C++ | Splitting work by output shape #26

Closed · nsch0e closed this 1 year ago

nsch0e commented 1 year ago

This branch splits the problem by output shape. Features:

Needs:

Benchmarks (starting from the N-1 cachefile):

whole output in RAM:

* N=13 T=20: 141s (12700H, 16GB) [peak mem: ~14GB?! I don't remember exactly, but it was swapping :)]

only one shape in RAM:

* N=13 T=20: 165s (12700H, 16GB) [peak mem: ~3GB] (slower because clearing hashy is expensive :))
* N=14 T=12: 2500s (AMD 3600, 32GB) [peak mem: ~24GB]
* N=14 T=20: 1936s (12700H, 64GB) [peak mem: ~24GB]

JATothrim commented 1 year ago

I pulled this on top of the main branch and the git history looks a bit tangled, as the branch is older than mikepound:main. I will try rebasing it first to see if it can be merged more cleanly. @nsch0e Do you have a version of the branch from before merging the upstream changes?

nsch0e commented 1 year ago

@JATothrim Unfortunately I merged your changes on the fly. My last rebase attempt did not produce the same end result. Will try again...

nsch0e commented 1 year ago

@JATothrim this should be cleaner and it is almost the same result... :)

JATothrim commented 1 year ago

> @JATothrim this should be cleaner and it is almost the same result... :)

This may be silly: cubes -n 2 -w -s fails. So for me this fails to compute N=2. I'm confused how I'm supposed to run this? It fails because none of the cubes_*.bin files exist yet - but I can't get it to generate them?

I tried the manual rebase earlier, before the force push, and in that process I saw that cubes.cpp had lots of merge conflicts that could not be resolved with the tool... has something gone missing?

nsch0e commented 1 year ago

> I'm confused how I'm supposed to run this?

Yeah, that's the problem at the moment. You need a cubes_N.bin file to process the next N. There is no recursion anymore because we don't use Hashy as our base set.

I'll try to get recursion working again, or at least let cubes -n 1 -w write out its cache file.

nsch0e commented 1 year ago

@JATothrim you should now be able to progressively generate cubes_N.bin files, starting from n=1:

cubes -n 1 -w
cubes -n 2 -w
...
JATothrim commented 1 year ago

> I'm confused how I'm supposed to run this?
>
> Yeah, that's the problem at the moment. You need a cubes_N.bin file to process the next N. There is no recursion anymore because we don't use Hashy as our base set.
>
> I'll try to get recursion working again, or at least let cubes -n 1 -w write out its cache file.

Aahh... of course. So this is a partial solution, and the whole part about further processing the split by-shape cache files is WIP?

nsch0e commented 1 year ago

Yes: I'm unsure how we handle split cachefiles...

FYI: I added back the recursive functionality now, but it involves copying everything...

* The current state can only process combined (all shapes for an N) cachefiles.
* There is the split option to throw away the results of already completed shape buckets in hashy.
  * if write_cache is enabled, partial results will be saved in a partial cachefile
  * each partial cachefile only contains one shape

The question is: do we always want to merge into a combined cachefile, or is it useful to always split cachefiles by shape?

Pro combined:

* we only need one file and processing all of N+1 is easy on one node (PC)

Pro split cachefiles:

* one shape of N only depends on four shapes of N-1 (see the sketch after this list)
* we could split computation across multiple nodes so each calculates only one shape of the next N
  * each node only needs 4 shapes of N-1, so we don't have to distribute the whole set of N-1 to all nodes
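A minimal sketch of that dependency rule (hypothetical code, not the repo's actual types): if a shape is the sorted bounding-box dimensions of a polycube, then removing the last-added cube from an N-cube either shrinks one box dimension by one or leaves the box unchanged, so each output shape has at most four source shapes.

```cpp
#include <algorithm>
#include <array>
#include <cstdio>

// Hypothetical sketch: a shape is the sorted bounding-box
// dimensions (x <= y <= z) of a polycube.
using Shape = std::array<int, 3>;

// Candidate N-1 source shapes for a given N output shape: the box
// either shrank along one axis or stayed the same. Real code would
// filter out degenerate candidates (a zero dimension) and duplicates.
static std::array<Shape, 4> sourceShapes(Shape s) {
    std::array<Shape, 4> out{};
    for (int axis = 0; axis < 3; ++axis) {
        Shape c = s;
        c[axis] -= 1;                  // box shrank along this axis
        std::sort(c.begin(), c.end()); // restore the x <= y <= z invariant
        out[axis] = c;
    }
    out[3] = s;                        // box unchanged
    return out;
}

int main() {
    for (Shape d : sourceShapes({2, 3, 3}))
        std::printf("%d x %d x %d\n", d[0], d[1], d[2]);
}
```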

JATothrim commented 1 year ago

> Yes: I'm unsure how we handle split cachefiles...
>
> FYI: I added back the recursive functionality now, but it involves copying everything...
>
> * The current state can only process combined (all shapes for an N) cachefiles.
> * There is the split option to throw away the results of already completed shape buckets in hashy.
>   * if write_cache is enabled, partial results will be saved in a partial cachefile
>   * each partial cachefile only contains one shape
>
> The question is: do we always want to merge into a combined cachefile, or is it useful to always split cachefiles by shape?
>
> Pro combined:
>
> * we only need one file and processing all of N+1 is easy on one node (PC)
>
> Pro split cachefiles:
>
> * one shape of N only depends on four shapes of N-1
> * we could split computation across multiple nodes so each calculates only one shape of the next N
>   * each node only needs 4 shapes of N-1, so we don't have to distribute the whole set of N-1 to all nodes

Before we continue with this, I would like to tidy up the current state a bit, since I'm seeing all kinds of funk accumulating. ;)

The good:

* newCache.hpp can now handle both the old way and the new.

The funk:

* Cube currently owns its memory, and CubeView is almost identical to it. There is a reason why I didn't yet make Cube use std::array: it might be a good idea to make Cube support external memory directly, but this needs some thinking.
* Also, if the XYZ data or cache files are ever going to be compressed somehow, a proxy interface is needed to handle this. Since this CubeIterator thing is a critical part of the design, I will try to clean up that code a bit.
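To make the idea concrete, here is a rough sketch of what "Cube supporting external memory directly" could look like (hypothetical code, not the repo's actual classes): one type that either owns its XYZ data or borrows it from externally managed memory such as an mmap'ed cache file, instead of maintaining two near-identical Cube and CubeView types.

```cpp
#include <cstdint>
#include <vector>

struct XYZ {
    int8_t x, y, z;
};

// Hypothetical sketch: a Cube that can own or borrow its coordinate data.
class Cube {
    const XYZ* data_ = nullptr;  // points into owned_ or external memory
    uint32_t n_ = 0;
    std::vector<XYZ> owned_;     // empty when borrowing

public:
    Cube() = default;
    Cube(Cube&&) = default;
    Cube& operator=(Cube&&) = default;
    Cube(const Cube&) = delete;  // copying would alias data_; keep it simple

    // Borrow: zero-copy view over externally managed memory
    // (e.g. a region of an mmap'ed cache file).
    static Cube borrow(const XYZ* data, uint32_t n) {
        Cube c;
        c.data_ = data;
        c.n_ = n;
        return c;
    }

    // Own: copy the coordinates into internal storage.
    static Cube own(const XYZ* data, uint32_t n) {
        Cube c;
        c.owned_.assign(data, data + n);
        c.data_ = c.owned_.data();
        c.n_ = n;
        return c;
    }

    const XYZ* begin() const { return data_; }
    const XYZ* end() const { return data_ + n_; }
    uint32_t size() const { return n_; }
};
```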

JATothrim commented 1 year ago

Could somebody test this PR, and would a merge make sense at this point? The split cache files can now at least be used by cubes -n 12 -w -c -s -u, if I'm correct?

Either way, I don't know yet how the following idea of mine would fit into the project: I'm currently writing a "Cube Swapper" subsystem that I think would help keep the Cube XYZ data out of RAM, if we keep using hash tables and some form of list representation of a cube.

I would describe it as swap space for the XYZ data:

While the Cube data is paged out, it consumes no RAM except the struct itself. This would leave only the hash-table memory consumption as a problem. Such a system would be a lot slower than the current implementation, but it should be possible to make it faster than relying on the OS to do the swapping, since we know which cubes are accessed from the hash-table and which aren't.
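A rough sketch of how such a swapper could look (hypothetical code under made-up names, not JATothrim's actual subsystem): XYZ blocks are appended to a backing file, a paged-out cube keeps only an (offset, count) record, and the block is read back in when the hash table actually touches that cube.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

struct XYZ {
    int8_t x, y, z;
};

// Hypothetical sketch of "swap space for the XYZ data".
class CubeSwapFile {
    std::FILE* f_;

public:
    explicit CubeSwapFile(const char* path) : f_(std::fopen(path, "w+b")) {}
    ~CubeSwapFile() {
        if (f_) std::fclose(f_);
    }

    // Page out: append the block, release the in-RAM copy,
    // and return the file offset to remember in the cube struct.
    long pageOut(std::vector<XYZ>& data) {
        std::fseek(f_, 0, SEEK_END);
        long off = std::ftell(f_);
        std::fwrite(data.data(), sizeof(XYZ), data.size(), f_);
        data.clear();
        data.shrink_to_fit();  // actually give the memory back
        return off;
    }

    // Page in: read the block back on demand.
    std::vector<XYZ> pageIn(long off, std::size_t count) {
        std::vector<XYZ> data(count);
        std::fseek(f_, off, SEEK_SET);
        if (std::fread(data.data(), sizeof(XYZ), count, f_) != count)
            data.clear();  // sketch only: real code would report the error
        return data;
    }
};
```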

The above might be a dumb idea, so I ask: does pursuing such a system make sense at this point?

nsch0e commented 1 year ago

> Could somebody test this PR, and would a merge make sense at this point? The split cache files can now at least be used by cubes -n 12 -w -c -s -u, if I'm correct?

I think this is in a useful state. The new flag -s splits output into separate files for each resulting shape, and -u expects these files for N-1 when reading the cache.

example:

# generate split cache files for N=8 
./cubes -n 8 -t 20 -w -s 
# use those cache files for generating split cache files for N=9
./cubes -n 9 -t 20 -w -s -c -u
bertie2 commented 1 year ago

I agree this is in a usable state. I would request that https://github.com/nsch0e/opencubes/pull/19 get merged first so we don't spam the file system, but otherwise I'm pretty happy with this. Also, might it be worth deprecating the old cache format soon? The new one just seems better, and we could tar-gzip the files to move them as a single file.

nsch0e commented 1 year ago

I merged the cache folder thing.

What do you mean by deprecating the old cache format? The file format is still the same. The newCache can only read; the old cache code is used for saving.

bertie2 commented 1 year ago

Sorry for the delay, and the misunderstanding on my part. I guess it's more accurate to ask if we will be deprecating the old cache code path; having two blocks of code doing mostly the same thing just seems like bad practice from a maintenance and ease-of-contribution perspective.

JATothrim commented 1 year ago

> Sorry for the delay, and the misunderstanding on my part. I guess it's more accurate to ask if we will be deprecating the old cache code path; having two blocks of code doing mostly the same thing just seems like bad practice from a maintenance and ease-of-contribution perspective.

The newCache can't write to disk yet, so that's why the old cache code path still exists. I have now written a somewhat OK implementation of an mmap'ed file I/O library (both write and read support), so with that the old code can be replaced. (It still needs a bit of work before PRs to make it mergeable.) Should we do a CacheWriter class etc.?

EDIT: @bertie2 can you checkout my https://github.com/JATothrim/opencubes/tree/libmappedfile branch?
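For reference, a minimal sketch of the write side of mmap'ed file I/O (plain POSIX, only to illustrate the approach; the actual libmappedfile API may look quite different): grow the file to its final size first, then map it and write through the mapping.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>
#include <cstring>

// Write a buffer to a file through an mmap'ed view.
// Sketch only: assumes len > 0 and skips msync/error-reporting details.
bool writeFileMapped(const char* path, const void* data, std::size_t len) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    if (ftruncate(fd, static_cast<off_t>(len)) != 0) {  // reserve the bytes
        close(fd);
        return false;
    }
    void* map = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return false;
    }
    std::memcpy(map, data, len);  // writes land in the page cache
    munmap(map, len);             // kernel flushes them to disk
    close(fd);
    return true;
}
```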