thomasmoelhave / tpie

Templated Portable I/O Environment
Other
112 stars 24 forks source link

peek_back for file_stream and peek for merge_sorter #239

Closed SSoelvsten closed 4 years ago

SSoelvsten commented 4 years ago

Hi,

I am currently using TPIE for a research project trying to create a Cache-oblivious implementation of a data structure extensively used in the field of verification and synthesis. The output of one algorithm immediately is in sorted order of the next. Or, at least almost... they are in reverse order of what is needed next. So currently, all algorithms go to the end of the input stream(s) and work themselves backwards while they write to the given output stream(s).

The code quality of my algorithms (and by extension my trust in them) would be heavily improved if there exists a

It would be easy for me to do this merely with my own wrapper classes, but I would rather contribute back to this project if possible and would offer to implement these myself. But, I'm not sure in what files to start. Looking at tpie/file_stream.h and other files I cannot figure out where exactly the implementation of the file_stream is. Neither can I find the merge_sorter implementation.

antialize commented 4 years ago

I would think that you would want to use the tpie pipelining framework (https://users-cs.au.dk/rav/tpie/doc/master/pipelining.html). Here if the output of one part say A is in the opposite order of what is needed in the next say B you can just push the elements through a reverser (https://users-cs.au.dk/rav/tpie/doc/master/namespacetpie_1_1pipelining.html#aff47d9407963e2e003e86a626b3a2a45).Like A() | reverser() | B().  When pulling in data instead of it being pushed too you, you can use the pull_peak adaptor (https://users-cs.au.dk/rav/tpie/doc/master/namespacetpie_1_1pipelining.html#a8635eea4a7a572a8d5f14c1087d266b9) to buffer one element that you can then peek() before you pull(). You can look at tpie/apps/pipelining_passive_sorter/pipelining_passive_sorter.cpp for an example of how to poll from a server.

SSoelvsten commented 4 years ago

I was recommended by @Mortal to start without the use of the pipelining framework first, which is how I ended at this question. Given that the pipelining framework already solves all of this, it might not be worth it for me or for anyone. As far as I understand the pipelining framework even is able to automatically parallelise multiple branches of computation by analysing the dependency graph, so that would also result in a major possible speedup?

antialize commented 4 years ago

There is no automatic paralization, but it is true that you can parallelize a subgraph for instance like tp::sorter() | tp::parallel( A() | B () | C() ) | tp::sorter()

SSoelvsten commented 4 years ago

Ah, that pretty much does it as well. Thank you for your time and thanks for the clarifications.