thomasmoelhave / tpie

Templated Portable I/O Environment
Other
112 stars 24 forks source link

Sorting of tpie::file<T>::stream #241

Closed SSoelvsten closed 2 years ago

SSoelvsten commented 3 years ago

I'm getting very far with the core I/O-efficient BDD library, and currently I'm improving its API based on the external memory queue example in the documentation since it allows me to minimise the memory footprint by reusing tpie::file<T>s for trivially isomorphic instances. All the algorithms then create their own tpie::file<T>::streams to read from these (which also makes the library trivially thread-safe?).

In two edge-cases I have to sort some of these tpie::file<T>, but TPIE does not seem to provide a tpie::sort that may take a tpie::file<T>::stream reference as an argument. In general only a very few of the elements are out of order, so I could of course create two tpie::file_stream<T> objects for out-of-order and in-order elements, sort the out-of-order list, and then copy them over into the output file. This may run faster, since one sorts fewer elements and then quickly merges two streams. Based on the documentation though, this does require twice the amount of space than just sorting all elements.

SSoelvsten commented 3 years ago

I ran a quick performance benchmark of the tpie::file<T>::stream vs. the tpie::file_stream<T> with the following results.

tpie::file_stream<T> tpie::file<T> (stack) tpie::file<T> (heap)
writing (ms) 269 1725 1821
reading (ms) 245 211 213

So, if one is able to construct their program in a way, where you only need a single-write-only access and multi-read-only access later (as I do), then one should not sort a given tpie::file<T>, but instead open the same file with a tpie::file_stream<T> and sort it that way. To make both of these work with eachother one can make use of the tpie::temp_file object as the atomic underlying representation of each and every file.

For me, the above closes the issue. Whether this still should be kept as a feature request I leave for you to decide.

SSoelvsten commented 2 years ago

As far as I can tell, version 1.2 of TPIE was supposed to have tpie::file::stream removed? There is also still the unresolved issue https://github.com/thomasmoelhave/tpie/issues/129 on this very issue. Given this, this feature request seems irrelevant.