thomasmoelhave / tpie

Templated Portable I/O Environment
Other
112 stars 24 forks source link

I/O efficient deque #235

Closed gijsde1ste closed 4 years ago

gijsde1ste commented 4 years ago

Hi all,

Not really an issue with the existing code base, but I don't know a better place to post this. I'm still very new to tpie so there are probably some obvious things I don't know yet.

I'm attempting to create an I/O efficient deque. The basic idea is to use two I/O efficient stacks to represent both ends of the queue, each with its own file stream (copy pasted from stack.h). This works for deques that never pop elements added from the other side. I've managed to fix this for deques that only use the buffer by maintaining positions how far the pops from one side have read into the buffer of the other side. However I'm unable to find a solution for deques that actually write to stream. Is it possible when popping the back stack until its empty to continue by reading the stream of the front stack? I've attached my attempt so far but I'm unsure whether this approach has any chance of success or if there are easier/better ways to do this.

deque.zip

Assuming loads of elements are added to the front of the deque and then we want to pop them all from the back of the deque. The popBack checks if there's elements in its own buffer, if so use that. Else check if elements have been written to its streams, if so use that. When these fail it should read elements written in the frontStream. However the position is at the end of the stream, so I've made a canRead that moves the position of the stream back to the start and checks if it can be read. That seems to work, but when reading I get the error that the stream is not open.

Thanks in advance.

antialize commented 4 years ago

If you are ok with an amortized solution, you can just implement the deque as two stacks, representing either end of the queue, then pushing and popping to either end is easy. The only problem is what to do if you try to pop from an empty stack, in this case you can use linear IO to split the other stack into two parts roughly equal parts. This is quite inefficient for small stacks, so it would probably make sense to also keep an internal deque inside that is used for when there are few items.

antialize commented 4 years ago

Splitting can be done using read_back and truncate. In order for this to be I/O efficient you need to buffer B elements internally at either end like we do in the stack.

gijsde1ste commented 4 years ago

Hi,

Thanks for your input, that's a much easier approach. I've got it working but I'm not quite happy with the amount of copying I'm doing. The split uses linear IO but the constant factor is 1.5 while intuitions says it should be about 0.5.

My example fills the front of the deque, when popping items from the back it splits the frontStream. The pointer is at the back of the frontStream. I read through roughly half of the records in the frontStream and write them to a temporary stream, note that these are now in reverse order. Then the second half of the frontStream is read and written to the backStream (this is correct). Then I need to read the records from the temporary stream and write them back to the frontStream, this again reverses the order which puts them in their original order, which is correct.

My question is if this can be done simpler, without a temporary stream. Like moving the reading pointer to roughly halfway, only copying the necessary elements to the other stream and then moving the remaining elements back to the front of the stream. I wasn't able to find any methods in the streams that could support any of this.

antialize commented 4 years ago

If you want to use compressed streams then this is the best approach I can come up with for splitting:

Given the compressed stream A, that we want to split, first read and ignore from A, from location zero to N/2. Then from this location read backwards while putting the elements into a new stream B. This will do scan(N) reads and scan(N/2) writes. We can keep using A as one of the halfs, as long as we remember that we should disregard the bottom half.

For uncompressed streams, it is much easier since we can just seek to N/2 and readback to 0. So we get scan(N/2) reads and scan(N/2) writes.

gijsde1ste commented 4 years ago

Thanks for helping me solve this

closing issue