Inaccurate depth count after readError on "current" file

nsqio / go-diskqueue

A Go package providing a filesystem-backed FIFO queue

MIT License

467 stars 101 forks source link

Inaccurate depth count after readError on "current" file #29

Closed kev1n80 closed 3 years ago

kev1n80 commented 3 years ago

It seems like DiskQueue does not accurately update depth when a read error occurs (in handleReadError()) or in the scenario where the machine finishes reading a file, deletes a file, but crashes before writing to the MetaData file.

mreiferson commented 3 years ago

Yes, those edge cases are possible. What we do is perform checks when you reach the tail of the queue:

https://github.com/nsqio/go-diskqueue/blob/99f6193f1106cefd16bca4d5727a62a2a7c19c7d/diskqueue.go#L521-L536

kev1n80 commented 3 years ago

Hello, I'm not sure if I should make a new issue or respond here. I found that in the case where the user is reading all of the messages in the queue and the last file is corrupted, the depth never becomes 0. Within the if clause in line 579, the readFileNum == the writeFileNum and writePos is set to 0 as is the readPos later on in the function. After the readPos is set to 0, shouldn't checkTailCorruption be called or depth be set to 0 if readFileNum == writeFileNum?

https://github.com/nsqio/go-diskqueue/blob/99f6193f1106cefd16bca4d5727a62a2a7c19c7d/diskqueue.go#L579-L605

mreiferson commented 3 years ago

Yea, you're right, we should call d.checkTailCorruption(d.depth) at the end of handleReadError(), want to open a PR?

kev1n80 commented 3 years ago

Yes I would love to open a PR and add the change: call d.checkTailCorruption(d.depth) at the end of handleReadError(). Thank you!