Open MichaelMure opened 4 years ago
Btw, this is with go-ipfs 0.4.22, so v0.0.1 of this package, but that's the same `github.com/whyrusleeping/chunker` as master.
@MichaelMure are you by chance able to share an input (publicly or privately) that reliably reproduces this? Or is the problem intermittent, where the same input might hang a daemon once and then pass through just fine on a rerun?
Unfortunately, that's all the information I have. It seems to be fairly rare, but to be fair, not many people play with the Rabin chunker.
@MichaelMure understood. It definitely sounds like a missing underflow protection, but it will take me another couple of days to get my test setup working well enough to pin this down. I should have something for you Thu-ish.
Note: our current rabin implementation is terrible. I wouldn't waste too much time trying to fix it.
The buzhash implementation in go-ipfs master should work much better.
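Since buzhash keeps coming up: the technique itself is easy to sketch. It keeps a rolling hash over a small byte window and cuts a chunk whenever the hash matches a bit mask, clamped between a min and max size. A self-contained toy illustration follows; the window, mask, and size constants are picked for the demo and are not go-ipfs's actual parameters:

```go
package main

import (
	"fmt"
	"math/rand"
)

const (
	windowSize = 16   // rolling-hash window, in bytes
	mask       = 0x3F // cut when the low 6 bits are zero (~64 B average here)
	minChunk   = 32   // toy sizes for the demo; go-ipfs uses much larger ones
	maxChunk   = 256
)

// table maps each byte value to a fixed random word, as buzhash does.
var table [256]uint32

func init() {
	rng := rand.New(rand.NewSource(1))
	for i := range table {
		table[i] = rng.Uint32()
	}
}

func rotl(x uint32, n uint) uint32 { return x<<n | x>>(32-n) }

// chunk splits data at content-defined boundaries: the hash depends only
// on the last windowSize bytes, so identical content produces the same
// cut points regardless of where it sits in the stream.
func chunk(data []byte) [][]byte {
	var out [][]byte
	start := 0
	var h uint32
	for i := range data {
		n := i - start
		if n < windowSize {
			h = rotl(h, 1) ^ table[data[i]] // window still filling
		} else {
			// slide the window: age out the departing byte, mix in the new one
			h = rotl(h, 1) ^ rotl(table[data[i-windowSize]], windowSize) ^ table[data[i]]
		}
		if n+1 >= minChunk && (h&mask == 0 || n+1 >= maxChunk) {
			out = append(out, data[start:i+1])
			start = i + 1
			h = 0
		}
	}
	if start < len(data) {
		out = append(out, data[start:]) // trailing partial chunk
	}
	return out
}

func main() {
	data := make([]byte, 2000)
	rand.New(rand.NewSource(42)).Read(data)
	chunks := chunk(data)
	total := 0
	for _, c := range chunks {
		total += len(c)
	}
	fmt.Println(len(chunks) > 1, total == len(data)) // true true
}
```

Because the hash depends only on the window contents, an insertion near the start of a file shifts chunk boundaries only locally, which is the property that makes content-defined chunking dedup-friendly.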
Will it get deprecated/dropped? Because if someone can take out our nodes with a bad request, that's not so fun.
Probably not? It's just a low-priority issue given that we support an additional chunker and are working on yet another chunker.
If you need it fixed now, I'm happy to accept a patch. However, the HTTP API is not designed to be a public API so this issue is significantly less critical for the IPFS team.
I understand the low priority. My question is more if you plan to retire Rabin once Buzhash is there for good.
@MichaelMure there's a sustained effort over the next ~month to better quantify what exactly we want from a CDC (content-defined chunker), and where our actual bottlenecks are. Please track the main text of https://github.com/ipfs/specs/issues/227#issue-532891529 for further updates.
@MichaelMure just a quick update on this ticket: this is almost certainly (though not 100% proven) a bug in the way the rabin chunker implementation reads from the stream, resulting in an unexpected under-read and thus a loop waiting for data that never comes.
Since this issue got opened originally, we have developed a different subsystem providing a unified read-interface for the rest of the ingestion pipeline, namely https://pkg.go.dev/github.com/ipfs/go-qringbuf@v0.0.0-20200430132832-6a3bc585b0d9?tab=doc#pkg-overview
This bug will be solved in the process of migrating to that system.
Awesome thanks for the update :)
At Infura we've had some of our nodes end up in an infinite loop in the Rabin chunker, with the CPU at 100% until something bad happens (the context doesn't reach that code, so the loop keeps spinning even though the request is long gone).
Here is a profiling:
Original requests are `add` with the following parameters: `chunker=rabin-262144-524288-1048576,encoding=json,hash=sha2-256,inline-limit=32,pin=false,progress=true,stream-channels=true`. Some also have `trickle=true`, but that doesn't seem to matter. Profiling led to an infinite loop in https://github.com/whyrusleeping/chunker/blob/master/chunker.go#L209-L336 but it's unclear to me what is happening.
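As I read the chunker spec format, the three numbers in `rabin-262144-524288-1048576` are the minimum, average, and maximum chunk sizes in bytes (256 KiB / 512 KiB / 1 MiB). A throwaway parser, just to make the request parameters above concrete (this is illustrative only, not go-ipfs's actual parsing code):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRabin splits a chunker spec like "rabin-262144-524288-1048576"
// into its three size parameters. The min/avg/max interpretation is my
// reading of the spec format, restated here for illustration.
func parseRabin(spec string) (min, avg, max int, err error) {
	parts := strings.Split(spec, "-")
	if len(parts) != 4 || parts[0] != "rabin" {
		return 0, 0, 0, fmt.Errorf("not a rabin spec: %q", spec)
	}
	vals := make([]int, 3)
	for i, p := range parts[1:] {
		if vals[i], err = strconv.Atoi(p); err != nil {
			return 0, 0, 0, err
		}
	}
	return vals[0], vals[1], vals[2], nil
}

func main() {
	min, avg, max, err := parseRabin("rabin-262144-524288-1048576")
	fmt.Println(min, avg, max, err) // 262144 524288 1048576 <nil>
}
```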
`if err == io.ErrUnexpectedEOF { err = nil }` is suspicious, but that kinda flies over my head.