isomorphic-git / isomorphic-git

A pure JavaScript implementation of git for node and browsers!
https://isomorphic-git.org
MIT License
7.49k stars 378 forks source link

Properly propagate error encountered when reading response from git-upload-pack POST request #1847

Closed mojavelinux closed 10 months ago

mojavelinux commented 10 months ago

When performing a fetch operation on a repository (which occurs in any of following commands: clone, fetch, pull), isomorphic-git (as a compliant git client) will send a POST request to the git-upload-pack service to retrieve updates in the form of packetlines (new references) and the packfile prepared by the server. A packfile is a payload that contains updates to the repository index (or the whole repository in the case of a clone). This response also includes progress summaries.

If an error occurs while reading the git-upload-pack response, instead of throwing that error, isomorphic-git swallows it, logs it to STDOUT, and presses on. In doing so, it either runs into an obtuse logic error (such as Cannot read properties of undefined (reading 'length')) or it writes an incomplete packfile, which manifests as an error later when reading objects in the repository.

This issue was originally reported as #1846. However, that was just the tip of the iceburg. isomorphic-git was trying to process an incomplete response (though it should still handle that situation gracefully).

The first question is, when does this happen? Well, a network connection is not a guaranteed resource. It can fail at any time for any reason. isomorphic-git makes the assumption that once it receives the 200 status code, it will be able to read the response to completion. But this is simply not a valid assumption.

In fact, we have received numerous reports of the Cannot read properties of undefined (reading 'length') over the years that are likely an indicator that the underlying error is being suppressed and ignored. The real reason is probably logged to STDOUT somewhere (if available), but that's very untidy for consumers of isomorphic-git since the error cannot be handled or redirected to a log.

But there's a more insidious problem afoot. GitHub recently changed its server settings so that any part of a response is not read for at least 5 seconds (or less if the client has opened multiple connections), it will reset the connection abruptly. This pause can happen if the isomorphic-git consumer runs concurrent fetch operations and at least of the responses is large. The pause is Node.js context switching between concurrent operations. If GitHub resets the connection when paused, Node.js will encounter an ECONNRESET error when resuming the stream. When this happens with the native git client, users will see the following error:

fatal: unable to access 'http://localhost:8888/test-clone.git': transfer closed with outstanding read data remaining

Whether it's GitHub resetting the connection, or some other network problem, such as a client timeout, isomorphic-git has to stop processing the response when an error occurs and rethrow it so it can be handled.

I've found numerous places where isomorphic-git will either hang or throw an obtuse logic error if it encounters an error when reading some part of the response stream of a fetch operation:

When an error occurs when reading the stream, isomorphic-git should save that error to the error property on the stream instead of writing it to STDOUT (console.log).* (In GitSideBand.demux, it will also record the error on the packfile FIFO). The fetch command will then need to check for this property. It will need to do so in several places since isomorphic-git reads the response incrementally and may succeed with several steps (such as reading the packetlines) before hitting it.

I think we can address all of these problems in a single PR with multiple commits since they are all interconnected. I will submit that PR. (I have done rather extensive research into this problem and I'm confident that I know how to properly fix it).

(*) It's tempting to want to add an error listener on the response stream. Unfortunately, the http plugin in isomorphic-git does not require the value of the body object to honor the EventEmitter interface and therefore we cannot rely on it.

mojavelinux commented 10 months ago

isomorphic-git should save that error to the error property on the stream

I choose error since it's used elsewhere in isomorphic-git for this same purpose and it doesn't conflict with the errored property introduced on streams in Node.js 18. If we could assume we were using Node.js 18, we may have been able to just check for the errored property (and perhaps it could be checked for redundancy), but we can't rely on that since the min version of Node.js supported is 12...and there is also browser compatibility to worry about. It's also the case that the stream may just be an Iterator which won't have the errored property anyway (not a true stream).

mojavelinux commented 10 months ago

PR sent.

isomorphic-git-bot commented 10 months ago

:tada: This issue has been resolved in version 1.25.2 :tada:

The release is available on:

Your semantic-release bot :package::rocket:

mojavelinux commented 10 months ago

The way I tested this is to register a custom http plugin for isomorphic-git that intercepts the response to emulate what the server is sending. As describe above, the server starts to send response body lines, but then resets the connection, which manifests as an ECONNRESET error.

The lines in a git-upload-pack response are delimited by length. The 4 four characters are the length of the line to read (including the trailing newline) in hexadecimal. The rest of the line is the data. In some cases, the first character of the data is a code to indicate whether the line is a packetline, a packfile, or an error. You can find an example in the following test:

https://github.com/isomorphic-git/isomorphic-git/blob/ad9bc11e2df311da540330d2243cb3ba9dd5d47a/__tests__/test-clone.js#L282-L341