moyamo opened this issue 2 years ago
@zenhack Do you have any suggestions on this one?
Is the code for this publicly available?
Is there anything of interest in sandstorm.log? Because of the way HTTP proxying over Cap'n Proto works, information about the actual error sometimes gets obscured by the time the grain sees a response, so errors in the logs can be illuminating. (It is possible that GitLab is not actually returning a 500, and that some error is occurring somewhere inside Sandstorm.)
Is the code for this publicly available?
No, it's very WIP at the moment.
Is there anything of interest in sandstorm.log?
I checked the grain's log and was just seeing 500 Internal Server Error responses that looked like they were probably coming from GitLab. I didn't think to check the Sandstorm log. I'll go digging later and report back.
I've managed to get git clone to work through the IpNetwork interface. I didn't bother trying to use the ApiSession interface. I think that means the problem is either in sandstorm-http-bridge or in ApiSession.
Using IpNetwork, of course, is less than ideal, since only the server admin can grant access to it.
So I know the main use case for ApiSession/sandstorm-http-bridge is accessing REST APIs.
I guess my question then is: is git clone over HTTPS something that ApiSession supports? Or is it too weird of a use case?
We have multiple apps which use Git repos, none of which use IpNetwork, so you should be okay there. You may want to look at our GitWeb package. (We have GitLab too, but it is much older, I believe.)
@ocdtrekkie, I believe @moyamo is trying to use a git client from inside a grain rather than serve a git repo from a grain, so as far as I know we don't actually have existing apps that do this.
I would hazard a guess that the issue is either in our implementation of ApiSession (it seems entirely possible that git clone does something our current implementation isn't handling correctly) or in the way you're using it in the app code.
Do you have an objection to publishing the code (or some simplified version that exhibits the problem)? I feel like it would save a lot of time for me to just be able to see what you're doing.
So I checked the Sandstorm logs and they didn't show any errors, so I'm 90% sure that the 500 error is coming from GitLab and not from sandstorm-http-bridge.
Here are some snippets of the code:
First we define a Powerbox descriptor that asks the user for access to the GitLab API; the user's choice gives us a token we can claim.
@0x9759ad011d40ab4c; # generated using `capnp id`
using Powerbox = import "/sandstorm/powerbox.capnp";
using ApiSession = import "/sandstorm/api-session.capnp".ApiSession;
const myTagValue :ApiSession.PowerboxTag = (
canonicalUrl = "https://gitlab.com/api/v4",
authentication = "basic",
);
const myDescriptor :Powerbox.PowerboxDescriptor = (
tags = [
(id = 0xc879e379c625cdc7, value = .myTagValue)
],
);
NOTE: we set authentication = "basic" since that's how GitLab wants us to do auth.
We follow the steps described in the Sandstorm docs to get the base64-encoded version of this constant:
descriptor = "EA5QAQEAABEBF1EEAQH/x80lxnnjecgAQAMRCdIAABERMv9odHRwczovLwJnaXRsYWIuY29tL2FwaS92ATQfYmFzaWM="
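As a sanity check, the descriptor string can be decoded again. Decoding it by hand suggests it is in Cap'n Proto's packed encoding with a segment table in front (the form `capnp eval -p` produces). The sketch below undoes the packing as described in the Cap'n Proto encoding spec and checks that the tag ID, the URL, and "basic" all made it in. The names `unpack_capnp` and `DESCRIPTOR` are illustrative, not from any library.

```python
import base64
import struct

# The base64 string from the post above.
DESCRIPTOR = "EA5QAQEAABEBF1EEAQH/x80lxnnjecgAQAMRCdIAABERMv9odHRwczovLwJnaXRsYWIuY29tL2FwaS92ATQfYmFzaWM="


def unpack_capnp(packed: bytes) -> bytes:
    """Reverse Cap'n Proto packing: each 8-byte word is stored as a tag byte
    (bit i set means byte i is nonzero) followed by only its nonzero bytes."""
    out = bytearray()
    i = 0
    while i < len(packed):
        tag = packed[i]
        i += 1
        word = bytearray(8)
        for bit in range(8):
            if tag & (1 << bit):
                word[bit] = packed[i]
                i += 1
        out += word
        if tag == 0x00:
            # A count byte follows: that many additional all-zero words.
            out += bytes(8 * packed[i])
            i += 1
        elif tag == 0xFF:
            # A count byte follows: that many words copied verbatim.
            n = packed[i]
            i += 1
            out += packed[i:i + 8 * n]
            i += 8 * n
    return bytes(out)


raw = unpack_capnp(base64.b64decode(DESCRIPTOR))
# The first word is the segment table: (segment count - 1, words in segment 0).
seg_count_minus_one, seg_words = struct.unpack_from("<II", raw, 0)
print(seg_count_minus_one, seg_words, len(raw))  # 0 14 120
print(struct.pack("<Q", 0xC879E379C625CDC7) in raw)  # True: ApiSession tag id
print(b"https://gitlab.com/api/v4\x00" in raw)  # True: canonicalUrl
print(b"basic\x00" in raw)  # True: authentication
```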
Then we use a Jinja2 template to put the descriptor into the HTML, and follow the steps in the docs again to make a Powerbox request and get the claim token.
<!doctype HTML>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body>
<button onclick="connectGitlab()">Connect to Gitlab</button>
<button onclick="clone()">Clone</button>
<script>
function connectGitlab() {
window.parent.postMessage({
powerboxRequest: {
rpcId: 1,
query: [
"{{descriptor}}"
],
saveLabel: {defaultText: "gitlab API access"},
}
}, "*");
}
window.addEventListener("message", function (event) {
if (event.source !== window.parent) {
// SECURITY: ignore postMessages that didn't come from the parent frame.
return;
}
var response = event.data;
if (response.rpcId !== 1) {
// Ignore RPC ID that doesn't match our request. (In real code you'd
// probably have a table of outstanding RPCs so that you don't have to
// register a separate handler for each one.)
return;
}
if (response.error) {
// Oops, something went wrong.
alert(response.error);
return;
}
if (response.canceled) {
// The user closed the Powerbox without making a selection.
return;
}
// We now have a claim token. We need to send this to our server
// where we can exchange it for access to the remote API!
doClaimToken(response.token);
});
async function doClaimToken(token) {
await fetch("/token", {method: "POST", body: token});
}
async function clone() {
await fetch("/clone", {method: "POST"});
}
</script>
</body>
</html>
Then we claim the token and store the result in the file /var/bearer.txt for later use.
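import requests
from flask import request

# `app` is the Flask application object, defined elsewhere in the app.
@app.route("/token", methods=["POST"])
def token():
    # The request body is the claim token the Powerbox gave the client-side JS.
    tok = request.data.decode('utf-8')
    session_id = request.headers.get("X-Sandstorm-Session-Id")
    # Ask sandstorm-http-bridge to redeem the claim token.
    r = requests.post(f"http://http-bridge/session/{session_id}/claim",
                      headers={"Content-Type": "application/json"},
                      json={"requestToken": tok, "requiredPermissions": ["read"]},
                      )
    r.raise_for_status()
    gitlab_cap = r.json()['cap']
    # Save the result; the mitmproxy script reads it on every request.
    with open('/var/bearer.txt', 'w') as f:
        f.write(gitlab_cap)
    return ''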
Now comes the interesting part. Unfortunately git doesn't easily let you use Bearer authentication, so we use mitmproxy to add the Bearer token in between git and sandstorm-http-bridge.
HOME=/var/ mitmdump --mode upstream:$HTTP_PROXY -s /opt/app/gitproxy.py &
NOTE: --mode upstream:$HTTP_PROXY tells mitmdump to pass the connection on to sandstorm-http-bridge.
-s /opt/app/gitproxy.py tells it to run this simple script, which adds the Bearer token (read from /var/bearer.txt, where we stored it earlier).
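#!/usr/bin/env python3
# mitmproxy script: attach the stored token as a Bearer header to every request.
from mitmproxy import http


def request(flow: http.HTTPFlow) -> None:
    try:
        with open("/var/bearer.txt") as f:
            bear = f.read().strip()
    except OSError:
        # No token claimed yet: forward the request unchanged.
        return
    flow.request.headers["Authorization"] = "Bearer " + bear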
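For comparison, git 2.9 and later can attach an arbitrary header to its HTTP requests through the http.extraHeader config option, which might let the grain skip mitmproxy. This is a minimal sketch, assuming the token is the one saved in /var/bearer.txt; the helper names are hypothetical, and whether sandstorm-http-bridge and GitLab treat the header exactly as they treat the one mitmproxy adds is untested.

```python
from typing import List, Optional


def read_token(path: str = "/var/bearer.txt") -> str:
    """Read the token saved by the /token route (hypothetical helper)."""
    with open(path) as f:
        return f.read().strip()


def git_clone_with_bearer(url: str, token: str, dest: str,
                          depth: Optional[int] = None) -> List[str]:
    """Build a git clone argv that sends 'Authorization: Bearer <token>'.

    `-c http.extraHeader=...` applies the header to this one git
    invocation only, without touching any git config file.
    """
    cmd = ["git", "-c", f"http.extraHeader=Authorization: Bearer {token}", "clone"]
    if depth is not None:
        cmd.append(f"--depth={depth}")
    cmd += [url, dest]
    return cmd
```

The resulting list would be passed to subprocess.run with the same proxy environment variables as the clone route, but pointing at sandstorm-http-bridge instead of mitmdump. Note that the token then appears in git's command line.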
Now we try to git clone the repository:
import os
import subprocess

@app.route("/clone", methods=["POST"])
def go():
    os.chdir('/var')
    proxy = "http://localhost:8080"  # This is mitmdump
    # Keep the rest of the environment (PATH, HOME, ...) and override the proxy.
    env = {**os.environ, "http_proxy": proxy, "HTTP_PROXY": proxy}
    subprocess.run(['rm', '-r', 'myrepo'])
    # Try with --depth=1 (succeeds for some reason)
    subprocess.run(["git", "clone", "http://http-proxy/moyamo/myrepo.git", "--depth=1"], env=env)
    subprocess.run(['ls', '-l'])  # Verify repo is cloned
    subprocess.run(['rm', '-r', 'myrepo'])
    # Try again without --depth=1 (fails for some reason)
    subprocess.run(["git", "clone", "http://http-proxy/moyamo/myrepo.git"], env=env)
    subprocess.run(['ls', '-l'])  # Show repo is not cloned
    return ''
The output in the grain's log is:
rm: cannot remove 'myrepo': No such file or directory
Cloning into 'myrepo'...
127.0.0.1:40290: clientconnect
127.0.0.1:40290: GET http://http-proxy/moyamo/myrepo.git/inf…
<< 200 OK 17.59k
127.0.0.1:40290: POST http://http-proxy/moyamo/myrepo.git/git…
<< 200 OK 56b
127.0.0.1:40290: POST http://http-proxy/moyamo/myrepo.git/git…
<< 200 OK 3.3m
127.0.0.1:40290: clientdisconnect
total 28
-rw-rw---- 1 723 463 60 Jul 19 18:41 bearer.txt
drwxrwx--- 9 723 463 4096 Jul 19 19:41 myrepo
drwxrwx--- 5 723 463 4096 Jul 19 18:15 lib
drwxrwx--- 4 723 463 4096 Jul 19 18:15 log
drwxrwx--- 3 723 463 4096 Jul 19 19:41 run
-rw-rw---- 1 723 463 50 Jul 19 18:43 thing.txt
drwxrwx--- 2 723 463 4096 Jul 19 19:41 tmp
Cloning into 'myrepo'...
127.0.0.1:40292: clientconnect
127.0.0.1:40292: GET http://http-proxy/moyamo/myrepo.git/inf…
<< 200 OK 17.59k
127.0.0.1:40292: POST http://http-proxy/moyamo/myrepo.git/git…
<< 500 Internal Server Error 0b
error: RPC failed; HTTP 500 curl 22 The requested URL returned error: 500
fatal: the remote end hung up unexpectedly
127.0.0.1:40292: clientdisconnect
total 24
-rw-rw---- 1 723 463 60 Jul 19 18:41 bearer.txt
drwxrwx--- 5 723 463 4096 Jul 19 18:15 lib
drwxrwx--- 4 723 463 4096 Jul 19 18:15 log
drwxrwx--- 3 723 463 4096 Jul 19 19:41 run
-rw-rw---- 1 723 463 50 Jul 19 18:43 thing.txt
drwxrwx--- 2 723 463 4096 Jul 19 19:41 tmp
[pid: 14|app: 0|req: 2/2] 127.0.0.1 () {62 vars in 1180 bytes} [Tue Jul 19 19:41:33 2022] POST /clone => generated 0 bytes in 5426 msecs (HTTP/1.1 200) 2 headers in 78 bytes (1 switches on core 0
Given that git clone --depth=1 succeeded, I'm pretty sure that I did the authentication properly.
Yeah, if it works with --depth=1, then it's probably not an auth problem. My best guess is that git is hitting some edge case that our implementation of ApiSession doesn't handle correctly. I'll have to remind myself how git clone actually works at the protocol level and see if I can figure out what we're missing (I knew the details at one point, but it's been a while). Hopefully I will have time to investigate soon.
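For reference, git's smart HTTP protocol is a GET of <repo>/info/refs?service=git-upload-pack followed by one or more POSTs to <repo>/git-upload-pack, which matches the shape of the log above. Request bodies are sequences of pkt-lines: four hex digits giving the total length (including those four bytes), then the payload, with "0000" as a flush packet. The sketch below frames the request for a fresh, non-shallow clone in the original (v0) protocol, with capabilities omitted for brevity. Newer git versions may negotiate protocol v2, which uses the same framing but different commands. The helper names are illustrative.

```python
from typing import Iterable

FLUSH_PKT = b"0000"  # zero-length packet that ends a section


def pkt_line(payload: bytes) -> bytes:
    """Prefix payload with its total length (payload + 4) as 4 hex digits."""
    return b"%04x" % (len(payload) + 4) + payload


def upload_pack_request(wants: Iterable[str]) -> bytes:
    """Body of POST <repo>/git-upload-pack for a fresh clone with no haves:
    want lines, a flush packet, then 'done' (v0, no capabilities)."""
    body = b"".join(pkt_line(f"want {oid}\n".encode()) for oid in wants)
    return body + FLUSH_PKT + pkt_line(b"done\n")
```

With no haves to negotiate, a full clone needs only this single POST, and the server answers it with the entire packfile. That would be consistent with the single failing POST in the log, while the --depth=1 clone shows two POSTs (shallow negotiation, then the pack).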
So if I set --depth=260, it actually gives an error instead of just failing silently:
capnp/rpc.c++:160: info: returning failure over rpc; exception = capnp/arena.c++:153: failed: Exceeded message traversal limit. See capnp::ReaderOptions.
If I set --depth=259, it succeeds, with GitLab giving a response of 31.99 MB. So it's clearly tapping out at a 32 MB response. Weird that IpNetwork doesn't have the same limitation.
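The error above points at capnp::ReaderOptions, whose traversalLimitInWords defaults to 8 * 1024 * 1024 words. Converting that to bytes gives a 64 MiB default budget, so a cutoff near 32 MB is about half of it. One unverified possibility is that the response body is traversed twice somewhere along the path, or that a lower limit is configured; this arithmetic only shows the numbers line up, not the cause.

```latex
\underbrace{8 \times 1024 \times 1024}_{\text{default traversalLimitInWords}}\ \text{words}
\times 8\ \frac{\text{bytes}}{\text{word}}
= 2^{26}\ \text{bytes} = 64\ \text{MiB},
\qquad
\frac{64\ \text{MiB}}{2} = 32\ \text{MiB}
```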
On the other hand, maybe this is a different error. Maybe when I do a plain git clone, GitLab gives me a 500 Internal Server Error without sending large amounts of data, but when I do git clone --depth=260, GitLab is fine and then capnp gives up.
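The 259/260 boundary above was found by hand. A small helper could bisect it automatically. This is a sketch assuming success is monotone in --depth (every depth up to the cutoff works, every depth above it fails) and reusing the mitmdump proxy at localhost:8080; clone_ok and largest_ok_depth are hypothetical names.

```python
import os
import subprocess
import tempfile
from typing import Callable


def clone_ok(url: str, proxy: str, depth: int) -> bool:
    """Return True if `git clone --depth=<depth>` exits with status 0."""
    env = {**os.environ, "http_proxy": proxy, "HTTP_PROXY": proxy}
    with tempfile.TemporaryDirectory() as tmp:
        result = subprocess.run(
            ["git", "clone", f"--depth={depth}", url, os.path.join(tmp, "repo")],
            env=env, capture_output=True,
        )
    return result.returncode == 0


def largest_ok_depth(ok: Callable[[int], bool], lo: int, hi: int) -> int:
    """Binary search for the largest depth in [lo, hi] with ok(depth) True.

    Assumes ok(lo) is True and ok is monotone (True, then False).
    """
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always shrinks
        if ok(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

Usage would look like largest_ok_depth(lambda d: clone_ok("http://http-proxy/moyamo/myrepo.git", "http://localhost:8080", d), 1, 1000), which needs about ten clones instead of a manual search.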
So it's clearly tapping out at a 32MB response
Hm, I know we've had problems with larger requests in the other direction, but I can never remember if those got solved (@ocdtrekkie, ring any bells?)
The only thing I know of with large transfers is the range request issue, which is still an open PR, and I'm turned around enough that I don't know which direction that was or which direction this is, so that may not be much help to anyone.
I'm trying to write an app that does a git clone from a private GitLab repo into a grain. I've used the Powerbox to pass the URL of the repo and the Personal Access Token to my grain. If I run git clone --depth=1, everything works fine, so I've set everything up correctly. However, if I run git clone without the --depth flag, I get a 500 Internal Server Error from GitLab. The 500 error is in response to git doing a POST <repo>/git-upload-pack.
Is there anything weird that sandstorm-http-bridge could be doing that could cause git clone to fail?
P.S. I've written an HTTP proxy that sits between git and sandstorm-http-bridge to add the Authorization: Bearer <token> header, but otherwise passes the requests verbatim to sandstorm-http-bridge.