Support integration with credential managers that also works on JupyterHub-deployed JupyterLab instances

fperez commented 3 years ago

Is your feature request related to a problem? Please describe.

The issue of easier integration with Git credential management has already been discussed, at least in #299 and #348. While using a local manager is viable for local installations, it's much trickier on remotely hosted Hubs, which are typically Linux instances in the cloud (or similar). In those cases, more advanced users can certainly set up ssh-agent in a terminal, which works (that's what I do), but they still lose the ability to push from this GUI.

Describe the solution you'd like

While doing full-on credential management is certainly outside the scope of this extension, it may be possible to hook into:

- Git Credential Manager Core for HTTPS clones.
- The regular SSH agent for SSH clones.

Describe alternatives you've considered

For now I'm setting up my SSH agent in a terminal and using the terminal for pushes/pulls, while taking advantage of the GUI for other operations; I can also explain that to my students. But in the long run I'd prefer to have the GUI integrate with the credential management options directly.

Thanks for the great work, this extension is an excellent tool for many users with less experience at the command line!

welcome[bot] commented 3 years ago

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively. You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

fcollonval commented 3 years ago

Thanks @fperez for the proposal (and for sharing the Git Credential Manager Core tool).

Did you try to deploy Git Credential Manager Core on the hub infrastructure? It should be possible to add it to the user server image and set it up for the users. This extension should then already prompt the user for credentials if needed and use that manager (although two-factor authentication is another story).

I am a bit reluctant to ship a credential solution with this extension, especially because requirements may differ between deployed infrastructures (for instance, you may want to be able to persist the credentials). That said, the mentioned Git Credential Manager Core has the nice advantage of being a single cross-platform solution (a first, as far as I know, apart from git's default cache and file storage helpers).

For SSH, the story is a bit different, and it would indeed be nice to handle ssh-agent credential requests properly.

fperez commented 3 years ago

No, I haven't yet tried deploying GCM inside our hub. My workflow is much more SSH- than HTTPS-centric, so for now (given limited time) I'll focus on that aspect, but I at least wanted to drop the link about GCM... We might give it a try just in case; if we do and learn anything useful, I'll report back for sure.

For SSH, it would be great if the extension could make the `eval "$(ssh-agent -s)"` call once an SSH-based repo is detected, and then make the `ssh-add <path to key file>` call from the UI. I'm not sure about the specifics of keeping the agent credentials live in the extension, as I've never looked at that part of SSH outside a terminal, so I wouldn't know where to start at this moment. But it sounds completely doable, and would be a wonderful improvement.
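The flow described above might look roughly like this as a shell sketch (bash assumed). The helper names `is_ssh_remote` and `ensure_ssh_agent` are hypothetical and not part of jupyterlab-git; the URL check only covers the common `ssh://` and scp-like `user@host:path` forms, and the key path `~/.ssh/id_ed25519` is an assumed example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a git remote URL uses the SSH transport.
# Only the common forms are handled; this is not a full parser of git URL syntax.
is_ssh_remote() {
  case "$1" in
    http://* | https://* | git://* | file://*) return 1 ;;  # other transports
    ssh://* | git+ssh://* | ssh+git://*) return 0 ;;        # explicit SSH URLs
    *@*:*) return 0 ;;  # scp-like syntax, e.g. git@github.com:org/repo.git
    *) return 1 ;;
  esac
}

# Hypothetical helper: start an agent for this shell only if none is reachable yet.
ensure_ssh_agent() {
  if [ -z "${SSH_AUTH_SOCK:-}" ]; then
    eval "$(ssh-agent -s)" >/dev/null
  fi
}

# Example flow (commented out), assuming the current directory is a clone
# and the key lives at ~/.ssh/id_ed25519:
# url="$(git remote get-url origin)"
# if is_ssh_remote "$url"; then
#   ensure_ssh_agent
#   ssh-add -t 3600 ~/.ssh/id_ed25519   # -t: agent forgets the key after 3600 seconds
# fi
```

One wrinkle this exposes: `eval "$(ssh-agent -s)"` only exports `SSH_AUTH_SOCK` into the shell that ran it and that shell's children. The Jupyter server, and therefore any git subprocess it spawns, would need the same variable in its own environment, which is likely why an agent started in a terminal doesn't help pushes from the GUI.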

I'd be happy to teach my students and collaborators how to do the one-time setup of SSH support if, after that, the rest of the experience worked well through the GUI, which does 90% of the git job for many of them.

dhirschfeld commented 3 years ago

In case it's useful, I set up the cache credential helper in my JupyterLab container:

```shell
git config --system credential.helper 'cache --timeout=36000 --socket /tmp/.git-credential-cache/socket'
```

...not sure if the extension would make use of that though - I still do all my pushing & pulling from a terminal.

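```yaml
# values.yaml override for the zero-to-jupyterhub-k8s Helm chart:
# create a private (0700) directory for the git credential-cache socket
# each time the user's container starts (-p: don't fail if it already exists).
singleuser:
  lifecycleHooks:
    postStart:
      exec:
        command:
          - "bash"
          - "-c"
          - >
            mkdir -p /tmp/.git-credential-cache
            && chmod 0700 /tmp/.git-credential-cache
```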

fperez commented 3 years ago

Thanks @dhirschfeld! To clarify, what's the setup in your case? Locally running or on a cloud/remote system? And if local, are you starting JupyterLab straight from a terminal, or is it inside a container? The use case I have in mind is cloud-hosted instances of JupyterLab started by a non-user-controllable JupyterHub, with Kubernetes deployment of containers (such that the Unix user is always jovyan).

For my use case, setting up the ssh agent at a terminal works OK, it's just that it would be nice to have that connected to this extension.

dhirschfeld commented 3 years ago

I'm running JupyterHub on Azure Kubernetes Service using the zero-to-jupyterhub-k8s helm chart so that's the same use-case I think.

The pasted yaml was the config I needed to get the credential cache to actually work after configuring git to use it.

With this setup the user only needs to enter their username/password once in a new container (or after it expires). I have jupyterlab-git pre-installed too but haven't actually tested if the credential cache works with it - I suppose it should.

fperez commented 3 years ago

Thanks @dhirschfeld, that's very interesting. Two quick questions:

1. What file does the above yaml go into? I'm not very familiar with k8s/Helm charts, sorry for the naive question...
2. Are you using the Git Credential Manager linked above, so that this setup applies to HTTPS clones? Or is it an SSH solution wrapping around ssh-agent?

dhirschfeld commented 3 years ago

> What file does the above yaml go into?

With helm you provide a values.yaml file to override the defaults in: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/bf65b102e6b7192840e31670aee58be6921b2aa0/jupyterhub/values.yaml#L349

IIUC this doesn't use GCM but instead uses the built-in cache, which simply caches the credentials in memory. It's not perfect, as users still have to enter their credentials when the container starts (or when the credentials expire), but it's a lot better than having to enter them every single time. It does work for HTTPS (as that's what I use myself).

fperez commented 3 years ago

Excellent, thanks! It seems that documenting a combo of this + a clean ssh-agent setup (for those like me who like to use the SSH transport) is probably a good start that will cover many use cases, and we can then keep exploring how to integrate both of those solutions into the extension this repo is about (or update your HTTPS one with the GCM as needed).

Thank you for taking the time to explain your setup!

My intent with all this is to get us to a place where working on our hubs feels as natural and flexible as working on a local desktop - if the pain points are too many, it's death by a thousand paper cuts and we'll abandon it. But I think we have all the pieces to make it possible, and without relying on any specific cloud/proprietary vendor.

cc @consideRatio @choldgraf @yuvipanda @rabernat - this is in line with our recent discussions...

fcollonval commented 3 years ago

Thanks a lot for sharing your configuration @dhirschfeld

> not sure if the extension would make use of that though

It will use the cache helper, since the extension runs all actions by calling git commands in subprocesses.
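A minimal way to check this, assuming a Linux system with git's built-in `credential-cache` helper (it relies on Unix sockets): the sketch below stores a fake credential through `git credential approve`, then reads it back from a *separate* `git credential fill` process, which is the same lookup any git subprocess spawned by the extension would perform. The host, username, and token are made-up placeholders, and `g` is just a local wrapper for this demo.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Private scratch directory; mktemp -d creates it with mode 0700.
tmpdir="$(mktemp -d)"
sock="$tmpdir/socket"

# Run git with an isolated config (no system/global files), no terminal prompts,
# and the cache helper pointed at our own socket.
g() {
  HOME="$tmpdir" XDG_CONFIG_HOME="$tmpdir" GIT_CONFIG_NOSYSTEM=1 GIT_TERMINAL_PROMPT=0 \
    git -c credential.helper="cache --timeout=300 --socket $sock" "$@"
}

# 1. Store a fake credential, as git does after a successful authenticated push or pull.
printf 'protocol=https\nhost=example.com\nusername=jovyan\npassword=not-a-real-token\n\n' |
  g credential approve

# 2. A separate git process asks about the same host: the cache daemon answers, so no prompt.
cached_password="$(
  printf 'protocol=https\nhost=example.com\nusername=jovyan\n\n' |
    g credential fill | sed -n 's/^password=//p'
)"
echo "password from cache: $cached_password"

# 3. Stop the cache daemon and remove the scratch directory.
g credential-cache --socket "$sock" exit
rm -rf "$tmpdir"
```

Because the cache lives in a daemon rather than in any one git process, it does not matter whether the later lookup comes from a terminal or from the extension's backend, as long as both point at the same socket.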

consideRatio commented 3 years ago

Ah, perhaps this is a pattern to make slightly easier in projects like https://github.com/jupyter/docker-stacks and https://github.com/pangeo-data/pangeo-docker-images? We could create a folder in /tmp ahead of time, and document that one can configure the following...

```shell
git config --system credential.helper 'cache --timeout=36000 --socket /tmp/.git-credential-cache/socket'
```

Would it be problematic to configure it with a 3600-second timeout by default in jupyter/docker-stacks, for example? It feels like a potential security concern, so it's probably best not to touch it.

I'm thinking that in jupyter/docker-stacks, for example, we could add documentation on doing this that relies on the /tmp/.git-credential-cache folder being set up with suitable permissions.
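A sketch of what such documentation could show, assuming a bash start-up script. The hypothetical `setup_credential_cache` function creates the socket directory with private permissions (git's cache daemon refuses a socket directory that other users can access, which is presumably why the postStart hook runs `chmod 0700`) and then points the cache helper at it. It writes `--global` config and is demonstrated against a throwaway `HOME`; an image build would more likely use `--system`, as in the command above, which requires root.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical start-up helper: create a private socket directory and point git's
# built-in cache helper at it. Usage: setup_credential_cache DIR [TIMEOUT_SECONDS]
setup_credential_cache() {
  local dir="$1" timeout="${2:-3600}"
  mkdir -p "$dir"
  chmod 0700 "$dir"  # only the owner may reach the socket
  git config --global credential.helper "cache --timeout=$timeout --socket $dir/socket"
}

# Demonstration against a throwaway HOME so the real ~/.gitconfig is untouched.
demo_home="$(mktemp -d)"
(
  export HOME="$demo_home" XDG_CONFIG_HOME="$demo_home/.config"
  setup_credential_cache "$demo_home/.git-credential-cache" 3600
)
configured_helper="$(
  export HOME="$demo_home" XDG_CONFIG_HOME="$demo_home/.config"
  git config --global --get credential.helper
)"
echo "credential.helper = $configured_helper"
```

The 3600-second default simply mirrors the value floated above; whether any default timeout is appropriate is the open security question.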

fperez commented 3 years ago

@consideRatio that sounds like a great idea, the more streamlined/standard these patterns become, the better (and I hope we can support both SSH- and HTTPS-based workflows, I think they are both legitimate, unless I'm missing something).

I'm curious as to whether there may be any potential issues with this being located in the global tmp with a fixed name, but I'll defer to you all on those details.

Glad to see this discussion - it seems we can make some improvements to the hosted experience!

fcollonval commented 3 years ago

> I'm curious as to whether there may be any potential issues with this being located in the global tmp with a fixed name, but I'll defer to you all on those details.

This is a good question. @dhirschfeld, what was the reason for using the /tmp folder instead of the default location (~/.git-credential-cache/socket if ~/.git-credential-cache/ exists, otherwise $XDG_CACHE_HOME/git/credential/socket)?

@consideRatio is git available by default in the image? It could indeed make sense to set the cache helper by default. And definitely documenting it would be great.

dhirschfeld commented 3 years ago

> what was the reason behind using the /tmp folder instead of using the default location

My setup may not be optimal - it was just what I could get working quickly. IIRC I used /tmp due to permission errors. My ~/ is a PVC backed by an Azure Files SMB share. I don't think I tried very hard to get it working before resorting to /tmp. Details are a bit hazy, but the point is, there may well be better ways of achieving the same thing, particularly since I'm fairly clueless about Linux.

So, whilst the concept (using the git credential cache) might be useful, take the rest as just my amateur hacks to get it working, not as a statement that this is the way you should do it.
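For reference, git's `credential-cache` documentation describes the default `--socket` path as `$XDG_CACHE_HOME/git/credential/socket`, or `~/.git-credential-cache/socket` if that directory exists, and notes that a home directory on a network-mounted filesystem may need the socket moved to a local one. That would fit the SMB-backed `~/` described above. The hypothetical function below just mirrors that documented rule (with `$XDG_CACHE_HOME` falling back to `~/.cache` when unset or empty); it is a sketch for checking where the socket would land, not something git itself exposes.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the documented default for `git credential-cache --socket`.
default_cache_socket() {
  if [ -d "$HOME/.git-credential-cache" ]; then
    printf '%s\n' "$HOME/.git-credential-cache/socket"
  else
    # An unset or empty XDG_CACHE_HOME falls back to ~/.cache.
    printf '%s\n' "${XDG_CACHE_HOME:-$HOME/.cache}/git/credential/socket"
  fi
}

echo "default socket: $(default_cache_socket)"
```

On GNU coreutils, `stat -f -c %T "$HOME"` prints the filesystem type backing the home directory, which is one quick way to spot a network mount before relying on the default socket location.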

fcollonval commented 3 years ago

Thanks a lot for the detailed answer @dhirschfeld

consideRatio commented 3 years ago

For this GitHub repository, what are the relevant action points to consider?

To close this issue, I suggest we:

- [ ] Document that https://git-scm.com/docs/git-credential-cache can be useful. I think it would be reasonable to add a "Git settings" topic to the README.md, next to the other types of settings, where this is described.

yuvipanda commented 3 years ago

(haven't read through the whole issue yet...)

One major concern with use in JupyterHubs is the 'admin access' feature. This allows anyone with admin access on the hub to basically 'access your server' at any time, even when you are not using it. After that, they can 'access server' to become you, dump the ssh-agent's memory and extract the private key (https://www.netspi.com/blog/technical/network-penetration-testing/stealing-unencrypted-ssh-agent-keys-from-memory/). They can also just as easily drop a line in your .bashrc that makes ssh-github an alias that automatically dumps the unencrypted private key and ships it off somewhere.

I think this is a major risk factor for authenticated pushes.

fperez commented 3 years ago

Thanks @yuvipanda! The admin feature worried me while the key was live in the agent, but I figured that could be manageable with some caveats. But I hadn't thought of the 2nd attack vector, which in some sense is worse, as the trap can be set and go off for the unsuspecting user the first time they use ssh-agent.

Are there similar attack options with the HTTPS alternatives, using either the built-in helper or GCM?

yuvipanda commented 3 years ago

> Are there similar attack options with the HTTPS alternatives, using either the built-in helper or GCM?

Yeah, the bashrc type attack can execute arbitrary code so nothing is immune.

fperez commented 3 years ago

Fundamentally, the problem with the admin feature is that it's a quasi free-for-all "tons of people can be root" with minimal safeguards. On hosted systems, we typically think of root as a tiny set of operators who follow very strict protocols. I know root at an HPC center can do anything with my account, but I also trust very strongly that they won't...

We really need to be able to offer much more fine-grained access controls, so that the legitimate user-help scenarios, the "let me become you" usage we take advantage of all the time, remain possible without exposing these generic gaping holes...

Tools like TeamViewer offer similar (and equally dangerous) blanket remote control, but you always know when you're using it. We should probably look for a similar model, where the users need to activate the remote access authorization, and ideally the UI changes while that is turned on (big red border kind of thing)...

Has this been discussed in more depth on the JupyterHub channels in general?

yuvipanda commented 3 years ago

> Fundamentally the problem with the admin feature is that it's a quasi free-for-all "tons of people can be root" with minimal safeguards.

Yep, this is absolutely the problem.

> Has this been discussed in more depth on the JupyterHub channels in general?

Not that I'm aware of. But given that we allow arbitrary code execution, I think super fine-grained controls once you grant 'access server' are going to be difficult.

fperez commented 3 years ago

I guess I meant fine-grained in the sense of when/how that access is granted, not what it offers; sorry for the lack of precision. Right now that access is persistent and global. I think the problems would be mitigated by a TeamViewer-like approach, where you know when you have opened your system to someone else, and you can close that door at any time.

There's the added complication that in TeamViewer what's shared is a common view of the desktop, so it's harder for someone you trusted to take surreptitious actions on your machine. In our case, even with a TeamViewer-like approach, you still wouldn't be able to see what they are doing, as they could open a new terminal and do stuff in it... But it would still make a big difference if you knew that nobody can access your server without you first creating this otherwise temporary/evanescent opening...

Just thinking out loud for now...

yuvipanda commented 2 years ago

https://blog.jupyter.org/securely-pushing-to-github-from-a-jupyterhub-3ee42dfdc54f does this now, and I think @fperez uses it fairly often?

fperez commented 2 years ago

Correct - I consider that tool now a non-negotiable, necessary element of a deployed hub, so working with github can be smooth and painless. Thanks a lot @yuvipanda for the development!

dhirschfeld commented 2 years ago

Just noting that the git-credential-manager is now available on conda-forge:

https://github.com/conda-forge/git-credential-manager-feedstock

...for linux at least (I have plans to add at least Windows as well).

I use the below config to cache the credentials in memory:

```shell
git config --global credential.helper manager-core
git config --global credential.credentialstore cache
git config --global credential.cacheoptions '--timeout=36000 --socket /tmp/.git-credential-cache/socket'
```
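To try the three GCM settings above without touching a real `~/.gitconfig`, a sketch like the following writes them into a throwaway `HOME` and reads them back. It also creates the socket directory with 0700 permissions, on the assumption (worth checking against GCM's credential-store documentation) that GCM's `cache` store delegates to git's own credential-cache daemon and therefore hits the same permission check. The `manager-core` helper name is copied verbatim from the comment; it may differ between GCM releases. `gcfg` is just a local wrapper for this demo.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway HOME so the real ~/.gitconfig is not modified.
demo_home="$(mktemp -d)"
sock_dir="$demo_home/.git-credential-cache"  # the comment above uses /tmp/.git-credential-cache
mkdir -p "$sock_dir" && chmod 0700 "$sock_dir"

gcfg() { HOME="$demo_home" XDG_CONFIG_HOME="$demo_home/.config" git config --global "$@"; }

# The three settings quoted above: GCM as helper, keeping credentials in the in-memory cache.
gcfg credential.helper manager-core
gcfg credential.credentialstore cache
gcfg credential.cacheoptions "--timeout=36000 --socket $sock_dir/socket"

# Config keys are case-insensitive, so this also matches credential.credentialStore etc.
gcfg --get-regexp '^credential\.'
```

Writing the config does not require GCM to be installed; git only looks up the `manager-core` helper the first time it actually needs credentials.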

fcollonval commented 2 years ago

Thanks a lot for sharing the information @dhirschfeld

joalmjoalm commented 10 months ago

> Just noting that the git-credential-manager is now available on conda-forge:
>
> conda-forge/git-credential-manager-feedstock
>
> ...for linux at least (I have plans to add at least Windows as well).
>
> I use the below config to cache the credentials in memory:
> ```shell
> git config --global credential.helper manager-core
> git config --global credential.credentialstore cache
> git config --global credential.cacheoptions '--timeout=36000 --socket /tmp/.git-credential-cache/socket'
> ```

Can GCM be used as a helper with other credential stores too, or only with cache, together with jupyter-git? I installed jupyter-git and GCM and configured GCM as the helper, but jupyter-git still continues to ask for credentials. Is there a detailed description of how to combine GCM and jupyter-git using the other credential stores available in GCM?

@dhirschfeld

jupyterlab / jupyterlab-git