jupyterlab / jupyterlab-git

A Git extension for JupyterLab
BSD 3-Clause "New" or "Revised" License

Support local files served through HybridContentsManager #684

Closed nielsenrechia closed 3 years ago

nielsenrechia commented 4 years ago

Hello Guys!

I tried the Troubleshooting section but I still have a problem.

Description

I'm working with JupyterLab on AWS EMR, so we use S3 bucket persistence. I know that the jupyterlab-git extension may not be usable with S3 persistence alone, without local files. However, we are working with hybridcontents, which is supposed to make this possible, because with it we can list both AWS S3 files and local files.

Reproduce

To reproduce:

  1. sudo yum install -y git
  2. pip install --upgrade jupyterlab-git hybridcontents
  3. jupyter lab build
  4. Change the /etc/jupyter/jupyter_notebook_config.py to something like
# Configuration file for jupyter-notebook.
from s3contents import S3ContentsManager
import os
from hybridcontents import HybridContentsManager
from IPython.html.services.contents.filemanager import FileContentsManager

c = get_config()
user = os.environ['JUPYTERHUB_USER']
c.NotebookApp.contents_manager_class = HybridContentsManager

c.HybridContentsManager.manager_classes = {
    # Associate the root directory with an S3ContentsManager.
    # This manager will receive all requests that don't fall under any of the
    # other managers.
    "": S3ContentsManager,
    # Associate /local_directory with a FileContentsManager.
    "local_directory": FileContentsManager,
}

c.HybridContentsManager.manager_kwargs = {
    # Args for root S3ContentsManager.
    "": {
        "bucket": "YOUR-BUCKET-NAME",
        "prefix": user,
    },
    # Args for the FileContentsManager mapped to /local_directory
    "local_directory": {
        "root_dir": "/home/" + user,
    },
} 

Don't forget to replace YOUR-BUCKET-NAME with the name of an S3 bucket on AWS, and to create a folder in it named after your user (the same value as JUPYTERHUB_USER).
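As a rough illustration of what this configuration implies for path handling, the sketch below maps a contents-API path to a filesystem path. It assumes that HybridContentsManager dispatches on the first path component and that the remainder is resolved against the local manager's root_dir. The helper name resolve_local_path and the user name alice are hypothetical, not part of hybridcontents or jupyterlab-git.

```python
import posixpath


def resolve_local_path(api_path, local_roots):
    """Map a contents-API path to a filesystem path.

    `local_roots` maps a top-level prefix (e.g. "local_directory") to the
    root_dir of the local manager mounted there. Returns None when the path
    is not owned by a local manager (for example, when it is served from S3).
    This is a hypothetical helper, not an existing API.
    """
    parts = api_path.strip("/").split("/")
    prefix, rest = parts[0], parts[1:]
    root = local_roots.get(prefix)
    if root is None:
        return None
    return posixpath.join(root, *rest)


# Mirrors the configuration above, with a made-up user name.
roots = {"local_directory": "/home/alice"}
print(resolve_local_path("local_directory/my-repo", roots))
print(resolve_local_path("some-file.csv", roots))
```

Under these assumptions, a repository that the file browser shows as local_directory/my-repo lives at /home/alice/my-repo on disk, so any tool that treats the browser path as a real directory would look in the wrong place.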

Expected behavior

Without hybridcontents the extension works very well, but with the configuration above it does not work as expected.

Context

Results of conda list jupyterlab-git

# packages in environment at /opt/conda:
#
# Name                    Version                   Build  Channel
jupyterlab-git            0.20.0                   pypi_0    pypi

pip show jupyterlab-git

Name: jupyterlab-git
Version: 0.20.0
Summary: A server extension for JupyterLab's git extension
Home-page: https://github.com/jupyterlab/jupyterlab-git
Author: Jupyter Development Team
Author-email: None
License: BSD
Location: /opt/conda/lib/python3.7/site-packages
Requires: notebook, pexpect, nbdime
Required-by:

Results of jupyter labextension list

JupyterLab v2.0.1
Known labextensions:
   app dir: /opt/conda/share/jupyter/lab
        @jupyterlab/git v0.20.0  enabled  OK
        @lckr/jupyterlab_variableinspector v0.5.1  enabled  OK
        nbdime-jupyterlab v2.0.0  enabled  OK

Below you can see two CSV files that are in an S3 bucket, as well as local_directory, which contains the local files. image

Maybe I need some extra configuration, so I need your help 🗡️ thanks

telamonian commented 4 years ago

Forget all of the jlab and jlab-git stuff; are you under any circumstances able to interact with the files on the s3 bucket using the plain vanilla git command line tool? Everything in jupyterlab-git is built on top of calling the git command line tool as a subprocess.

So if your files are not visible to git, then this is never going to work.
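A minimal sketch of what "calling the git command line tool as a subprocess" looks like in practice. The function name run_git and the exit code 127 used for a failed launch are choices made for this illustration and are not taken from jupyterlab-git.

```python
import subprocess


def run_git(args, cwd):
    """Run `git <args>` with `cwd` as the working directory.

    Returns (exit_code, stdout, stderr). If the process cannot be started,
    for example because `cwd` does not exist on the local filesystem,
    returns exit code 127 and the error message instead of raising.
    """
    try:
        proc = subprocess.run(
            ["git", *args], cwd=cwd, capture_output=True, text=True
        )
    except (FileNotFoundError, NotADirectoryError) as exc:
        return 127, "", str(exc)
    return proc.returncode, proc.stdout.strip(), proc.stderr.strip()


# A directory that only exists in a remote store (or not at all) cannot be
# used as a working directory, so the command never starts.
code, out, err = run_git(["rev-parse", "--show-toplevel"], "/no/such/local/dir")
print(code, err)
```

The point of the sketch is that git only ever sees real directories on the machine running the server, whatever the file browser displays.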

nielsenrechia commented 4 years ago

Hi @telamonian,

Thanks for your reply.

I understand your comment, but I'm trying to use git only with my local files. Moreover, I also need the notebooks on the S3 bucket.

As an example, below I show that everything works when using git from the terminal, but not from JupyterLab.

First of all, I have git on my environment.

image

Then, I'm able to init or reinit a git repository inside my local_directory from the terminal.

image

But if I try to init a repository with JupyterLab, no error is shown, yet the directory is still not a git repository.

image

Finally, from the terminal I'm able to clone, add files, commit, and so on in a repository in my local_directory. But with JupyterLab I can't clone, and it does not recognize an already cloned repository, so I can't use the JupyterLab web interface on it.

image image

Remember that without hybridcontents JupyterLab works very well. I suspect the problem is related to the filesystem, or to the from IPython.html.services.contents.filemanager import FileContentsManager import used to set up the hybrid environment.

Any advice?

Thanks,

erowan commented 4 years ago

Hello,

I am seeing the same issue. jupyterlab-git does not appear to work with the libraries that support a local filesystem and S3 (hybridcontents and s3contents).

I am seeing this error even though /dev/gkp-hub is a valid git repo:

[D 17:26:04.867 LabApp] 200 GET /api/contents/dev/gkp-hub?content=1&1603214764902 (100.127.23.228) 3.40ms
[D 17:26:04.942 LabApp] Accepting token-authenticated connection from 100.127.23.228
[E 17:26:04.954 LabApp] Uncaught exception POST /git/show_top_level?1603214764980 (100.127.23.228)
    HTTPServerRequest(protocol='http', host='87308-notebook-dev.apps.mt-d1.carl.gkp.net', method='POST', uri='/git/show_top_level?1603214764980', version='HTTP/1.1', remote_ip='100.127.23.228')
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.6/site-packages/tornado/web.py", line 1592, in _execute
        result = yield result
      File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1133, in run
        value = future.result()
      File "/opt/conda/lib/python3.6/site-packages/jupyterlab_git/handlers.py", line 98, in post
        result = await self.git.show_top_level(current_path)
      File "/opt/conda/lib/python3.6/site-packages/jupyterlab_git/git.py", line 579, in show_top_level
        cwd=os.path.join(self.root_dir, current_path),
      File "/opt/conda/lib/python3.6/site-packages/jupyterlab_git/git.py", line 123, in execute
        None, call_subprocess, cmdline, cwd, env
      File "/opt/conda/lib/python3.6/concurrent/futures/thread.py", line 56, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/opt/conda/lib/python3.6/site-packages/jupyterlab_git/git.py", line 92, in call_subprocess
        cmdline, stdout=subprocess.PIPE, stderr=subprocess.PIPE, cwd=cwd, env=env
      File "/opt/conda/lib/python3.6/subprocess.py", line 729, in __init__
        restore_signals, start_new_session)
      File "/opt/conda/lib/python3.6/subprocess.py", line 1364, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    FileNotFoundError: [Errno 2] No such file or directory: '/dev/gkp-hub': '/dev/gkp-hub'
[W 17:26:04.955 LabApp] Unhandled error
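The traceback shows the working directory for git being built with os.path.join(self.root_dir, current_path). The sketch below reproduces only that join. The root_dir value "/" is an assumption that happens to match the path in the error message, and "/home/jovyan" is a guess based on the ~/dev/gkp-hub shell prompt; neither value is confirmed in this thread.

```python
import posixpath


def git_cwd(root_dir, api_path):
    """Join the server root with a contents-API path, as the traceback does."""
    return posixpath.join(root_dir, api_path)


# Assumed: the contents manager reports "/" as its root_dir.
print(git_cwd("/", "dev/gkp-hub"))

# Assumed: the repository actually lives under the user's home directory.
print(git_cwd("/home/jovyan", "dev/gkp-hub"))
```

If the first case is what happens, git is started in a directory that does not exist, which would explain the FileNotFoundError even though the repository itself is valid.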

```
jovyan@container[singleuser-server-solo-9cf8879d5-txkg6]:~/dev/gkp-hub> jupyter labextension list
JupyterLab v2.1.4
Known labextensions:
   app dir: /opt/conda/share/jupyter/lab
        @jupyterlab/git v0.20.0  enabled  OK
        nbdime-jupyterlab v2.0.0  enabled  OK

s3contents           0.5.1
s3fs                 0.4.2
hybridcontents       0.3.0
```

xjantoth commented 4 years ago

Hey there,

I am trying to save data from JupyterLab to AWS S3 in Kubernetes.

Once JupyterLab comes up, I can see the data from the S3 bucket "testml1" perfectly.

However if I open a terminal in jupyter notebook and create a file this is NOT

    extraConfig: |-
      c.ServerProxy.host_whitelist = ["localhost", "127.0.0.1", "rapidsai-scheduler"]
      # ----------------
      from s3contents import S3ContentsManager

      # c = get_config()

      # Tell Jupyter to use S3ContentsManager for all storage.
      #c.NotebookApp.contents_manager_class = S3ContentsManager
      #c.S3ContentsManager.access_key_id = "..."
      #c.S3ContentsManager.secret_access_key = "..."
      #c.S3ContentsManager.bucket = "testml1" 
      # ------
      from hybridcontents import HybridContentsManager
      from notebook.services.contents.largefilemanager import LargeFileManager
      c = get_config()

      c.NotebookApp.contents_manager_class = HybridContentsManager

      c.HybridContentsManager.manager_classes = {
          # Associate the root directory with an S3ContentsManager.
          # This manager will receive all requests that don't fall under any of the
          # other managers.
          "": S3ContentsManager,
          # Associate /directory with a LargeFileManager.
      }

      c.HybridContentsManager.manager_kwargs = {
          # Args for root S3ContentsManager.
          "": {
              "access_key_id": "...",
              "secret_access_key": "...",
              "bucket": "testml1",
              # "root_dir": "/home/jovyan",
          },
      } 

It almost feels like the terminal, and the storage it uses, is totally separate from the storage used by HybridContentsManager and S3ContentsManager.
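One possible direction, sketched here as a configuration fragment and not verified against this deployment: mount a local directory next to the S3 root, following the prefix-to-manager pattern already used in the configuration above. The prefix name "local" is made up for this example, and "/home/jovyan" is taken from the commented-out root_dir line, so both are assumptions.

```python
# Fragment for a Jupyter config file; `get_config` is provided by Jupyter
# when the config file is loaded, so this does not run as a standalone script.
from hybridcontents import HybridContentsManager
from notebook.services.contents.largefilemanager import LargeFileManager
from s3contents import S3ContentsManager

c = get_config()

c.NotebookApp.contents_manager_class = HybridContentsManager

c.HybridContentsManager.manager_classes = {
    # Root of the file browser: served from S3.
    "": S3ContentsManager,
    # Hypothetical prefix: files on the server's local disk.
    "local": LargeFileManager,
}

c.HybridContentsManager.manager_kwargs = {
    "": {
        "access_key_id": "...",
        "secret_access_key": "...",
        "bucket": "testml1",
    },
    # Assumed path: the directory the terminal starts in.
    "local": {
        "root_dir": "/home/jovyan",
    },
}
```

With a mapping like this, files created from the terminal under /home/jovyan should appear in the file browser under local/, while everything else continues to come from the bucket.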