jupyterlab / jupyterlab-git

A Git extension for JupyterLab
BSD 3-Clause "New" or "Revised" License

Git repositories with spaces in their name are not recognized #1211

Open ardislu opened 1 year ago

ardislu commented 1 year ago

Description

The git side panel does not recognize a git repo exists when you open a freshly-cloned repo that has spaces in its name.

For example, a repo named "repo with spaces in its name" is cloned to a folder named "repo%20with%20spaces%20in%20its%20name". All the files inside the folder are cloned as expected. However, opening the folder does not cause the git side panel to recognize the git repo.

Reproduce

  1. Create a new git repo that contains spaces in its name (NOTE: GitHub does not allow this, but other hosts such as Azure DevOps do).
  2. In Jupyter, clone the repo.
  3. Open the folder and try to use the git side panel.

Expected behavior

The repo is cloned successfully and the git side panel works as expected.

Actual behavior

The repo is cloned successfully with the spaces URL encoded in the folder name (i.e. "%20" instead of spaces). However, the git side panel does not detect any git repo inside the folder (it shows the default "You are not currently in a Git repository" page).
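The folder name reported above can be reproduced with a short sketch. It assumes the clone derives the folder name from the last segment of the remote URL without percent-decoding it, which is consistent with the behavior described here but is not confirmed in this thread. The URL and the helper name guess_clone_dir are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def guess_clone_dir(url: str) -> str:
    """Approximate the folder name a clone produces (hypothetical helper).

    Assumption: the last URL path segment is used as-is, minus a trailing
    ".git", and is never percent-decoded.
    """
    # urlsplit leaves the path percent-encoded, so "%20" survives untouched.
    segment = posixpath.basename(urlsplit(url).path.rstrip("/"))
    return segment[:-4] if segment.endswith(".git") else segment


# Hypothetical remote URL for a repo whose name contains spaces.
url = "https://example.com/org/_git/repo%20with%20spaces%20in%20its%20name"
folder = guess_clone_dir(url)
print(folder)
```

Under that assumption, the folder on disk keeps the literal "%20" sequences, which is the name the side panel would later have to match.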

Workarounds

Workaround 1: Manually rename the folder to replace the "%20" encoding with spaces. After renaming the folder, the side panel works as expected.

Workaround 2: Open a new terminal and manually use the git CLI.

Context

welcome[bot] commented 1 year ago

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively. You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

fcollonval commented 1 year ago

Thanks @ardislu

Would you be willing to contribute a fix for this? I can provide pointers.

ardislu commented 1 year ago

@fcollonval Sure, I gave it a shot.

I believe the issue is that the path ends up unescaped when it gets passed to various functions in git.py.

But in this case, we actually want spaces to remain escaped because that's how the folder name is cloned. So my first pass to get it working was to update each execute call to re-escape the spaces in the directory path.

Either like this:

async def show_prefix(self, path):
    cmd = ["git", "rev-parse", "--show-prefix"]
    code, my_output, my_error = await execute(
        cmd,
-       cwd=path,
+       cwd=quote(path, safe=":/\\"),
    )

Or like this:

async def branch(self, path):
+   path = quote(path, safe=":/\\")
    heads = await self.branch_heads(path)
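To make the effect of that quote call concrete, here is a minimal sketch. It assumes quote is urllib.parse.quote from the standard library (the snippets above do not show the import), and the helper name reescape and the example paths are made up for illustration.

```python
from urllib.parse import quote


def reescape(path: str) -> str:
    """Percent-encode a path while leaving ':' '/' and '\\' untouched.

    Those three characters are kept safe so POSIX separators, Windows
    separators, and drive letters such as 'C:' survive unchanged.
    """
    return quote(path, safe=":/\\")


posix_path = reescape("/home/user/repo with spaces in its name")
windows_path = reescape("C:\\Users\\user\\repo with spaces in its name")
# '%' is not in the safe set, so an already-escaped path is escaped again.
double_escaped = reescape("/home/user/repo%20with%20spaces")

print(posix_path)
print(windows_path)
print(double_escaped)
```

The last case matters for this issue: applying the call to a path that still contains "%20" turns it into "%2520", so the call only helps when the path has already been decoded to literal spaces.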

This solution fixes this issue, but I don't think it's optimal:

Any suggestions? Thank you for your help on this.

ardislu commented 1 year ago

Thought about it some more and realized it would be much simpler to move the same quote logic directly into execute. Created #1214 to do that. Confirmed that folders named "repo with spaces in its name" and "repo%20with%20spaces%20in%20its%20name" now both work as expected.

Edit: if two folders named "repo with spaces in its name" and "repo%20with%20spaces%20in%20its%20name" both exist in the same parent folder, then my update will cause git commands from either folder to go only to "repo with spaces in its name". Note that this matches the current behavior (maybe a separate issue should be raised for it). So I think the root cause is not yet fixed.
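The collision described in this edit can be illustrated with a small sketch. This is not the code from #1214; it is one hypothetical way to centralize the re-escaping, assuming the literal path is tried first and the percent-encoded form is only a fallback. The helper name resolve_cwd is invented, and quote is assumed to be urllib.parse.quote.

```python
import os
import tempfile
from urllib.parse import quote


def resolve_cwd(path: str) -> str:
    """Pick the directory a git command should run in (hypothetical helper).

    Assumption: prefer the path as given; if it does not exist, fall back
    to the same path with spaces percent-encoded.
    """
    if os.path.isdir(path):
        return path
    escaped = quote(path, safe=":/\\")
    return escaped if os.path.isdir(escaped) else path


with tempfile.TemporaryDirectory() as root:
    decoded = os.path.join(root, "repo with spaces")
    encoded = os.path.join(root, "repo%20with%20spaces")

    # Only the "%20" folder exists: the fallback finds it.
    os.mkdir(encoded)
    only_encoded_result = resolve_cwd(decoded)

    # Both folders exist: the literal-space folder always wins, so the
    # "%20" folder can no longer be reached through a decoded path.
    os.mkdir(decoded)
    both_exist_result = resolve_cwd(decoded)

    print(only_encoded_result == encoded, both_exist_result == decoded)
```

Any fallback of this kind has the same limitation: once the incoming path has lost its "%20", there is no way to tell which of the two folders was meant.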

I believe the root cause is that this URL:

/repo%2520with%2520spaces%2520in%2520its%2520name

Should only be decoded once to:

/repo%20with%20spaces%20in%20its%20name

But there is some logic which decodes it again, so the actual string passed to the git.py functions is:

/repo with spaces in its name
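The three strings above can be reproduced with the standard library. This sketch uses urllib.parse.unquote purely to illustrate the double decoding; it makes no claim about which function in the extension or the server performs either step.

```python
from urllib.parse import unquote

# Path as it appears in the request URL: the folder's literal '%' is
# itself encoded as '%25'.
requested = "/repo%2520with%2520spaces%2520in%2520its%2520name"

# One decode yields the real folder name on disk.
decoded_once = unquote(requested)

# A second decode turns the folder's literal '%20' into spaces, producing
# a path that does not exist on disk.
decoded_twice = unquote(decoded_once)

print(decoded_once)
print(decoded_twice)
```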

However, I'm having a hard time finding where and how the URL is getting decoded twice. In handlers.py I see the path is decoded in url2localpath, through a call to url2path:

local_path = os.path.join(os.path.expanduser(cm.root_dir), url2path(path))

But I can't see where it's getting decoded again before getting passed to git.py. @fcollonval Any ideas?