jelmer / dulwich

Pure-Python Git implementation
https://www.dulwich.io/
Other
2.06k stars 395 forks source link

Cloning from locally served repo throws TypeError #973

Open lukjak opened 2 years ago

lukjak commented 2 years ago

Hello,

I create local repo, serve it with TCPGitServer and then want to clone using p.clone (I am testing my repo-handling code and want to do it with a "real" server).

The code sample below throws "TypeError: Can't mix strings and bytes in path components".

I know why ('some_repo" is passed as bytes to TCPGitServer, which tries to concatenate it with str root path coming from FileSystemBackend) - but I have not idea how to resolve this problem.

Any suggestions? TIA

import logging
import threading
from pathlib import Path

from dulwich import porcelain as p
from dulwich.server import FileSystemBackend, TCPGitServer

logging.basicConfig(level=logging.DEBUG)

BASE = Path("c:/temp/dulwich")
REMOTE_PATH = BASE / "remote"
LOCAL_PATH = BASE / "local"

Path(REMOTE_PATH).mkdir(parents=True, exist_ok=True)
Path(LOCAL_PATH).mkdir(parents=True, exist_ok=True)

try:
    p.init(REMOTE_PATH)

    backend = FileSystemBackend(REMOTE_PATH)
    dul_server = TCPGitServer(backend, b'localhost', 0)
    threading.Thread(target=dul_server.serve).start()
    server_address, server_port = dul_server.socket.getsockname()

    p.clone(f"git://{server_address}:{server_port}/some_repo", LOCAL_PATH)
finally:
    dul_server.shutdown()
jelmer commented 2 years ago

This is probably a windows-specific issue. Have you tried encoding the path you pass into FileSystemBackend()?

lukjak commented 2 years ago

Yes, both FileSystemBackend() and clone() parameters require str. Otherwise exceptions are thrown due to attempts to use bytes in context of string.

This is for REMOTE_PATH as bytes:

Traceback (most recent call last):
  File "C:/Users/user/AppData/Roaming/JetBrains/PyCharmCE2022.1/scratches/scratch.py", line 20, in <module>
    backend = FileSystemBackend(str(REMOTE_PATH).encode())
  File "C:\Programy\Python3\lib\site-packages\dulwich\server.py", line 203, in __init__
    self.root = (os.path.abspath(root) + os.sep).replace(os.sep * 2, os.sep)
TypeError: can't concat str to bytes

This is for clone source repo URI as bytes:

INFO:dulwich.server:Listening for TCP connections on b'localhost':0
Traceback (most recent call last):
  File "C:/Users/user/AppData/Roaming/JetBrains/PyCharmCE2022.1/scratches/scratch.py", line 25, in <module>
    p.clone(f"git://{server_address}:{server_port}/some_repo".encode(), LOCAL_PATH)
  File "C:\Programy\Python3\lib\site-packages\dulwich\porcelain.py", line 441, in clone
    (client, path) = get_transport_and_path(source, **kwargs)
  File "C:\Programy\Python3\lib\site-packages\dulwich\client.py", line 2347, in get_transport_and_path
    if sys.platform == "win32" and location[0].isalpha() and location[1:3] == ":\\":
AttributeError: 'int' object has no attribute 'isalpha'
lukjak commented 2 years ago

I could easily patch FileSystemBackend.open_repository() to work around this, but I don't have any idea if it would make sense or just mask the real problem which later pops up somewhere else.

Bonus, if you apply such fix and make the example more "realistic" by removing "/some_repo", you get:

 File "C:\Programy\Python3\lib\site-packages\dulwich\refs.py", line 1168, in _set_default_branch
    return head_ref
UnboundLocalError: local variable 'head_ref' referenced before assignment

If you remove just "some_repo", you get:

  File "C:\Programy\Python3\lib\site-packages\dulwich\server.py", line 213, in open_repository
    raise NotGitRepository("Path %r not inside root %r" % (path, self.root))
dulwich.errors.NotGitRepository: Path '/' not inside root 'c:\\temp\\dulwich\\remote\\'
lukjak commented 2 years ago

OK, so finally I came up with:

def open_repository(self, path):
    logger.debug("opening repository at %s", path)
    new_path = Path(path.decode() if isinstance(path, bytes) else path)
    # Remove any anchor if it exists
    new_path = Path(*new_path.parts[1:]) if new_path.anchor else new_path
    # Get full path
    abspath = (self.root / new_path).resolve()
    # Check if we traversed out of root or ended up in a wrong place
    if not abspath.is_relative_to(self.root) or not abspath.is_dir():
        raise NotGitRepository("Path %r not inside root %r" % (path, self.root))
    return Repo(str(abspath))

and using it with

with patch.object(sys.modules["dulwich"].server.FileSystemBackend, "open_repository", open_repository):

BTW, yet another problem with my original sample after the fix:

  File "C:\Programy\Python3\lib\site-packages\dulwich\refs.py", line 1168, in _set_default_branch
    return head_ref
UnboundLocalError: local variable 'head_ref' referenced before assignment

because empty repo (just after init()) does not contain any refs.

jelmer commented 2 years ago

The traceback you pasted

Hello,

I create local repo, serve it with TCPGitServer and then want to clone using p.clone (I am testing my repo-handling code and want to do it with a "real" server).

The code sample below throws "TypeError: Can't mix strings and bytes in path components".

I know why ('some_repo" is passed as bytes to TCPGitServer, which tries to concatenate it with str root path coming from FileSystemBackend) - but I have not idea how to resolve this problem.

Any suggestions? TIA

import logging
import threading
from pathlib import Path

from dulwich import porcelain as p
from dulwich.server import FileSystemBackend, TCPGitServer

logging.basicConfig(level=logging.DEBUG)

BASE = Path("c:/temp/dulwich")
REMOTE_PATH = BASE / "remote"
LOCAL_PATH = BASE / "local"

Path(REMOTE_PATH).mkdir(parents=True, exist_ok=True)
Path(LOCAL_PATH).mkdir(parents=True, exist_ok=True)

try:
    p.init(REMOTE_PATH)

    backend = FileSystemBackend(REMOTE_PATH)
    dul_server = TCPGitServer(backend, b'localhost', 0)
    threading.Thread(target=dul_server.serve).start()
    server_address, server_port = dul_server.socket.getsockname()

    p.clone(f"git://{server_address}:{server_port}/some_repo", LOCAL_PATH)
finally:
    dul_server.shutdown()

What's the backtrace you get when you use this code (i.e. don't encode the path as bytes)?

lukjak commented 2 years ago

The trace with dulwich 0.20.46 for the above code is:

Traceback (most recent call last):
  File "C:\Programy\Python3\lib\socketserver.py", line 316, in _handle_request_noblock
    self.process_request(request, client_address)
  File "C:\Programy\Python3\lib\socketserver.py", line 347, in process_request
    self.finish_request(request, client_address)
  File "C:\Programy\Python3\lib\socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "C:\Users\Łukasz\PycharmProjects\untitled1\.venv\lib\site-packages\dulwich\server.py", line 1165, in _make_handler
    return TCPGitRequestHandler(self.handlers, *args, **kwargs)
  File "C:\Users\Łukasz\PycharmProjects\untitled1\.venv\lib\site-packages\dulwich\server.py", line 1145, in __init__
    socketserver.StreamRequestHandler.__init__(self, *args, **kwargs)
  File "C:\Programy\Python3\lib\socketserver.py", line 747, in __init__
    self.handle()
  File "C:\Users\Łukasz\PycharmProjects\untitled1\.venv\lib\site-packages\dulwich\server.py", line 1155, in handle
    h = cls(self.server.backend, args, proto)
  File "C:\Users\Łukasz\PycharmProjects\untitled1\.venv\lib\site-packages\dulwich\server.py", line 296, in __init__
    self.repo = backend.open_repository(args[0])
  File "C:\Users\Łukasz\PycharmProjects\untitled1\.venv\lib\site-packages\dulwich\server.py", line 207, in open_repository
    abspath = os.path.abspath(os.path.join(self.root, path)) + os.sep
  File "C:\Programy\Python3\lib\ntpath.py", line 117, in join
    genericpath._check_arg_types('join', path, *paths)
  File "C:\Programy\Python3\lib\genericpath.py", line 155, in _check_arg_types
    raise TypeError("Can't mix strings and bytes in path components") from None
TypeError: Can't mix strings and bytes in path components