Closed Youw closed 7 months ago
Thank you for your contribution! I've merged it and will add tests for it.
Apparently, I've used some features from python 3.9+ (e.g. removeprefix
), which breaks runtime on Ubuntu20.04 with python 3.8.
Working on a fix.
I've added now a unit test and functional tests to master. You can run the unit test explicitly by calling
UNITTESTS="GitCacheResolveSubmoduleUrlTest" make unittests
They are currently failing. :-( For once, the repos can be specified with a trailing /
(the remote_url
as well as the submodule_url
). You already had a part in it that removes the trailing slash for the remote_url
, but the rstrip
method is not in-place, so that had no effect. ;-)
The other problem is that there is no path normalization, so intermediate parts like in ../not_here/../but_here
are not normalized. This will probably work with git but the internal database does not normalize the paths and that would result in two entries for on repo.
Thanks again for your effort!
The other problem is that there is no path normalization
That is done on purpose. I take the behavior of git itself as a reference. E.g. if a repo is cloned as https://github.com/user/foo.git
and a submodule is ../a/../bar.git
, running git submodule init
would produce something like 'bar' is registered with url https://github.com/user/a/../bar.git
, i.e. only relative parts at the beginning are being resolved.
https://github.com/user/a/../bar.git
Even though I'm not aware of such valid urls, I'm just trying to preserve same behavior as git would do it on its own.
For once, the repos can be specified with a trailing / (the remote_url as well as the submodule_url)
Right. Fixed trailing '/' for remote_url
. As for trailing '/' for submodule_url
(oroginal or "resolved") - I'd keep things as is as well. Again, to match the behavior of the original git tool.
Yeah git is quite lazy in that respect and relies on the server to normalize the path. That is even true with the trailing /
on the URLs.
git does not need to know whether github.com/seeraven/gitcache.git
and `github.com/seeraven/something/stupid/../../gitcache.git/" point to the same repository, but a cache would like to know that it is the same repository and not something different.
but a cache would like to know that it is the same repository and not something different.
Sounds like best if it would be a responsibility of get_mirror_path
function.
In which case we can 100% match the behavior of data fetching with the original git tool, and "optimize" the cache path.
You're reading my mind - I'm currently working on it. ;-)
So then we use the new resolve_submodule_url
function to generate the target path like git would do and when gitcache is called again for cloning the path is normalized.
Resolves: #24