NielsRogge closed this issue 1 year ago.
Want to open a PR to do this?
OK, update: it works when your local directory is empty (in which case it will be overwritten by the remote git repository), but if it already contains files and is not a git repository, you get the following error:
```python
# Notebook (IPython/Colab) reproduction: lines starting with `!` run shell commands
!mkdir checkpoint
# put a file in there
!touch checkpoint/test.txt

from huggingface_hub import Repository

repo_url = "https://huggingface.co/microsoft/beit-large-patch16-224"
repo = Repository(
    local_dir="checkpoint",  # must not exist yet, or must be empty
    clone_from=repo_url,
    git_user="Niels Rogge",
    git_email="niels.rogge1@gmail.com",
    use_auth_token=True,
)
```
which gives:
```
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-15-7b379edad3e5> in <module>()
7 git_user="Niels Rogge",
8 git_email="niels.rogge1@gmail.com",
----> 9 use_auth_token=True,
10 )
/usr/local/lib/python3.7/dist-packages/huggingface_hub/repository.py in clone_from(self, repo_url, use_auth_token)
316 if not in_repository:
317 raise EnvironmentError(
--> 318 "Tried to clone a repository in a non-empty folder that isn't a git repository. If you really "
319 "want to do this, do it manually:\m"
320 "git init && git remote add origin && git pull origin main\n"
OSError: Tried to clone a repository in a non-empty folder that isn't a git repository. If you really want to do this, do it manually:\mgit init && git remote add origin && git pull origin main
or clone repo to a new folder and move your existing files there afterwards.
```
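The condition that triggers this error can be sketched as a three-way check on the target directory. This is a hypothetical helper (`local_dir_state` is not a huggingface_hub function), assuming a folder counts as a git repository when it contains a `.git` entry; the library's own check may work differently.

```python
from pathlib import Path


def local_dir_state(local_dir: str) -> str:
    """Classify a clone target (hypothetical helper, not huggingface_hub code)."""
    path = Path(local_dir)
    if not path.exists() or not any(path.iterdir()):
        # Missing or empty directory: cloning into it is safe.
        return "clone"
    if (path / ".git").exists():
        # Already a git repository: it can be reused instead of cloned.
        return "existing_repo"
    # Non-empty and not a git repository: the situation that raises the OSError.
    return "non_empty_not_repo"
```

With the reproduction above, `local_dir_state("checkpoint")` would return `"non_empty_not_repo"` because of `test.txt`.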
My use case: I already had model files in a local directory that was not a git repository.
Yes, `overwrite_local_dir` would `rm` the local directory before doing anything else, so it would fix this.
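A minimal sketch of what the proposed flag could do before cloning, assuming `overwrite_local_dir` is a hypothetical parameter (it is only a suggestion in this thread) and `prepare_local_dir` a hypothetical helper name:

```python
import shutil
from pathlib import Path


def prepare_local_dir(local_dir: str, overwrite_local_dir: bool = False) -> Path:
    """Hypothetical pre-clone step implementing the proposed flag."""
    path = Path(local_dir)
    if overwrite_local_dir and path.exists():
        # Delete the directory and everything in it, including model files.
        shutil.rmtree(path)
    path.mkdir(parents=True, exist_ok=True)
    # With overwrite_local_dir=True the directory is now empty, so cloning
    # into it no longer hits the "non-empty folder" error.
    return path
```

The trade-off is visible in the code: with the flag set, the local model files are deleted, not uploaded.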
No, it shouldn't remove it; it should basically join the files that are already in the local directory with the files of the remote git repository, right?
Then what happens for a filename that's both in the local dir and in the remote repo?
We had something like this with @LysandreJik before, but the desired behavior was unspecified, so we just removed it.
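To make the question concrete, the clashing filenames can be computed up front. A sketch with a hypothetical `conflicting_files` helper; `remote_files` is assumed to be a list of repo-relative paths, which in practice could come from `HfApi().list_repo_files(repo_id)`:

```python
from pathlib import Path
from typing import Iterable, Set


def conflicting_files(local_dir: str, remote_files: Iterable[str]) -> Set[str]:
    """Return repo-relative paths present both locally and in the remote repo."""
    root = Path(local_dir)
    local_files = {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        # Only regular files count; anything inside a .git folder is ignored.
        if p.is_file() and ".git" not in p.relative_to(root).parts
    }
    return local_files & set(remote_files)
```

An empty result means the two sides can be joined without having to decide who wins.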
> Then what happens for a filename that's both in the local dir and in the remote repo?
Yeah, that's indeed a good question. Perhaps we can leave it as is in that case.
Edit: perhaps we can only allow it when the remote repository has just been created (i.e., is an empty git repository).
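The rule proposed in the edit can be written as a small guard. This sketch assumes a freshly created Hub repo may already contain a default `.gitattributes`, so such files are ignored when deciding whether the remote is "empty"; both helper names are hypothetical.

```python
from typing import Iterable

# Assumption: files a brand-new Hub repo may already contain.
DEFAULT_REPO_FILES = frozenset({".gitattributes"})


def remote_is_effectively_empty(remote_files: Iterable[str]) -> bool:
    """True if the remote repo holds nothing beyond the default files."""
    return set(remote_files) <= DEFAULT_REPO_FILES


def check_can_join(remote_files: Iterable[str]) -> None:
    """Hypothetical guard: only join a non-empty local folder with an empty remote."""
    if not remote_is_effectively_empty(remote_files):
        raise EnvironmentError(
            "Refusing to join a non-empty local folder with a non-empty remote repository."
        )
```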
I'm a bit confused by the use case:
> However, it would be useful if it just overwrites the local directory, in case it already exists, because I had to upload several checkpoints, and I had to remove that local directory each time I wanted to upload a new checkpoint.

Is your idea that the files should be joined, or that they should be overwritten? (Or joined, and overwritten only when the same filename exists on both sides?)
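The options in the question map onto a conflict policy. A sketch, assuming the remote repo has already been cloned into a separate folder and the user's files are then copied into it; `merge_local_into_clone` and its `on_conflict` values are hypothetical names:

```python
import shutil
from pathlib import Path
from typing import List


def merge_local_into_clone(local_dir: str, clone_dir: str, on_conflict: str = "error") -> List[str]:
    """Copy the user's local files into a fresh clone of the remote repo.

    on_conflict decides what happens when a file exists on both sides:
      "error"       -> refuse (nothing is copied)
      "keep_remote" -> join, but keep the remote version of clashing files
      "keep_local"  -> join, and let the local version overwrite the remote one
    Returns the relative paths that were copied.
    """
    if on_conflict not in ("error", "keep_remote", "keep_local"):
        raise ValueError(f"unknown on_conflict policy: {on_conflict!r}")
    src, dst = Path(local_dir), Path(clone_dir)
    rel_paths = sorted(p.relative_to(src) for p in src.rglob("*") if p.is_file())
    clashes = [rel for rel in rel_paths if (dst / rel).exists()]
    if clashes and on_conflict == "error":
        raise FileExistsError(f"files present on both sides: {[r.as_posix() for r in clashes]}")
    copied = []
    for rel in rel_paths:
        target = dst / rel
        if target.exists() and on_conflict == "keep_remote":
            continue  # leave the cloned (remote) file untouched
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src / rel, target)
        copied.append(rel.as_posix())
    return copied
```

Defaulting to `"error"` keeps the behavior explicit, which is the gap that led to the earlier removal of this feature.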
So my use case was the following:
1) I had some local files in a directory (`pytorch_model.bin`, `config.json`, `vocab.txt`). This directory was not a git repository, just a local directory.
2) I created a remote repository using `api.create_repo`. My goal was to upload my local files to that remote repository.
3) When I then used `Repository` with `local_dir` set to my local directory from step 1, I got the error above. Ideally, I could just push those files to the remote repository in a subsequent step with `repo.push_to_hub()`.

=> So perhaps we can only allow it when the remote repository is empty.
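Since `Repository` usage was later deprecated, the workflow in steps 1 to 3 can also be sketched without any local clone, assuming a recent huggingface_hub that provides `HfApi.create_repo(..., exist_ok=True)` and `HfApi.upload_folder`. `upload_checkpoint` is a hypothetical helper, and the optional `api` argument only exists so a stand-in object can be passed for testing:

```python
from typing import Any, Optional


def upload_checkpoint(folder_path: str, repo_id: str, api: Optional[Any] = None) -> None:
    """Push an existing local folder (not a git repo) to a Hub model repo."""
    if api is None:
        # Imported lazily so the helper can be exercised with a fake `api`.
        from huggingface_hub import HfApi

        api = HfApi()  # picks up a locally saved token, e.g. from `huggingface-cli login`
    # Does not fail if the repo already exists.
    api.create_repo(repo_id=repo_id, exist_ok=True)
    # Upload the folder's files in a single commit; no local git clone is involved.
    api.upload_folder(
        folder_path=folder_path,
        repo_id=repo_id,
        commit_message="Upload checkpoint",
    )
```

For example, `upload_checkpoint("checkpoint", "<user>/<repo-name>")` would upload the files from step 1. Because nothing is cloned, uploading several checkpoints in a row needs no directory cleanup in between.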
I think the suggestion in the error message should work for this use case:
```shell
git init && git remote add origin https://huggingface.co/microsoft/beit-large-patch16-224 && git pull origin main
```
This gives you a local directory containing your local files plus the new files from the repo.
Then you should be able to follow up by pushing. But maybe we could indeed make this simpler.
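The same three commands can be scripted, e.g. from a notebook. A sketch assuming `git` (and `git-lfs`, since Hub repos store large files with LFS) is installed and credentials are already configured; `attach_remote_and_pull` is a hypothetical helper:

```python
import subprocess
from pathlib import Path


def attach_remote_and_pull(local_dir: str, repo_url: str, branch: str = "main") -> None:
    """Turn an existing, non-empty folder into a clone of `repo_url`.

    Runs the three commands suggested in the error message. Local files that
    do not clash with the remote's tracked files stay in place as untracked
    files; they still need `git add`/`git commit`/`git push` to reach the Hub.
    """

    def git(*args: str) -> None:
        # check=True turns a failing git command into CalledProcessError.
        subprocess.run(["git", *args], cwd=local_dir, check=True)

    Path(local_dir).mkdir(parents=True, exist_ok=True)
    git("init")
    git("remote", "add", "origin", repo_url)
    git("pull", "origin", branch)
```

If a local file has the same path as a tracked file in the remote repo, `git pull` should refuse to overwrite it, so the call raises `CalledProcessError`; that is the filename-clash question from earlier in the thread.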
(Closing as "wontfix", since `Repository` usage is deprecated anyway.)
Lately, I've been using huggingface_hub to upload BEiT (a new model) checkpoints to the hub. I used the following code:
When instantiating a `Repository`, the `local_dir` must not exist already with the current implementation. However, it would be useful if it just overwrote the local directory in case it already exists: I had to upload several checkpoints, and I had to remove that local directory each time I wanted to upload a new checkpoint.

cc @julien-c, who suggested an `overwrite_local_dir` flag.