salesforce / WikiSQL

A large annotated semantic parsing corpus for developing natural language interfaces.
BSD 3-Clause "New" or "Revised" License
1.6k stars 320 forks

Invalid File Names while cloning the GitHub repo #104

Open ButteryPaws opened 6 months ago

ButteryPaws commented 6 months ago

When cloning this repository with the command git clone https://github.com/salesforce/WikiSQL, Git downloads the repository but cannot check out all of the files. The following error is reported:

Cloning into 'WikiSQL'...
remote: Enumerating objects: 386, done.
remote: Counting objects: 100% (192/192), done.
remote: Compressing objects: 100% (38/38), done.
remote: Total 386 (delta 185), reused 154 (delta 154), pack-reused 194
Receiving objects: 100% (386/386), 50.72 MiB | 19.88 MiB/s, done.
Resolving deltas: 100% (212/212), done.
error: unable to create file collection/paraphrase/Icon?: Invalid argument
error: unable to create file collection/paraphrase/paraphrase_files/Icon?: Invalid argument
error: unable to create file collection/verify/Icon?: Invalid argument
error: unable to create file collection/verify/verify_files/Icon?: Invalid argument
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

This is the output of running git status:

On branch master
Your branch is up to date with 'origin/master'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
    deleted:    .dockerignore
    deleted:    .gitattributes
    deleted:    .gitignore
    deleted:    .travis.yml
    deleted:    CODEOWNERS
    deleted:    LICENSE
    deleted:    README.md
    deleted:    annotate.py
    deleted:    collection/README.md
    deleted:    "collection/paraphrase/Icon\r"
    deleted:    collection/paraphrase/index.html
    deleted:    "collection/paraphrase/paraphrase_files/Icon\r"
    deleted:    collection/paraphrase/paraphrase_files/bootstrap.min.css
    deleted:    collection/paraphrase/paraphrase_files/bootstrap.min.js
    deleted:    collection/paraphrase/paraphrase_files/jquery-3.2.1.min.js
    deleted:    collection/paraphrase/paraphrase_files/toastr.min.css
    deleted:    collection/paraphrase/paraphrase_files/toastr.min.js
    deleted:    "collection/verify/Icon\r"
    deleted:    collection/verify/verify.html
    deleted:    "collection/verify/verify_files/Icon\r"
    deleted:    collection/verify/verify_files/bootstrap.min.css
    deleted:    collection/verify/verify_files/bootstrap.min.js
    deleted:    collection/verify/verify_files/jquery-3.2.1.min.js
    deleted:    collection/verify/verify_files/toastr.min.css
    deleted:    collection/verify/verify_files/toastr.min.js
    deleted:    data.tar.bz2
    deleted:    evaluate.py
    deleted:    lib/__init__.py
    deleted:    lib/common.py
    deleted:    lib/dbengine.py
    deleted:    lib/query.py
    deleted:    lib/table.py
    deleted:    requirements.txt
    deleted:    test/Dockerfile
    deleted:    test/check.py
    deleted:    test/example.pred.dev.jsonl.bz2
    deleted:    version.txt

Untracked files:
  (use "git add <file>..." to include in what will be committed)
    .dockerignore
    .gitattributes
    .gitignore
    .travis.yml
    CODEOWNERS
    LICENSE
    README.md
    annotate.py
    collection/
    data.tar.bz2
    evaluate.py
    lib/
    requirements.txt
    test/
    version.txt

And the output of running git restore --source=HEAD :/ as suggested:

error: unable to create file collection/paraphrase/Icon?: Invalid argument
error: unable to create file collection/paraphrase/paraphrase_files/Icon?: Invalid argument
error: unable to create file collection/verify/Icon?: Invalid argument
error: unable to create file collection/verify/verify_files/Icon?: Invalid argument

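The issue seems to be with the names of these files. Git prints the last character of each name as a question mark, but the quoted names in the git status output show that it is actually a carriage return (\r). Linux file systems such as ext4 normally allow this character (and question marks as well), so the Invalid argument error suggests that the working directory is on a file system that rejects it, for example a Windows-style file system such as NTFS or exFAT.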
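To make the file name diagnosis easier to reproduce, here is a minimal Python sketch that flags paths containing characters some file systems refuse. The helper names (problem_chars, find_problem_paths) and the WINDOWS_RESERVED set are illustrative assumptions, not part of WikiSQL or Git. The sample paths are copied from the git status output above. In a real checkout the path list could come from git ls-tree -r -z --name-only HEAD, which prints names unquoted.

```python
# Characters assumed to be rejected in file names by Windows-style
# file systems (NTFS, FAT, exFAT); this set is an illustrative assumption.
WINDOWS_RESERVED = set('<>:"\\|?*')


def problem_chars(path):
    """Return the sorted characters in `path` that may fail on some file systems."""
    found = set()
    for component in path.split("/"):
        for ch in component:
            # ASCII control characters (including "\r") and reserved punctuation.
            if ord(ch) < 32 or ch in WINDOWS_RESERVED:
                found.add(ch)
    return sorted(found)


def find_problem_paths(paths):
    """Map each offending path to the characters that make it non-portable."""
    report = {}
    for path in paths:
        chars = problem_chars(path)
        if chars:
            report[path] = chars
    return report


if __name__ == "__main__":
    # Sample paths taken from the `git status` output in this report.
    sample = [
        "collection/README.md",
        "collection/paraphrase/Icon\r",
        "collection/verify/verify_files/Icon\r",
    ]
    for path, chars in find_problem_paths(sample).items():
        print(repr(path), "->", [repr(c) for c in chars])
```

Printing with repr makes the trailing carriage return visible instead of letting the terminal swallow it, which is likely why Git substitutes a question mark in its error messages.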

I tried to work around the issue by downloading the missing files directly from GitHub into the directories where they belong, for example this file. However, the file cannot be downloaded as it is, and using wget fails as well.

Another method I tried was to download the master branch as a ZIP file and extract it using the unzip WikiSQL-master.zip command. This works fine: even the offending files (such as collection/paraphrase/Icon) were extracted successfully, with no illegal characters in their file names. The problem therefore seems to lie in how Git checks out the files in this repository.
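The ZIP workaround can also be scripted. The following is a rough Python sketch, using only the standard library, that extracts an archive while dropping control characters from member names. It is not a description of how unzip works internally. The function names (clean_name, extract_sanitized) are hypothetical, and the demo archive is built in memory rather than downloaded.

```python
import io
import os
import tempfile
import zipfile


def clean_name(name):
    """Drop ASCII control characters (such as "\r") from an archive member name."""
    return "".join(ch for ch in name if ord(ch) >= 32)


def extract_sanitized(archive, dest):
    """Extract `archive` into `dest`, writing each member under its cleaned name."""
    written = []
    dest_root = os.path.realpath(dest)
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            target = os.path.realpath(os.path.join(dest_root, clean_name(info.filename)))
            # Refuse members that would escape the destination directory.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe member name: {info.filename!r}")
            if target == dest_root:
                continue  # Name was empty after cleaning; nothing to write.
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as out:
                out.write(src.read())
            written.append(os.path.relpath(target, dest_root))
    return written


if __name__ == "__main__":
    # Build a tiny in-memory archive with a macOS-style "Icon\r" member.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("demo/Icon\r", b"")
        zf.writestr("demo/README.md", b"hello")
    with tempfile.TemporaryDirectory() as tmp:
        print(extract_sanitized(buf, tmp))
```

Note that two members whose names differ only by control characters would collide after cleaning. That does not appear to be a concern for the four Icon files listed above, since each lives in its own directory.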