src-d / datasets

source{d} datasets ("big code") for source code analysis and machine learning on source code

Facing an issue while downloading a GitHub dump using pga. #100

Open ayushi04 opened 5 years ago

ayushi04 commented 5 years ago

I am unable to execute the following command: pga list -u /src-d/ -f json | pga get -i

The error I get is: The filename, directory name, or volume label syntax is incorrect

error-logs.txt

mcarmonaa commented 5 years ago

You must give pga get -i a sequence of siva file names, one per line, as in:

c13587212de574c5dadeac9fa483367d53717abe.siva
05ea82f75e9ba7c2158e94dd4a714d359d0cab02.siva
cce947b98a050c6d356bc6ba95030254914027b1.siva
9e7f20d3c0a40a715f993db75adfbf56e268a30a.siva
db191acb9884be36ec15e35bde798c449d26bebe.siva
65c397a8673c0f4b98e3867e5fd6efdaa7d9ccd2.siva
338126bc0b8a7b447acf1830030a39c16bc39195.siva
6bc52531e707eb4b9b875c418a84f2e100ff6e73.siva
738658b11c94345a8003fa41b5d19f39b09bba7f.siva
5d7303c49ac984a9fec60523f2d5297682e16646.siva

With pga list -u /src-d/ -f json, you get a sequence of JSON objects instead:

...
{
  "url": "https://github.com/src-d/beanstool",
  "sivaFilenames": [
    "c13587212de574c5dadeac9fa483367d53717abe.siva"
  ],
  "license": "MIT:0.988",
  "langs": [
    "Go",
    "Makefile",
    "Markdown",
    "Text",
    "YAML"
  ],
  "langsByteCount": [
    17700,
    1611,
    1850,
    1077,
    38
  ],
  "langsLinesCount": [
    885,
    65,
    55,
    23,
    4
  ],
  "langsFilesCount": [
    14,
    1,
    1,
    1,
    1
  ],
  "emptyLinesCount": [
    195,
    11,
    14,
    0,
    0
  ],
  "codeLinesCount": [
    676,
    48,
    40,
    0,
    3
  ],
  "commentLinesCount": [
    0,
    5,
    0,
    0,
    0
  ],
  "fileCount": 19,
  "commitsCount": 38,
  "branchesCount": 18,
  "forkCount": 0
}
{
  "url": "https://github.com/src-d/kmcuda",
  "sivaFilenames": [
    "05ea82f75e9ba7c2158e94dd4a714d359d0cab02.siva"
  ],
  "license": "Apache-2.0:0.985",
  "langs": [
    "C++",
    "CMake",
    "Cuda",
    "Jupyter Notebook",
    "Markdown",
    "Python",
    "R",
    "Shell",
    "YAML"
  ],
  "langsByteCount": [
    102107,
    5535,
    73274,
    371538,
    45560,
    32844,
    4210,
    935,
    703
  ],
  "langsLinesCount": [
    2893,
    135,
    2032,
    469,
    997,
    837,
    106,
    21,
    24
  ],
  "langsFilesCount": [
    8,
    2,
    4,
    1,
    5,
    2,
    1,
    2,
    1
  ],
  "emptyLinesCount": [
    58,
    0,
    0,
    0,
    195,
    95,
    3,
    0,
    4
  ],
  "codeLinesCount": [
    1702,
    115,
    0,
    468,
    799,
    734,
    101,
    0,
    19
  ],
  "commentLinesCount": [
    18,
    18,
    0,
    0,
    0,
    6,
    1,
    0,
    0
  ],
  "fileCount": 33,
  "commitsCount": 205,
  "branchesCount": 31,
  "forkCount": 0
}
...

You can use a command line utility like jq to parse the JSON output and build the list of siva files with something like this:

pga list -u /src-d/ -f json | jq -r '.sivaFilenames[]'| pga get -i 
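
If jq is not available, the same extraction can be sketched in a few lines of Python. This is only an illustration, not part of pga: the function names and the script name siva_names.py are hypothetical, and it assumes the output of pga list -f json is a stream of concatenated JSON objects shaped like the ones shown above, each with a "sivaFilenames" array.

```python
import json
import sys


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def siva_filenames(text):
    """Collect the entries of every object's "sivaFilenames" list, in input order."""
    names = []
    for repo in iter_json_objects(text):
        # Assumption: objects without the key simply contribute nothing.
        names.extend(repo.get("sivaFilenames", []))
    return names


def main(stream=None, out=None):
    """Read JSON objects from stream (default stdin), print one siva file name per line."""
    stream = sys.stdin if stream is None else stream
    out = sys.stdout if out is None else out
    for name in siva_filenames(stream.read()):
        out.write(name + "\n")
```

To use it in place of jq, save the block as siva_names.py, add a final line calling main(), and run: pga list -u /src-d/ -f json | python siva_names.py | pga get -i
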

ayushi04 commented 5 years ago

Could you please share the equivalent of the above command for the Windows command line?

mcarmonaa commented 5 years ago

It should be the same on Windows; there is a version of jq for it: https://stedolan.github.io/jq/download/

ayushi04 commented 5 years ago

Thank you for the quick response. I am facing one more issue. If I run pga get -u /src-d/beanstool on the command line, I get the error below:

C:\Go\src\git_pga>pga get -u /src-d/beanstool
0 / 1 [------------------------------------------------------------------------------------------------------] 0.00%
could not get siva\latest\c1\c13587212de574c5dadeac9fa483367d53717abe.siva: could not copy to temporary file siva\latest\c1\c13587212de574c5dadeac9fa483367d53717abe.siva.tmp: 404 Not Found
Error: there where failed downloads

I am running the above command with administrative privileges. What might be the issue? Could you please suggest a fix?

jfontan commented 5 years ago

Native Windows is not yet supported by the pga tool. We are working to make it compatible and easier to install. In the meantime, you can use one of these two options: