src-d / datasets

source{d} datasets ("big code") for source code analysis and machine learning on source code

Add stars field to pga index #101

Closed: mcarmonaa closed this 5 years ago

mcarmonaa commented 5 years ago

Closes #43

The index now has a new field, STARS, containing the number of stars for each repository.

URL,SIVA_FILENAMES,FILE_COUNT,LANGS,LANGS_BYTE_COUNT,LANGS_LINES_COUNT,LANGS_FILES_COUNT,COMMITS_COUNT,BRANCHES_COUNT,FORK_COUNT,EMPTY_LINES_COUNT,CODE_LINES_COUNT,COMMENT_LINES_COUNT,LICENSE,STARS
git://github.com/FreeCodeCamp/FreeCodeCamp.git,7a80dfe1684664cefd2923bdbb329dcb9a48dc4f.siva,36547,"CSS,EJS,HTML,INI,JSON,JSON5,JavaScript,Less,Markdown,Pug,SVG,TOML,Text,XML,YAML","38234,2164,712,413,5710986,1155,4084308,33,70728961,14724,12177,583,72,387,1964","2322,59,21,29,155975,61,36225,2,1585217,264,89,12,4,13,115","48,5,2,2,418,2,384,1,35622,5,3,1,1,1,4",21123,27808,0,"242,0,3,0,3,0,3683,0,432408,0,0,1,0,0,14","2022,0,17,0,155879,0,30510,0,1125623,0,0,4,0,12,83","20,0,0,0,0,0,1655,0,0,0,0,7,0,0,14","BSD-3-Clause:0.991,BSD-3-Clause-Clear:0.853,BSD-3-Clause-No-Nuclear-License-2014:0.783,BSD-4-Clause:0.838,BSD-Source-Code:0.831",230986
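
For anyone consuming the index directly, here is a minimal sketch of reading the new column with Python's standard csv module. It assumes the multi-valued columns (LANGS, LANGS_BYTE_COUNT, ...) are comma-joined inside one quoted CSV field, as in the row above. The sample is trimmed to four columns and three languages for brevity, and `parse_index` is a hypothetical helper, not part of pga.

```python
import csv
import io

# Excerpt of the index, trimmed to four columns and three languages for
# brevity; the values are copied from the FreeCodeCamp row above.
SAMPLE = '''URL,LANGS,LANGS_BYTE_COUNT,STARS
git://github.com/FreeCodeCamp/FreeCodeCamp.git,"CSS,EJS,HTML","38234,2164,712",230986
'''


def parse_index(text):
    """Yield one dict per repository with an integer star count.

    Hypothetical helper: assumes list-like columns are comma-joined
    inside a single quoted CSV field.
    """
    for row in csv.DictReader(io.StringIO(text)):
        langs = row["LANGS"].split(",") if row["LANGS"] else []
        raw_counts = row["LANGS_BYTE_COUNT"]
        byte_counts = [int(n) for n in raw_counts.split(",")] if raw_counts else []
        yield {
            "url": row["URL"],
            "stars": int(row["STARS"]),
            # Pair each language with its byte count by position.
            "bytes_by_lang": dict(zip(langs, byte_counts)),
        }


repos = list(parse_index(SAMPLE))
print(repos[0]["url"], repos[0]["stars"])
```

The positional pairing relies on LANGS and the LANGS_* columns listing languages in the same order, which is how the example row reads.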

The pga command can read the new field:

$ pga list -f json | jq

{
  "url": "git://github.com/FreeCodeCamp/FreeCodeCamp.git",
  "sivaFilenames": [
    "7a80dfe1684664cefd2923bdbb329dcb9a48dc4f.siva"
  ],
  "license": "BSD-3-Clause:0.991,BSD-3-Clause-Clear:0.853,BSD-3-Clause-No-Nuclear-License-2014:0.783,BSD-4-Clause:0.838,BSD-Source-Code:0.831",
  "langs": [
    "CSS",
    "EJS",
    "HTML",
    "INI",
    "JSON",
    "JSON5",
    "JavaScript",
    "Less",
    "Markdown",
    "Pug",
    "SVG",
    "TOML",
    "Text",
    "XML",
    "YAML"
  ],
  "langsByteCount": [
    38234,
    2164,
    712,
    413,
    5710986,
    1155,
    4084308,
    33,
    70728961,
    14724,
    12177,
    583,
    72,
    387,
    1964
  ],
  "langsLinesCount": [
    2322,
    59,
    21,
    29,
    155975,
    61,
    36225,
    2,
    1585217,
    264,
    89,
    12,
    4,
    13,
    115
  ],
  "langsFilesCount": [
    48,
    5,
    2,
    2,
    418,
    2,
    384,
    1,
    35622,
    5,
    3,
    1,
    1,
    1,
    4
  ],
  "emptyLinesCount": [
    242,
    0,
    3,
    0,
    3,
    0,
    3683,
    0,
    432408,
    0,
    0,
    1,
    0,
    0,
    14
  ],
  "codeLinesCount": [
    2022,
    0,
    17,
    0,
    155879,
    0,
    30510,
    0,
    1125623,
    0,
    0,
    4,
    0,
    12,
    83
  ],
  "commentLinesCount": [
    20,
    0,
    0,
    0,
    0,
    0,
    1655,
    0,
    0,
    0,
    0,
    7,
    0,
    0,
    14
  ],
  "fileCount": 36547,
  "commitsCount": 21123,
  "branchesCount": 27808,
  "forkCount": 0,
  "stars": 230986
}
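
A possible use of the new field is to filter the JSON output by star count. The sketch below assumes the output of `pga list -f json` has been captured as text and consists of consecutive JSON objects shaped like the one above. It decodes them one by one, so it works whether or not the objects are pretty-printed. `iter_json_objects` and `urls_with_min_stars` are hypothetical helpers, and the second sample entry is made up for illustration.

```python
import json


def iter_json_objects(text):
    """Yield each JSON object from a string of concatenated JSON values."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def urls_with_min_stars(text, threshold):
    """Return the URLs of repositories with at least `threshold` stars."""
    return [
        repo["url"]
        for repo in iter_json_objects(text)
        if repo.get("stars", 0) >= threshold
    ]


# First entry uses values from the output above (other fields omitted);
# the second entry is hypothetical.
SAMPLE = '''
{"url": "git://github.com/FreeCodeCamp/FreeCodeCamp.git", "stars": 230986}
{
  "url": "git://example.com/hypothetical/repo.git",
  "stars": 3
}
'''

popular = urls_with_min_stars(SAMPLE, 1000)
print(popular)
```

Entries without a "stars" key are treated as having zero stars, which would be the case for an index generated before this change.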

marnovo commented 5 years ago

Awesome! :) worth letting people know about it on the community channel.