src-d / datasets

source{d} datasets ("big code") for source code analysis and machine learning on source code

Move PGA to its own repository #74

Closed (smola closed this issue 6 years ago)

smola commented 6 years ago

As far as I know, we have already made up our minds about splitting PGA into its own repo. That would also be a good opportunity to consolidate the documentation on gitbook and to redirect pga.sourced.tech to the gitbook documentation (though not the /csv/ and /siva/ routes).

cc @vmarkovtsev @campoy

vmarkovtsev commented 6 years ago

I see that the PGA is the king of our datasets, the root and the reason, so it deserves a separate web page. As for changing the GitHub structure, I don't feel good about it. Our datasets are in src-d/datasets; everything is simple and obvious atm.

smola commented 6 years ago

@vmarkovtsev I think I might have misunderstood some previous conversations about using a repository for each dataset and using src-d/datasets as an index.

Since it seems this will need some more discussion, I filed a separate issue for the gitbook: https://github.com/src-d/datasets/issues/75

eiso commented 6 years ago

Based on the discussions I've seen around GitHub, is it fair to assume we decided to keep all datasets in the datasets repo?

eiso commented 6 years ago

I would like to keep it here and standardize on a structure: https://github.com/src-d/datasets/issues/76