src-d / datasets

source{d} datasets ("big code") for source code analysis and machine learning on source code
Other
322 stars, 82 forks

Update the CSV file in pga.sourced.tech #66

Status: open. warenlg opened this issue 6 years ago

warenlg commented 6 years ago

Currently, the CSV file available for download from pga.sourced.tech is not the final version we had; we should probably update it.

Right now, the file on pga.sourced.tech has 181,482 rows. Two months ago, the Data Retrieval team gave me, through Vadim, a final CSV file with 182,014 rows (I have it locally), a difference of 532 rows.
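To see which repositories make up the gap between the two files, one option is to diff them by a key column rather than only comparing row counts. The sketch below is a minimal illustration, assuming both files have a header row and share some unique key column; the column name (e.g. a hypothetical "url") is an assumption, since the actual schema of the PGA index is not described here.

```python
import csv
from typing import Dict, List, TextIO


def count_rows(f: TextIO) -> int:
    """Count data rows in a CSV file, excluding the header row."""
    return sum(1 for _ in csv.DictReader(f))


def load_keys(f: TextIO, key: str) -> Dict[str, dict]:
    """Index the rows of a CSV (with a header row) by the value in column `key`."""
    return {row[key]: row for row in csv.DictReader(f)}


def compare(published: TextIO, final: TextIO, key: str) -> Dict[str, List[str]]:
    """Report the keys that appear in only one of the two CSV files.

    `key` is a hypothetical column name; pick whichever column uniquely
    identifies a repository in the real index.
    """
    pub = load_keys(published, key)
    fin = load_keys(final, key)
    return {
        # Rows the downloadable file is missing compared to the final one.
        "only_in_final": sorted(fin.keys() - pub.keys()),
        # Rows that were dropped from (or never made it into) the final file.
        "only_in_published": sorted(pub.keys() - fin.keys()),
    }
```

Open both files with `newline=""` (as the `csv` module recommends) and pass them to `compare()`. If the key column contains duplicates, the key-level difference can be smaller than the raw row-count difference reported by `count_rows()`, which is itself a useful signal.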

vmarkovtsev commented 6 years ago

@bzz This could be the reason 2.4 TB != 3.0 TB.

bzz commented 6 years ago

👍 I have just started another round of pga get with #69 and will be happy to do another full run as soon as this issue is resolved.