src-d / datasets

source{d} datasets ("big code") for source code analysis and machine learning on source code

Getting connection refused #171

Open stavikpetr opened 4 years ago

stavikpetr commented 4 years ago

Hello,

For about a month now, I have been getting "connection refused" when using the tool. For example, the command pga list siva results in the following error message:

WARN[0001] could not check mod time in /home/stavik/.pga/siva/latest.index.csv.gz: Head http://pga.sourced.tech/csv/siva/latest.index.csv.gz: dial tcp 147.135.39.104:80: connect: connection refused
Error: could not open index file: could not copy to temporary file /home/stavik/.pga/siva/latest.index.csv.gz.tmp: Get http://pga.sourced.tech/csv/siva/latest.index.csv.gz: dial tcp 147.135.39.104:80: connect: connection refused

anteos59 commented 4 years ago

I'm getting almost the same error message:

Error: could not open index file: could not copy to temporary file C:\Users\Nutzer.pga\siva\latest.index.csv.gz.tmp: Get http://pga.sourced.tech/csv/siva/latest.index.csv.gz: dial tcp 147.135.39.104:80: connectex: No connection could be made because the target machine actively refused it.

Is there anything I could do? I would like to use PGA for my master thesis, so any help would be greatly appreciated.

stavikpetr commented 4 years ago

anteos, I am in the same spot as you: I would also like to use PGA for my master thesis. Fortunately for me, I saved one dump from the tool just before it stopped working. It is the 173 MB dump of all repositories with more than 50 stars, which is basically the output of the command pga list siva. If you are interested in getting this dump, contact me via the email address on my profile.

vmarkovtsev commented 4 years ago

Sorry for responding late (winter holidays). source{d} no longer exists, so the public datasets that had to be served from a dedicated server are all down. The tooling that depends on the server is essentially no longer functional. The files that are on Google Drive are still there and were additionally backed up, so they should continue to work. We were able to copy all the siva and UAST parquet files to Google Cloud Storage; however, we accidentally lost the PGA index file (CSV). If somebody has a copy, please send it to me.

The new company, Athenian, where some of the core devs (including me) migrated, is sponsoring the GCS storage; however, public serving from there currently costs too much. I'll try to find an alternative way with @guillemdb and @sergio-hcsoft, who have recently bought the former data processing/ML cluster and have 80TB of free storage.

Now, if you need a fraction of PGA, please send me the list of desired siva files; I will fetch them from GCS, package them together, and upload them to Google Drive. Composing that list without an index is impossible, though.
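For anyone who still has a saved copy of the index, composing such a list could look roughly like the sketch below. It is only an illustration under stated assumptions: the column names SIVA_FILENAMES and LANGS, and the idea that both columns hold comma-separated values, are hypothetical placeholders that should be checked against the header of the actual CSV.

```python
import csv
import gzip
import io


def open_index(path):
    """Open a gzip-compressed index CSV as text (the path is supplied by the caller)."""
    return gzip.open(path, "rt", newline="", encoding="utf-8")


def siva_files_for_language(index_file, language,
                            langs_col="LANGS", siva_col="SIVA_FILENAMES"):
    """Collect the siva filenames of every repository that lists `language`.

    `langs_col` and `siva_col` are assumed column names, and both columns are
    assumed to contain comma-separated values; adjust them to the real index.
    """
    wanted = set()
    for row in csv.DictReader(index_file):
        langs = [lang.strip() for lang in row[langs_col].split(",")]
        if language not in langs:
            continue
        for name in row[siva_col].split(","):
            name = name.strip()
            if name:
                wanted.add(name)
    return sorted(wanted)


if __name__ == "__main__":
    # Tiny in-memory stand-in for an index file; the rows are made up.
    sample = (
        "URL,SIVA_FILENAMES,LANGS\r\n"
        '"https://example.org/a","aaa.siva,bbb.siva","C,Python"\r\n'
        '"https://example.org/b","ccc.siva","Go"\r\n'
    )
    for name in siva_files_for_language(io.StringIO(sample), "C"):
        print(name)
```

With a real dump, the file object returned by open_index would be passed in place of the in-memory sample, and the resulting names written one per line to a text file.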

Laura-lc commented 4 years ago

Hello vmarkovtsev, I would like to get the metadata and source code (code diffs) of some of the projects in PGA for my master thesis. How can I get that data? I emailed you before but got no response, so how can I best contact you? Thank you so much.

giper45 commented 4 years ago

Hi, any news about this?

darius-sas commented 4 years ago

I am also interested in this. It's a pity that we cannot access this huge dataset :(

dgrahn commented 3 years ago

@vmarkovtsev I would like a copy of any C/C++ repositories you have archived. I can provide a dropbox if you need it.

peterzsj6 commented 3 years ago

Oops, so... no more data?