src-d / datasets

source{d} datasets ("big code") for source code analysis and machine learning on source code

Errors/hanging when running pga get --lang java --output . #142

Closed pirocks closed 5 years ago

pirocks commented 5 years ago

After running pga get --lang java --output . I get the following output:

 425 / 45164 [=>-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------]   0.94% 10h14m59scould not get siva/latest/17/171918e41aab85c4d6609dfaabfc8abbef44dbd9.siva: rename siva/latest/17/171918e41aab85c4d6609dfaabfc8abbef44dbd9.siva.tmp to siva/latest/17/171918e41aab85c4d6609dfaabfc8abbef44dbd9.siva failed: rename siva/latest/17/171918e41aab85c4d6609dfaabfc8abbef44dbd9.siva.tmp siva/latest/17/171918e41aab85c4d6609dfaabfc8abbef44dbd9.siva: no such file or directory
 425 / 45164 [=>-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------]   0.94% 10h14m23scould not get siva/latest/6c/6c048e1e161e47c9131f0197f5f74ea312697d84.siva: rename siva/latest/6c/6c048e1e161e47c9131f0197f5f74ea312697d84.siva.tmp to siva/latest/6c/6c048e1e161e47c9131f0197f5f74ea312697d84.siva failed: rename siva/latest/6c/6c048e1e161e47c9131f0197f5f74ea312697d84.siva.tmp siva/latest/6c/6c048e1e161e47c9131f0197f5f74ea312697d84.siva: no such file or directory
 425 / 45164 [=>------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------]   0.94% 9h12m39scould not get siva/latest/17/171918e41aab85c4d6609dfaabfc8abbef44dbd9.siva: rename siva/latest/17/171918e41aab85c4d6609dfaabfc8abbef44dbd9.siva.tmp to siva/latest/17/171918e41aab85c4d6609dfaabfc8abbef44dbd9.siva failed: rename siva/latest/17/171918e41aab85c4d6609dfaabfc8abbef44dbd9.siva.tmp siva/latest/17/171918e41aab85c4d6609dfaabfc8abbef44dbd9.siva: no such file or directory
 425 / 45164 [=>--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------]   0.94% 9h6m7scould not get siva/latest/17/171918e41aab85c4d6609dfaabfc8abbef44dbd9.siva: rename siva/latest/17/171918e41aab85c4d6609dfaabfc8abbef44dbd9.siva.tmp to siva/latest/17/171918e41aab85c4d6609dfaabfc8abbef44dbd9.siva failed: rename siva/latest/17/171918e41aab85c4d6609dfaabfc8abbef44dbd9.siva.tmp siva/latest/17/171918e41aab85c4d6609dfaabfc8abbef44dbd9.siva: no such file or directory
 425 / 45164 [=>-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------]   0.94% 8h57m9scould not get siva/latest/17/171918e41aab85c4d6609dfaabfc8abbef44dbd9.siva: rename siva/latest/17/171918e41aab85c4d6609dfaabfc8abbef44dbd9.siva.tmp to siva/latest/17/171918e41aab85c4d6609dfaabfc8abbef44dbd9.siva failed: rename siva/latest/17/171918e41aab85c4d6609dfaabfc8abbef44dbd9.siva.tmp siva/latest/17/171918e41aab85c4d6609dfaabfc8abbef44dbd9.siva: no such file or directory

The download seems to hang on file 425 with the above errors.

vmarkovtsev commented 5 years ago

I don't think it really hangs. Check your network usage; I am pretty sure it is still downloading. Those errors mean that some siva files listed in the index are missing from the remote storage, right @jfontan?

pirocks commented 5 years ago

I continued seeing heavy network activity for an hour or so, and then it stopped. I interrupted it with Ctrl+C and tried again (in the same directory). It got a fair bit further before hitting similar errors, followed by an eventual drop in network activity.

vmarkovtsev commented 5 years ago

OK, since this apparently works fine for us, we need some information from your side:

Perhaps there are connection errors which we don't handle gracefully.

The immediate workaround is to run ./pga list -l java -f csv, extract the list of siva file names, and download them by hand. Each URL is https://pga.sourced.tech/siva/latest/ followed by the first two letters of the file name, a slash, and the file name itself. For example, to fetch 20b626de662752618b3c28d192486b15f84a2087.siva, the URL would be https://pga.sourced.tech/siva/latest/20/20b626de662752618b3c28d192486b15f84a2087.siva. It would be interesting to see which errors the semi-manual download produces.
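The URL rule in the workaround above can be sketched in Python. This is only an illustration, not part of pga: siva_url and download_siva are hypothetical helper names, the local siva/latest/<prefix>/ layout is assumed from the paths in the error log, and the list of file names is expected to have been extracted from the CSV output beforehand (the CSV column layout is not shown in this thread, so it is not parsed here).

```python
import os
import shutil
import urllib.request

# Base URL quoted in the workaround comment above.
BASE_URL = "https://pga.sourced.tech/siva/latest/"


def siva_url(name: str) -> str:
    """Build the URL: base + first two characters of the name + '/' + name."""
    return f"{BASE_URL}{name[:2]}/{name}"


def download_siva(name: str, dest_dir: str = "siva/latest") -> str:
    """Hypothetical helper: fetch one siva file and return its local path.

    The file is written to a .tmp path first and renamed only after the
    download completes, so an interrupted transfer never leaves a partial
    file under the final name.
    """
    target_dir = os.path.join(dest_dir, name[:2])
    os.makedirs(target_dir, exist_ok=True)  # make sure the prefix directory exists
    target = os.path.join(target_dir, name)
    tmp = target + ".tmp"
    with urllib.request.urlopen(siva_url(name)) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    os.replace(tmp, target)  # rename within the same directory
    return target
```

Calling download_siva for each name in a loop and printing any urllib.error.URLError or OSError it raises should surface the per-file errors asked for above.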

pirocks commented 5 years ago

I've since rerun a third time in the same directory, and at the moment I have no errors, with good network utilization.

jfontan commented 5 years ago

@mcarmonaa can you please take a look?

vmarkovtsev commented 5 years ago

@pirocks Can you please test the fixed code (see here for how to build from source)? We have closed this issue, but if the problem persists, we will re-open it.