src-d / datasets

source{d} datasets ("big code") for source code analysis and machine learning on source code
Other
322 stars, 82 forks

Unexpected EOF - pga get -l go #102

Closed: elithrar closed this issue 4 years ago

elithrar commented 5 years ago

pga version: eb71a82
Go version: 1.11.5 linux/amd64

Problem: I'm seeing corrupted files when the pga tool attempts to rename temporary files. The tool also appears to hang: after ~10 hours, it's only 1.87% complete (467/24928; 75.5GiB) on an 8-core VM in GCP with a 16Gbps NIC.

$ pga get -l go
 467 / 24928 [==>-----------------------------------------------------------------------------------------------------------------]   1.87% 2h34m48s
could not get siva/latest/98/9822bb0f781b94b1c7610b2df2ae4817e257c9bb.siva: could not copy to temporary file siva/latest/98/9822bb0f781b94b1c7610b2df2ae4817e257c9bb.siva.tmp: could not copy http://pga.sourced.tech//siva/latest/98/9822bb0f781b94b1c7610b2df2ae4817e257c9bb.siva to siva/latest/98/9822bb0f781b94b1c7610b2df2ae4817e257c9bb.siva.tmp: unexpected EOF
 467 / 24928 [==>-----------------------------------------------------------------------------------------------------------------]   1.87% 2h28m29s
could not get siva/latest/57/5708afc613c3a27489ed4be2560d49ef1752eaeb.siva: could not copy to temporary file siva/latest/57/5708afc613c3a27489ed4be2560d49ef1752eaeb.siva.tmp: could not copy http://pga.sourced.tech//siva/latest/57/5708afc613c3a27489ed4be2560d49ef1752eaeb.siva to siva/latest/57/5708afc613c3a27489ed4be2560d49ef1752eaeb.siva.tmp: unexpected EOF
 467 / 24928 [==>------------------------------------------------------------------------------------------------------------------]   1.87% 2h4m12s
could not get siva/latest/5b/5b8009800d6e8459453463db5a12f76ed146c7d7.siva: rename siva/latest/5b/5b8009800d6e8459453463db5a12f76ed146c7d7.siva.tmp to siva/latest/5b/5b8009800d6e8459453463db5a12f76ed146c7d7.siva failed: rename siva/latest/5b/5b8009800d6e8459453463db5a12f76ed146c7d7.siva.tmp siva/latest/5b/5b8009800d6e8459453463db5a12f76ed146c7d7.siva: no such file or directory
 467 / 24928 [==>--------------------------------------------------------------------------------------------------------------------------]   1.87%
could not get siva/latest/b4/b4d0dc444d6c1088fe6c15743f7764f39c57f501.siva: could not copy to temporary file siva/latest/b4/b4d0dc444d6c1088fe6c15743f7764f39c57f501.siva.tmp: could not copy http://pga.sourced.tech//siva/latest/b4/b4d0dc444d6c1088fe6c15743f7764f39c57f501.siva to siva/latest/b4/b4d0dc444d6c1088fe6c15743f7764f39c57f501.siva.tmp: unexpected EOF
 467 / 24928 [==>--------------------------------------------------------------------------------------------------------------------------]   1.87%
could not get siva/latest/50/50a4667b3b8dda5a16a78f0dcc6f7b1eab8924f8.siva: could not copy to temporary file siva/latest/50/50a4667b3b8dda5a16a78f0dcc6f7b1eab8924f8.siva.tmp: could not copy http://pga.sourced.tech//siva/latest/50/50a4667b3b8dda5a16a78f0dcc6f7b1eab8924f8.siva to siva/latest/50/50a4667b3b8dda5a16a78f0dcc6f7b1eab8924f8.siva.tmp: unexpected EOF
 467 / 24928 [==>--------------------------------------------------------------------------------------------------------------------------]   1.87%
could not get siva/latest/78/78ffcbdd4f5de3ed41516c7b74ddc1d3c657df39.siva: rename siva/latest/78/78ffcbdd4f5de3ed41516c7b74ddc1d3c657df39.siva.tmp to siva/latest/78/78ffcbdd4f5de3ed41516c7b74ddc1d3c657df39.siva failed: rename siva/latest/78/78ffcbdd4f5de3ed41516c7b74ddc1d3c657df39.siva.tmp siva/latest/78/78ffcbdd4f5de3ed41516c7b74ddc1d3c657df39.siva: no such file or directory
 467 / 24928 [==>--------------------------------------------------------------------------------------------------------------------------]   1.87%
could not get siva/latest/d6/d6026c727ad9767fe6ccd4b14597453cd9bbac4c.siva: could not copy to temporary file siva/latest/d6/d6026c727ad9767fe6ccd4b14597453cd9bbac4c.siva.tmp: close siva/latest/d6/d6026c727ad9767fe6ccd4b14597453cd9bbac4c.siva.tmp: input/output error
 467 / 24928 [==>--------------------------------------------------------------------------------------------------------------------------]   1.87%
could not get siva/latest/c1/c14d891d44f0afff64e56ed7c9702df1d807b1ee.siva: could not copy to temporary file siva/latest/c1/c14d891d44f0afff64e56ed7c9702df1d807b1ee.siva.tmp: could not copy http://pga.sourced.tech//siva/latest/c1/c14d891d44f0afff64e56ed7c9702df1d807b1ee.siva to siva/latest/c1/c14d891d44f0afff64e56ed7c9702df1d807b1ee.siva.tmp: unexpected EOF
 467 / 24928 [==>--------------------------------------------------------------------------------------------------------------------------]   1.87%
could not get siva/latest/05/0527d29da443886d92e9a418180c5b25a5f8d270.siva: could not copy to temporary file siva/latest/05/0527d29da443886d92e9a418180c5b25a5f8d270.siva.tmp: could not copy http://pga.sourced.tech//siva/latest/05/0527d29da443886d92e9a418180c5b25a5f8d270.siva to siva/latest/05/0527d29da443886d92e9a418180c5b25a5f8d270.siva.tmp: unexpected EOF
 467 / 24928 [==>--------------------------------------------------------------------------------------------------------------------------]   1.87%

I'm wondering whether this is a subtle race condition caused by naming temporary files as name + ".tmp" while multiple workers run concurrently?

mcarmonaa commented 5 years ago

Changes to the index that add the number of stars for each repository were introduced recently, but the index itself hasn't been regenerated yet, so the pga command is expected to fail. PR #104 fixes that and adds compatibility between index versions.

Anyway, this problem doesn't seem to be related to that, and I cannot reproduce it. Maybe it's an HTTP issue involving the GCP firewall configuration?