Closed joepie91 closed 10 years ago
I am also experiencing this problem. Here is some more info:
Command:
$ ~/.local/bin/ia upload --verbose delcampe_20140126 www.delcampe.net.1.warc.gz www.delcampe.net.2.warc.gz www.delcampe.net.3.warc.gz --metadata="title:Delcampe.net Sampler Grab (2014-01-26)" --metadata="creator:www.delcampe.net" --metadata="description:This is a sampler WARC grab of http://www.delcampe.net using Wpull. Delcampe is an auction site." --metadata="subject:archiveteam"
$ pip freeze
GnuPGInterface==0.3.2
Landscape-Client==12.05
PAM==0.4.2
PyYAML==3.10
Twisted-Core==11.1.0
apt-xapian-index==0.44
argparse==1.2.1
args==0.1.0
chardet==2.0.1
clint==0.3.3
command-not-found==0.2.44
docopt==0.6.1
httplib2==0.7.2
internetarchive==0.5.0
jsonpatch==0.4
keyring==0.9.2
language-selector==0.1
launchpadlib==1.9.12
lazr.restfulclient==0.12.0
lazr.uri==1.0.3
oauth==1.0.1
py==1.4.20
pyOpenSSL==0.12
pycrypto==2.4.1
pyserial==2.5
pytest==2.3.4
python-apt==0.8.3ubuntu7.1
python-debian==0.1.21ubuntu1
requests==2.2.0
simplejson==2.3.2
ufw==0.31.1-1
wadllib==1.3.0
wsgiref==0.1.2
zope.interface==3.6.1
Traceback:
uploading www.delcampe.net.1.warc.gz: [## ] 32282/507826 - 00
[... repeated progress-bar updates trimmed ...]
uploading www.delcampe.net.1.warc.gz: [#### ] 68206/507826 - 00:03:12
Traceback (most recent call last):
  File "/home/chris/.local/bin/ia", line 9, in <module>
    load_entry_point('internetarchive==0.5.0', 'console_scripts', 'ia')()
  File "/home/chris/.local/lib/python2.7/site-packages/iacli/ia.py", line 91, in main
    ia_module.main(argv)
  File "/home/chris/.local/lib/python2.7/site-packages/iacli/ia_upload.py", line 68, in main
    response = upload(args['<identifier>'], local_file, **upload_kwargs)
  File "/home/chris/.local/lib/python2.7/site-packages/internetarchive/api.py", line 74, in upload
    return item.upload(files, **kwargs)
  File "/home/chris/.local/lib/python2.7/site-packages/internetarchive/item.py", line 433, in upload
    resp = self.upload_file(f, **kwargs)
  File "/home/chris/.local/lib/python2.7/site-packages/internetarchive/item.py", line 386, in upload_file
    return self.session.send(prepared_request, stream=True)
  File "/home/chris/.local/lib/python2.7/site-packages/requests/sessions.py", line 486, in send
    r = adapter.send(request, **kwargs)
  File "/home/chris/.local/lib/python2.7/site-packages/requests/adapters.py", line 382, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='s3.us.archive.org', port=80): Max retries exceeded with url: /delcampe_20140126/www.delcampe.net.1.warc.gz (Caused by <class 'socket.error'>: [Errno 32] Broken pipe)
@joepie91 & @chfoo, this looks like a transient error: the connection to the IA-S3 API is failing. Both your code and your commands look good; I think it is simply a failure to connect to S3. The handling and error reporting should be better, and I hope to fix this soon.
The IA-S3 API has been a bit overloaded this past week. If you continue to run into these errors, I recommend waiting a bit and trying again later.
Please let me know if you continue to have issues or if it doesn't appear to be transient. I'll leave the issue open until I can get around to handling these exceptions more gracefully.
Thanks for reporting the issue!
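Until the library handles these exceptions itself, a caller can work around genuinely transient failures by retrying the upload with a growing delay. The following is only a minimal sketch of that idea, not how internetarchive handles errors: the `retry` helper, its parameters, and the default of catching `OSError` are all assumptions made for illustration.

```python
import time


def retry(operation, retry_on=(OSError,), attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or `attempts` tries are used up.

    Hypothetical helper, not part of internetarchive or requests.
    The delay doubles after each failed try (1s, 2s, 4s, ... by default).
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                # Out of tries: let the caller see the original exception.
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

To use it with the library, one would wrap the upload in a function and pass the exception seen in the traceback, for example `retry(lambda: item.upload(files), retry_on=(requests.exceptions.ConnectionError,))`. Note that retrying only helps when the failure really is transient; it will not fix an upload that fails every time for another reason.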
It definitely does not seem to be transient; not in my case, anyway. My Pastebin scraper uses the exact same library against the exact same interface to upload an archive every night, and it works fine: https://archive.org/details/pastebinpastes
I've been attempting to run the problematic code at several points during the past few weeks, and it has failed every single time; yet, uploading Pastebin scrapes always works.
I tried running my command again twice, a day later, and it failed at around the same percentage each time. I then uploaded using curl (with the same info in the metadata) and it finished successfully, but that was done a few hours later, so I'm not sure whether it's transient in my case.
I should point out that my uploads fail straight away; they don't fail partway through the process, as @chfoo's appear to.
After repeatedly trying an upload throughout the day, I found out that I didn't have AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY set in my environment (because they were set in a different screen session). After setting them, the uploads completed successfully.
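Since a missing credential surfaced here only as a broken pipe partway through the upload, it may be worth checking the environment before starting. This is a small sketch of such a pre-flight check; the function name `missing_credentials` is hypothetical, and the only thing taken from this thread is the pair of variable names.

```python
import os

# The two variables mentioned above; everything else here is illustrative.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    `env` defaults to the current process environment; passing a dict
    makes the check easy to try out without touching os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

A script could call `missing_credentials()` before `ia upload` and exit with a clear message if the returned list is non-empty, which is easier to diagnose than a ConnectionError from the middle of a transfer.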
I've been working on a script to automatically upload recorded livesets to the Internet Archive, but I've been running into a ConnectionError exception while doing so, and I've been unable to figure out what is causing it.
I've been attempting to run this several times during the past few weeks, thinking it might just be a network issue, but the issue always occurs for this script. Another script I run on the same server, for automatically uploading scraped pastes from Pastebin, works just fine. Is there something I'm doing wrong, or is this a bug?
This is what happens:
This is my code:
This is what is in said directory: