iipc / openwayback

The OpenWayback Development
http://www.netpreserve.org/openwayback
Apache License 2.0

using s3 buckets? #425

Closed: chichicuervo closed this issue 4 years ago

chichicuervo commented 4 years ago

There are a few mentions of S3 bucket support from 2015. Does it actually work, and if so, how do I configure it?

ldko commented 4 years ago

Hi @chichicuervo, the support for S3 buckets was contributed by someone outside of IIPC via #188 and #189, but as far as I know, no documentation was provided for it. The code to support it is still in ResourceFactory, and it indicates that credentials would go into $HADOOP_CONF/core-site.xml. Other than that, I would try configuring it per the CDX instructions in the wiki, using a FlatFileResourceFileLocationDB (path-index.txt) as described there, with the index holding the addresses of the WARCs in the S3 buckets.
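
For anyone trying the path-index suggestion, here is a minimal Python sketch of how such a file could be generated. It is an illustration under assumptions, not something confirmed in this thread: it assumes path-index.txt holds one tab-separated `<WARC filename>` / `<location>` pair per line in sorted order, and that an `s3://bucket/key` address is a form ResourceFactory would accept. The function name, bucket name, and prefix are hypothetical.

```python
def build_path_index(warc_names, bucket, prefix=""):
    """Return sorted path-index lines of the form '<warc name>\\t<s3 location>'.

    Assumptions (not confirmed by the OpenWayback docs or this thread):
    - entries are tab-separated and the file must be sorted;
    - 's3://<bucket>/<key>' is an address form the S3 support understands.
    """
    prefix = prefix.strip("/")
    lines = []
    for name in warc_names:
        # Build the object key, with or without a key prefix ("folder").
        key = f"{prefix}/{name}" if prefix else name
        lines.append(f"{name}\ts3://{bucket}/{key}")
    # Plain string sort; for ASCII names this matches byte order.
    return sorted(lines)


def write_path_index(lines, out_path="path-index.txt"):
    """Write the index lines to a file, one entry per line."""
    with open(out_path, "w", encoding="utf-8", newline="\n") as fh:
        fh.write("\n".join(lines) + "\n")
```

Calling `build_path_index(["b.warc.gz", "a.warc.gz"], "my-bucket", "warcs/")` yields two lines mapping each WARC name to its address under `s3://my-bucket/warcs/`. The resulting file would then be referenced from the FlatFileResourceFileLocationDB configuration described in the wiki.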

chichicuervo commented 4 years ago

Yeah... the lack of documentation is kind of the problem, particularly since the Hadoop stuff doesn't make any sense to me (not a Java dev), and the link mentioned in previous issues points to nothing but a 404.

ldko commented 4 years ago

It looks like the Internet Archive has an archived copy of that page from around that time, and it shows the key configuration.

chichicuervo commented 4 years ago

For reference's sake: this didn't work. Attempting to go the CDX route.