h4ck3rm1k3 / hadoop-archive-org-bucket-fs

How to get hadoop to talk to archive.org

java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Date #1

Open h4ck3rm1k3 opened 8 years ago

h4ck3rm1k3 commented 8 years ago

1272 [main] DEBUG org.jets3t.service.impl.rest.httpclient.RestStorageService - Found 0 common prefixes in one batch
1327 [main] DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(value=[Rate of successful kerberos logins and latency (milliseconds)], about=, valueName=Time, type=DEFAULT, always=false, sampleName=Ops)
1334 [main] DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(value=[Rate of failed kerberos logins and latency (milliseconds)], about=, valueName=Time, type=DEFAULT, always=false, sampleName=Ops)
1335 [main] DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(value=[GetGroups], about=, valueName=Time, type=DEFAULT, always=false, sampleName=Ops)
1336 [main] DEBUG org.apache.hadoop.metrics2.impl.MetricsSystemImpl - UgiMetrics, User and group related metrics
1441 [main] DEBUG org.apache.hadoop.security.authentication.util.KerberosName - Kerberos krb5 configuration not found, setting default realm to empty
1444 [main] DEBUG org.apache.hadoop.security.Groups - Creating new Groups object
1445 [main] DEBUG org.apache.hadoop.util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...
1446 [main] DEBUG org.apache.hadoop.util.NativeCodeLoader - Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
1446 [main] DEBUG org.apache.hadoop.util.NativeCodeLoader - java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
1446 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1446 [main] DEBUG org.apache.hadoop.util.PerformanceAdvisory - Falling back to shell based
1447 [main] DEBUG org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback - Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
1479 [main] DEBUG org.apache.hadoop.security.Groups - Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
1482 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - hadoop login
1482 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - hadoop login commit
1484 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - using local user:UnixPrincipal: mdupont
1484 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - Using user: "UnixPrincipal: mdupont" with name mdupont
1484 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - User entry: "mdupont"
1485 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - UGI loginUser:mdupont (auth:SIMPLE)
1497 [main] DEBUG org.apache.htrace.core.Tracer - sampler.classes = ; loaded no samplers
1499 [main] DEBUG org.apache.htrace.core.Tracer - span.receiver.classes = ; loaded no span receivers
1514 [main] DEBUG org.apache.hadoop.fs.s3native.NativeS3FileSystem - getFileStatus retrieving metadata for key '_github_projects_t_000000_0.data'
1515 [main] DEBUG org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore - Getting metadata for key: 
github_projects_t_000000_0.data from bucket: github_projects_data_sampler 1515 [main] DEBUG org.jets3t.service.impl.rest.httpclient.RestStorageService - Retrieving Head information for bucket github_projects_data_sampler and object _github_projects_t_000000_0.data 1515 [main] DEBUG org.jets3t.service.Jets3tProperties - s3service.disable-dns-buckets=false 1515 [main] DEBUG org.jets3t.service.Jets3tProperties - s3service.s3-endpoint=s3.us.archive.org 1515 [main] DEBUG org.jets3t.service.Jets3tProperties - s3service.s3-endpoint-virtual-path= 1515 [main] DEBUG org.jets3t.service.Jets3tProperties - s3service.s3-endpoint-http-port=80 1515 [main] DEBUG org.jets3t.service.impl.rest.httpclient.RestStorageService - S3 URL: http://s3.us.archive.org:80/github_projects_data_sampler/___github_projects_t_000000_0.data 1515 [main] DEBUG org.jets3t.service.impl.rest.httpclient.RestStorageService - Performing HEAD request for 'http://s3.us.archive.org:80/github_projects_data_sampler/___github_projects_t_000000_0.data', expecting response codes: [200] 1515 [main] DEBUG org.jets3t.service.impl.rest.httpclient.RestStorageService - Headers: [Date: Mon, 22 Feb 2016 13:07:38 GMT] 1516 [main] DEBUG org.jets3t.service.impl.rest.httpclient.RestStorageService - Adding authorization for Access Key 's6XZuf8L6MionO98'. 
1516 [main] DEBUG org.jets3t.service.Jets3tProperties - s3service.s3-endpoint=s3.us.archive.org 1516 [main] DEBUG org.jets3t.service.impl.rest.httpclient.RestStorageService - For creating canonical string, using uri: /github_projects_data_sampler/_github_projects_t_000000_0.data 1516 [main] DEBUG org.jets3t.service.impl.rest.httpclient.RestStorageService - Canonical string ('|' is a newline): HEAD|||Mon, 22 Feb 2016 13:07:38 GMT|/github_projects_datasampler/github_projects_t_000000_0.data 1516 [main] DEBUG org.jets3t.service.utils.RestUtils$ThreadSafeConnManager - Get connection: HttpRoute[{}->http://s3.us.archive.org:80], timeout = 0 1516 [main] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute - [HttpRoute[{}->http://s3.us.archive.org:80]] total kept alive: 1, total issued: 0, total allocated: 1 out of 20 1516 [main] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute - No free connections [HttpRoute[{}->http://s3.us.archive.org:80]][null] 1516 [main] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute - Available capacity: 20 out of 20 [HttpRoute[{}->http://s3.us.archive.org:80]][null] 1516 [main] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute - Creating new connection [HttpRoute[{}->http://s3.us.archive.org:80]] 1516 [main] DEBUG org.apache.http.impl.conn.DefaultClientConnectionOperator - Connecting to s3.us.archive.org:80 1589 [main] DEBUG org.apache.http.client.protocol.RequestAddCookies - CookieSpec selected: best-match 1589 [main] DEBUG org.apache.http.client.protocol.RequestAuthCache - Auth cache not set in the context 1589 [main] DEBUG org.apache.http.impl.client.DefaultHttpClient - Attempt 1 to execute request 1589 [main] DEBUG org.apache.http.impl.conn.DefaultClientConnection - Sending request: HEAD /github_projects_data_sampler/_github_projects_t_000000_0.data HTTP/1.1 1590 [main] DEBUG org.apache.http.wire - >> "HEAD /github_projects_datasampler/github_projects_t_000000_0.data HTTP/1.1[\r][\n]" 1590 [main] DEBUG org.apache.http.wire - 
>> "Date: Mon, 22 Feb 2016 13:07:38 GMT[\r][\n]" 1590 [main] DEBUG org.apache.http.wire - >> "Authorization: AWS s6XZuf8L6MionO98:jlZMZrBTK0HpFBdZhitLV8m7EtA=[\r][\n]" 1590 [main] DEBUG org.apache.http.wire - >> "Host: s3.us.archive.org:80[\r][\n]" 1590 [main] DEBUG org.apache.http.wire - >> "Connection: Keep-Alive[\r][\n]" 1590 [main] DEBUG org.apache.http.wire - >> "User-Agent: JetS3t/0.9.0 (Linux/3.16.0-4-amd64; amd64; en; JVM 1.7.0_91)[\r][\n]" 1590 [main] DEBUG org.apache.http.wire - >> "[\r][\n]" 1590 [main] DEBUG org.apache.http.headers - >> HEAD /github_projects_datasampler/github_projects_t_000000_0.data HTTP/1.1 1590 [main] DEBUG org.apache.http.headers - >> Date: Mon, 22 Feb 2016 13:07:38 GMT 1590 [main] DEBUG org.apache.http.headers - >> Authorization: AWS s6XZuf8L6MionO98:jlZMZrBTK0HpFBdZhitLV8m7EtA= 1590 [main] DEBUG org.apache.http.headers - >> Host: s3.us.archive.org:80 1590 [main] DEBUG org.apache.http.headers - >> Connection: Keep-Alive 1590 [main] DEBUG org.apache.http.headers - >> User-Agent: JetS3t/0.9.0 (Linux/3.16.0-4-amd64; amd64; en; JVM 1.7.091) 1888 [main] DEBUG org.apache.http.wire - << "HTTP/1.1 307 Temporary Redirect[\r][\n]" 1888 [main] DEBUG org.apache.http.wire - << "Date: Mon, 22 Feb 2016 13:07:39 GMT[\r][\n]" 1889 [main] DEBUG org.apache.http.wire - << "Server: Apache/2.4.7 (Ubuntu)[\r][\n]" 1889 [main] DEBUG org.apache.http.wire - << "Accept-Ranges: bytes[\r][\n]" 1889 [main] DEBUG org.apache.http.wire - << "Access-Control-Allow-Origin: [\r][\n]" 1889 [main] DEBUG org.apache.http.wire - << "Access-Control-Allow-Methods: GET,POST,PUT,DELETE[\r][\n]" 1889 [main] DEBUG org.apache.http.wire - << "Access-Control-Allow-Headers: 
authorization,x-amz-acl,x-amz-auto-make-bucket,cache-control,x-requested-with,x-file-name,x-file-size,x-archive-ignore-preexisting-bucket,x-archive-interactive-priority,x-archive-meta-title,x-archive-meta-description,x-archive-meta-language,x-archive-meta-mediatype,x-archive-meta01-subject,x-archive-meta02-subject,x-archive-meta03-subject,x-archive-meta04-subject,x-archive-meta05-subject,x-archive-meta01-collection,x-archive-meta02-collection[\r][\n]" 1889 [main] DEBUG org.apache.http.wire - << "location: http://ia801504.s3dns.us.archive.org:80/github_projects_data_sampler/___github_projects_t_000000_0.data[\r][\n]" 1889 [main] DEBUG org.apache.http.wire - << "Content-Length: 407[\r][\n]" 1889 [main] DEBUG org.apache.http.wire - << "Connection: close[\r][\n]" 1889 [main] DEBUG org.apache.http.wire - << "Content-Type: application/xml[\r][\n]" 1889 [main] DEBUG org.apache.http.wire - << "[\r][\n]" 1889 [main] DEBUG org.apache.http.impl.conn.DefaultClientConnection - Receiving response: HTTP/1.1 307 Temporary Redirect 1889 [main] DEBUG org.apache.http.headers - << HTTP/1.1 307 Temporary Redirect 1889 [main] DEBUG org.apache.http.headers - << Date: Mon, 22 Feb 2016 13:07:39 GMT 1889 [main] DEBUG org.apache.http.headers - << Server: Apache/2.4.7 (Ubuntu) 1889 [main] DEBUG org.apache.http.headers - << Accept-Ranges: bytes 1889 [main] DEBUG org.apache.http.headers - << Access-Control-Allow-Origin: 1889 [main] DEBUG org.apache.http.headers - << Access-Control-Allow-Methods: GET,POST,PUT,DELETE 1890 [main] DEBUG org.apache.http.headers - << Access-Control-Allow-Headers: 
authorization,x-amz-acl,x-amz-auto-make-bucket,cache-control,x-requested-with,x-file-name,x-file-size,x-archive-ignore-preexisting-bucket,x-archive-interactive-priority,x-archive-meta-title,x-archive-meta-description,x-archive-meta-language,x-archive-meta-mediatype,x-archive-meta01-subject,x-archive-meta02-subject,x-archive-meta03-subject,x-archive-meta04-subject,x-archive-meta05-subject,x-archive-meta01-collection,x-archive-meta02-collection 1890 [main] DEBUG org.apache.http.headers - << location: http://ia801504.s3dns.us.archive.org:80/github_projects_data_sampler/___github_projects_t_000000_0.data 1890 [main] DEBUG org.apache.http.headers - << Content-Length: 407 1890 [main] DEBUG org.apache.http.headers - << Connection: close 1890 [main] DEBUG org.apache.http.headers - << Content-Type: application/xml 1890 [main] DEBUG org.apache.http.impl.client.DefaultRedirectStrategy - Redirect requested to location 'http://ia801504.s3dns.us.archive.org:80/github_projects_data_sampler/___github_projects_t_000000_0.data' 1890 [main] DEBUG org.apache.http.impl.client.DefaultHttpClient - Redirecting to 'http://ia801504.s3dns.us.archive.org:80/github_projects_data_sampler/___github_projects_t_000000_0.data' via HttpRoute[{}->http://ia801504.s3dns.us.archive.org:80] 1890 [main] DEBUG org.apache.http.impl.conn.DefaultClientConnection - Connection closed 1890 [main] DEBUG org.jets3t.service.utils.RestUtils$ThreadSafeConnManager - Released connection is not reusable. 
1890 [main] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute - Releasing connection [HttpRoute[{}->http://s3.us.archive.org:80]][null] 1890 [main] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute - Notifying no-one, there are no waiting threads 1890 [main] DEBUG org.jets3t.service.utils.RestUtils$ThreadSafeConnManager - Get connection: HttpRoute[{}->http://ia801504.s3dns.us.archive.org:80], timeout = 0 1890 [main] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute - [HttpRoute[{}->http://ia801504.s3dns.us.archive.org:80]] total kept alive: 1, total issued: 0, total allocated: 1 out of 20 1890 [main] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute - Getting free connection [HttpRoute[{}->http://ia801504.s3dns.us.archive.org:80]][null] 1890 [main] DEBUG org.apache.http.impl.client.DefaultHttpClient - Stale connection check 1892 [main] DEBUG org.apache.http.client.protocol.RequestAddCookies - CookieSpec selected: best-match 1892 [main] DEBUG org.apache.http.client.protocol.RequestAuthCache - Auth cache not set in the context 1892 [main] DEBUG org.apache.http.impl.client.DefaultHttpClient - Attempt 2 to execute request 1892 [main] DEBUG org.apache.http.impl.conn.DefaultClientConnection - Sending request: HEAD /github_projects_data_sampler/_github_projects_t_000000_0.data HTTP/1.1 1892 [main] DEBUG org.apache.http.wire - >> "HEAD /github_projects_data_sampler/___github_projects_t_000000_0.data HTTP/1.1[\r][\n]" 1892 [main] DEBUG org.apache.http.wire - >> "Date: Mon, 22 Feb 2016 13:07:38 GMT[\r][\n]" 1892 [main] DEBUG org.apache.http.wire - >> "Authorization: AWS s6XZuf8L6MionO98:jlZMZrBTK0HpFBdZhitLV8m7EtA=[\r][\n]" 1892 [main] DEBUG org.apache.http.wire - >> "Host: ia801504.s3dns.us.archive.org:80[\r][\n]" 1892 [main] DEBUG org.apache.http.wire - >> "Connection: Keep-Alive[\r][\n]" 1892 [main] DEBUG org.apache.http.wire - >> "User-Agent: JetS3t/0.9.0 (Linux/3.16.0-4-amd64; amd64; en; JVM 1.7.0_91)[\r][\n]" 1892 [main] DEBUG org.apache.http.wire - >> 
"[\r][\n]" 1893 [main] DEBUG org.apache.http.headers - >> HEAD /github_projects_datasampler/github_projects_t_000000_0.data HTTP/1.1 1893 [main] DEBUG org.apache.http.headers - >> Date: Mon, 22 Feb 2016 13:07:38 GMT 1893 [main] DEBUG org.apache.http.headers - >> Authorization: AWS s6XZuf8L6MionO98:jlZMZrBTK0HpFBdZhitLV8m7EtA= 1893 [main] DEBUG org.apache.http.headers - >> Host: ia801504.s3dns.us.archive.org:80 1893 [main] DEBUG org.apache.http.headers - >> Connection: Keep-Alive 1893 [main] DEBUG org.apache.http.headers - >> User-Agent: JetS3t/0.9.0 (Linux/3.16.0-4-amd64; amd64; en; JVM 1.7.0_91) 2006 [main] DEBUG org.apache.http.wire - << "HTTP/1.1 200 OK[\r][\n]" 2006 [main] DEBUG org.apache.http.wire - << "Server: nginx/1.4.6 (Ubuntu)[\r][\n]" 2006 [main] DEBUG org.apache.http.wire - << "Date: Mon, 22 Feb 2016 13:07:39 GMT[\r][\n]" 2006 [main] DEBUG org.apache.http.wire - << "Content-Type: multipart/form-data; charset=UTF-8[\r][\n]" 2006 [main] DEBUG org.apache.http.wire - << "Content-Length: 16943171[\r][\n]" 2006 [main] DEBUG org.apache.http.wire - << "Connection: keep-alive[\r][\n]" 2006 [main] DEBUG org.apache.http.wire - << "Accept-Ranges: bytes[\r][\n]" 2006 [main] DEBUG org.apache.http.wire - << "x-archive-interactive-priority: 1[\r][\n]" 2006 [main] DEBUG org.apache.http.wire - << "x-archive-meta01-description: uri(data%20samples%20gz%20compressed%20and%20uncompressed%20from%20different%20tables.%26nbsp%3B%3Cdiv%3E%3Cbr%3E%3C%2Fdiv%3E%3Cdiv%3Esee%20allfiles.txt%26nbsp%3B%3C%2Fdiv%3E%3Cdiv%3Etable_extended.txt%3C%2Fdiv%3E%3Cdiv%3Edescribe.txt%3C%2Fdiv%3E%3Cdiv%3Eand%20tables2.txt%20for%20the%20table%20descriptions.%3C%2Fdiv%3E)[\r][\n]" 2006 [main] DEBUG org.apache.http.wire - << "x-ias3-encoded-key: ___github_projects_t_000000_0.data[\r][\n]" 2006 [main] DEBUG org.apache.http.wire - << "x-archive-meta03-subject: uri(hadoop)[\r][\n]" 2007 [main] DEBUG org.apache.http.wire - << "x-archive-meta01-scanner: 
uri(Internet%20Archive%20HTML5%20Uploader%201.6.3)[\r][\n]" 2007 [main] DEBUG org.apache.http.wire - << "access-control-allow-methods: GET,POST,PUT,DELETE[\r][\n]" 2007 [main] DEBUG org.apache.http.wire - << "x-archive-queue-derive: 0[\r][\n]" 2007 [main] DEBUG org.apache.http.wire - << "x-amz-auto-make-bucket: 1[\r][\n]" 2007 [main] DEBUG org.apache.http.wire - << "ETag: "8f211deaec6266d5c8f63f45f0ff784b"[\r][\n]" 2007 [main] DEBUG org.apache.http.wire - << "x-file-size: 16943171[\r][\n]" 2007 [main] DEBUG org.apache.http.wire - << "x-archive-meta02-subject: uri(hive)[\r][\n]" 2007 [main] DEBUG org.apache.http.wire - << "access-control-allow-headers: authorization,x-amz-acl,x-amz-auto-make-bucket,cache-control,x-requested-with,x-file-name,x-file-size,x-archive-ignore-preexisting-bucket,x-archive-interactive-priority,x-archive-meta-title,x-archive-meta-description,x-archive-meta-language,x-archive-meta-mediatype,x-archive-meta01-subject,x-archive-meta02-subject,x-archive-meta03-subject,x-archive-meta04-subject,x-archive-meta05-subject,x-archive-meta01-collection,x-archive-meta02-collection[\r][\n]" 2007 [main] DEBUG org.apache.http.wire - << "x-archive-meta01-title: uri(github_projects_data_sampler)[\r][\n]" 2007 [main] DEBUG org.apache.http.wire - << "x-archive-size-hint: 257094368[\r][\n]" 2007 [main] DEBUG org.apache.http.wire - << "last-modified: Sat, 20 Feb 2016 15:17:44 GMT[\r][\n]" 2007 [main] DEBUG org.apache.http.wire - << "x-archive-meta01-collection: uri(opensource_media)[\r][\n]" 2007 [main] DEBUG org.apache.http.wire - << "access-control-allow-origin: [\r][\n]" 2007 [main] DEBUG org.apache.http.wire - << "x-amz-acl: bucket-owner-full-control[\r][\n]" 2007 [main] DEBUG org.apache.http.wire - << "x-upload-date: 2016-02-20T15:17:44.000Z[\r][\n]" 2007 [main] DEBUG org.apache.http.wire - << "x-archive-meta04-subject: uri(github)[\r][\n]" 2008 [main] DEBUG org.apache.http.wire - << "x-archive-meta01-subject: uri(oel)[\r][\n]" 2008 [main] DEBUG 
org.apache.http.wire - << "x-requested-with: XMLHttpRequest[\r][\n]" 2008 [main] DEBUG org.apache.http.wire - << "x-file-name: uri(_github_projects_t_000000_0.data)[\r][\n]" 2008 [main] DEBUG org.apache.http.wire - << "x-archive-meta-mediatype: uri(texts)[\r][\n]" 2008 [main] DEBUG org.apache.http.wire - << "Expires: Mon, 22 Feb 2016 19:07:39 GMT[\r][\n]" 2008 [main] DEBUG org.apache.http.wire - << "Cache-Control: max-age=21600[\r][\n]" 2008 [main] DEBUG org.apache.http.wire - << "[\r][\n]" 2008 [main] DEBUG org.apache.http.impl.conn.DefaultClientConnection - Receiving response: HTTP/1.1 200 OK 2008 [main] DEBUG org.apache.http.headers - << HTTP/1.1 200 OK 2008 [main] DEBUG org.apache.http.headers - << Server: nginx/1.4.6 (Ubuntu) 2008 [main] DEBUG org.apache.http.headers - << Date: Mon, 22 Feb 2016 13:07:39 GMT 2008 [main] DEBUG org.apache.http.headers - << Content-Type: multipart/form-data; charset=UTF-8 2008 [main] DEBUG org.apache.http.headers - << Content-Length: 16943171 2008 [main] DEBUG org.apache.http.headers - << Connection: keep-alive 2008 [main] DEBUG org.apache.http.headers - << Accept-Ranges: bytes 2008 [main] DEBUG org.apache.http.headers - << x-archive-interactive-priority: 1 2008 [main] DEBUG org.apache.http.headers - << x-archive-meta01-description: uri(data%20samples%20gz%20compressed%20and%20uncompressed%20from%20different%20tables.%26nbsp%3B%3Cdiv%3E%3Cbr%3E%3C%2Fdiv%3E%3Cdiv%3Esee%20allfiles.txt%26nbsp%3B%3C%2Fdiv%3E%3Cdiv%3Etable_extended.txt%3C%2Fdiv%3E%3Cdiv%3Edescribe.txt%3C%2Fdiv%3E%3Cdiv%3Eand%20tables2.txt%20for%20the%20table%20descriptions.%3C%2Fdiv%3E) 2008 [main] DEBUG org.apache.http.headers - << x-ias3-encoded-key: ___github_projects_t_000000_0.data 2008 [main] DEBUG org.apache.http.headers - << x-archive-meta03-subject: uri(hadoop) 2008 [main] DEBUG org.apache.http.headers - << x-archive-meta01-scanner: uri(Internet%20Archive%20HTML5%20Uploader%201.6.3) 2008 [main] DEBUG org.apache.http.headers - << access-control-allow-methods: 
GET,POST,PUT,DELETE 2008 [main] DEBUG org.apache.http.headers - << x-archive-queue-derive: 0 2008 [main] DEBUG org.apache.http.headers - << x-amz-auto-make-bucket: 1 2008 [main] DEBUG org.apache.http.headers - << ETag: "8f211deaec6266d5c8f63f45f0ff784b" 2009 [main] DEBUG org.apache.http.headers - << x-file-size: 16943171 2009 [main] DEBUG org.apache.http.headers - << x-archive-meta02-subject: uri(hive) 2009 [main] DEBUG org.apache.http.headers - << access-control-allow-headers: authorization,x-amz-acl,x-amz-auto-make-bucket,cache-control,x-requested-with,x-file-name,x-file-size,x-archive-ignore-preexisting-bucket,x-archive-interactive-priority,x-archive-meta-title,x-archive-meta-description,x-archive-meta-language,x-archive-meta-mediatype,x-archive-meta01-subject,x-archive-meta02-subject,x-archive-meta03-subject,x-archive-meta04-subject,x-archive-meta05-subject,x-archive-meta01-collection,x-archive-meta02-collection 2009 [main] DEBUG org.apache.http.headers - << x-archive-meta01-title: uri(github_projects_data_sampler) 2009 [main] DEBUG org.apache.http.headers - << x-archive-size-hint: 257094368 2009 [main] DEBUG org.apache.http.headers - << last-modified: Sat, 20 Feb 2016 15:17:44 GMT 2009 [main] DEBUG org.apache.http.headers - << x-archive-meta01-collection: uri(opensourcemedia) 2009 [main] DEBUG org.apache.http.headers - << access-control-allow-origin: * 2009 [main] DEBUG org.apache.http.headers - << x-amz-acl: bucket-owner-full-control 2009 [main] DEBUG org.apache.http.headers - << x-upload-date: 2016-02-20T15:17:44.000Z 2009 [main] DEBUG org.apache.http.headers - << x-archive-meta04-subject: uri(github) 2009 [main] DEBUG org.apache.http.headers - << x-archive-meta01-subject: uri(oel) 2009 [main] DEBUG org.apache.http.headers - << x-requested-with: XMLHttpRequest 2009 [main] DEBUG org.apache.http.headers - << x-file-name: uri(github_projects_t_000000_0.data) 2009 [main] DEBUG org.apache.http.headers - << x-archive-meta-mediatype: uri(texts) 2009 [main] DEBUG 
org.apache.http.headers - << Expires: Mon, 22 Feb 2016 19:07:39 GMT 2009 [main] DEBUG org.apache.http.headers - << Cache-Control: max-age=21600 2009 [main] DEBUG org.apache.http.impl.client.DefaultHttpClient - Connection can be kept alive indefinitely 2009 [main] DEBUG org.jets3t.service.utils.RestUtils$ThreadSafeConnManager - Released connection is reusable. 2009 [main] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute - Releasing connection [HttpRoute[{}->http://ia801504.s3dns.us.archive.org:80]][null] 2009 [main] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute - Pooling connection [HttpRoute[{}->http://ia801504.s3dns.us.archive.org:80]][null]; keep alive indefinitely 2009 [main] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute - Notifying no-one, there are no waiting threads 2009 [main] DEBUG org.jets3t.service.impl.rest.httpclient.RestStorageService - Response for 'HEAD'. Content-Type: multipart/form-data; charset=UTF-8, Headers: [Server: nginx/1.4.6 (Ubuntu), Date: Mon, 22 Feb 2016 13:07:39 GMT, Content-Type: multipart/form-data; charset=UTF-8, Content-Length: 16943171, Connection: keep-alive, Accept-Ranges: bytes, x-archive-interactive-priority: 1, x-archive-meta01-description: uri(data%20samples%20gz%20compressed%20and%20uncompressed%20from%20different%20tables.%26nbsp%3B%3Cdiv%3E%3Cbr%3E%3C%2Fdiv%3E%3Cdiv%3Esee%20allfiles.txt%26nbsp%3B%3C%2Fdiv%3E%3Cdiv%3Etable_extended.txt%3C%2Fdiv%3E%3Cdiv%3Edescribe.txt%3C%2Fdiv%3E%3Cdiv%3Eand%20tables2.txt%20for%20the%20table%20descriptions.%3C%2Fdiv%3E), x-ias3-encoded-key: _github_projects_t_000000_0.data, x-archive-meta03-subject: uri(hadoop), x-archive-meta01-scanner: uri(Internet%20Archive%20HTML5%20Uploader%201.6.3), access-control-allow-methods: GET,POST,PUT,DELETE, x-archive-queue-derive: 0, x-amz-auto-make-bucket: 1, ETag: "8f211deaec6266d5c8f63f45f0ff784b", x-file-size: 16943171, x-archive-meta02-subject: uri(hive), access-control-allow-headers: 
authorization,x-amz-acl,x-amz-auto-make-bucket,cache-control,x-requested-with,x-file-name,x-file-size,x-archive-ignore-preexisting-bucket,x-archive-interactive-priority,x-archive-meta-title,x-archive-meta-description,x-archive-meta-language,x-archive-meta-mediatype,x-archive-meta01-subject,x-archive-meta02-subject,x-archive-meta03-subject,x-archive-meta04-subject,x-archive-meta05-subject,x-archive-meta01-collection,x-archive-meta02-collection, x-archive-meta01-title: uri(github_projects_data_sampler), x-archive-size-hint: 257094368, last-modified: Sat, 20 Feb 2016 15:17:44 GMT, x-archive-meta01-collection: uri(opensourcemedia), access-control-allow-origin: *, x-amz-acl: bucket-owner-full-control, x-upload-date: 2016-02-20T15:17:44.000Z, x-archive-meta04-subject: uri(github), x-archive-meta01-subject: uri(oel), x-requested-with: XMLHttpRequest, x-file-name: uri(github_projects_t_000000_0.data), x-archive-meta-mediatype: uri(texts), Expires: Mon, 22 Feb 2016 19:07:39 GMT, Cache-Control: max-age=21600] 2010 [main] DEBUG org.jets3t.service.impl.rest.httpclient.RestStorageService - Response entity: null 2010 [main] DEBUG org.jets3t.service.impl.rest.httpclient.RestStorageService - Received expected response code: true 2010 [main] DEBUG org.jets3t.service.impl.rest.httpclient.RestStorageService - expected code(s): [200]. 
2010 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Cleaning up REST metadata items
2010 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Ignoring metadata item: x-file-name=uri(_github_projects_t_000000_0.data)
2010 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Removed header prefix x-amz- from key: x-amz-acl=>acl
2010 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Ignoring metadata item: x-archive-size-hint=257094368
2010 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Ignoring metadata item: x-ias3-encoded-key=___github_projects_t_000000_0.data
2010 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Leaving HTTP header item unchanged: Content-Length=16943171
2010 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Ignoring metadata item: x-archive-meta01-scanner=uri(Internet%20Archive%20HTML5%20Uploader%201.6.3)
2010 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Ignoring metadata item: x-archive-meta01-collection=uri(opensource_media)
2010 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Ignoring metadata item: Connection=keep-alive
2010 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Leaving HTTP header item unchanged: Cache-Control=max-age=21600
2010 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Ignoring metadata item: x-archive-meta01-title=uri(github_projects_data_sampler)
2010 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Leaving header item unchanged: Date=Mon, 22 Feb 2016 13:07:39 GMT
2010 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Parsing date string 'Mon, 22 Feb 2016 13:07:39 GMT' into Date object for key: Date
2011 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Ignoring metadata item: x-archive-queue-derive=0
2011 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Ignoring metadata item: x-archive-meta04-subject=uri(github)
2011 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Ignoring metadata item: x-requested-with=XMLHttpRequest
2011 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Ignoring metadata item: x-archive-interactive-priority=1
2011 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Ignoring metadata item: x-archive-meta01-description=uri(data%20samples%20gz%20compressed%20and%20uncompressed%20from%20different%20tables.%26nbsp%3B%3Cdiv%3E%3Cbr%3E%3C%2Fdiv%3E%3Cdiv%3Esee%20allfiles.txt%26nbsp%3B%3C%2Fdiv%3E%3Cdiv%3Etable_extended.txt%3C%2Fdiv%3E%3Cdiv%3Edescribe.txt%3C%2Fdiv%3E%3Cdiv%3Eand%20tables2.txt%20for%20the%20table%20descriptions.%3C%2Fdiv%3E)
2011 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Ignoring metadata item: access-control-allow-headers=authorization,x-amz-acl,x-amz-auto-make-bucket,cache-control,x-requested-with,x-file-name,x-file-size,x-archive-ignore-preexisting-bucket,x-archive-interactive-priority,x-archive-meta-title,x-archive-meta-description,x-archive-meta-language,x-archive-meta-mediatype,x-archive-meta01-subject,x-archive-meta02-subject,x-archive-meta03-subject,x-archive-meta04-subject,x-archive-meta05-subject,x-archive-meta01-collection,x-archive-meta02-collection
2011 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Leaving header item unchanged: ETag="8f211deaec6266d5c8f63f45f0ff784b"
2011 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Ignoring metadata item: x-archive-meta01-subject=uri(oel)
2011 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Ignoring metadata item: access-control-allow-origin=*
2011 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Leaving header item unchanged: last-modified=Sat, 20 Feb 2016 15:17:44 GMT
2011 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Ignoring metadata item: x-file-size=16943171
2011 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Leaving HTTP header item unchanged: Expires=Mon, 22 Feb 2016 19:07:39 GMT
2011 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Ignoring metadata item: Server=nginx/1.4.6 (Ubuntu)
2011 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Ignoring metadata item: x-archive-meta03-subject=uri(hadoop)
2011 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Ignoring metadata item: x-archive-meta02-subject=uri(hive)
2011 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Removed header prefix x-amz- from key: x-amz-auto-make-bucket=>auto-make-bucket
2011 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Ignoring metadata item: x-upload-date=2016-02-20T15:17:44.000Z
2011 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Ignoring metadata item: x-archive-meta-mediatype=uri(texts)
2011 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Leaving HTTP header item unchanged: Content-Type=multipart/form-data; charset=UTF-8
2011 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Ignoring metadata item: Accept-Ranges=bytes
2011 [main] DEBUG org.jets3t.service.utils.ServiceUtils - Ignoring metadata item: access-control-allow-methods=GET,POST,PUT,DELETE
2011 [main] DEBUG org.jets3t.service.impl.rest.httpclient.RestStorageService - Releasing HttpMethod after HEAD
Exception in thread "main" java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Date
    at org.jets3t.service.model.StorageObject.getLastModifiedDate(StorageObject.java:376)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:176)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:255)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
    at org.apache.hadoop.fs.s3native.$Proxy0.retrieveMetadata(Unknown Source)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:476)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:340)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2062)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2031)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2007)
    at xyz.introspector.hadoop.archive_org.archive_org_bucket_fs.App.main(App.java:61)

h4ck3rm1k3 commented 8 years ago

Bugfix for jets3t: https://bitbucket.org/jmurty/jets3t/pull-requests/38/archiveorg-support/diff
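
For context: in the log above, jets3t parses the `Date` header into a Date object but leaves the lowercase `last-modified` header unchanged, so it is still a String when `StorageObject.getLastModifiedDate` casts it. A minimal sketch of a defensive lookup follows. It is not the content of the linked pull request; `LastModifiedWorkaround` and `lastModified` are hypothetical names, and the plain `Map` only stands in for jets3t's object metadata.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;

public class LastModifiedWorkaround {
    // Date layout of HTTP headers such as "Sat, 20 Feb 2016 15:17:44 GMT".
    private static final String HTTP_DATE_PATTERN = "EEE, dd MMM yyyy HH:mm:ss z";

    /**
     * Hypothetical helper: finds the Last-Modified entry regardless of header
     * case and returns it as a Date, parsing it first if it is still a String.
     * Returns null when no such entry exists.
     */
    static Date lastModified(Map<String, Object> metadata) {
        for (Map.Entry<String, Object> entry : metadata.entrySet()) {
            if (!"last-modified".equalsIgnoreCase(entry.getKey())) {
                continue;
            }
            Object value = entry.getValue();
            if (value instanceof Date) {
                return (Date) value; // already parsed, nothing to do
            }
            if (value instanceof String) {
                // SimpleDateFormat is not thread-safe, so create one per call.
                SimpleDateFormat format = new SimpleDateFormat(HTTP_DATE_PATTERN, Locale.US);
                format.setTimeZone(TimeZone.getTimeZone("GMT"));
                try {
                    return format.parse((String) value);
                } catch (ParseException e) {
                    throw new IllegalArgumentException("Unparseable last-modified: " + value, e);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = new HashMap<>();
        // Lowercase key and String value, as archive.org returned them in the log.
        metadata.put("last-modified", "Sat, 20 Feb 2016 15:17:44 GMT");
        System.out.println(lastModified(metadata).getTime());
    }
}
```

The lookup ignores header case because archive.org sends `last-modified` where S3 sends `Last-Modified`; whether jets3t itself should normalize header names or parse lazily is a design question for the library.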