qubole / streamx

kafka-connect-s3: Ingest data from Kafka to object stores (S3)
Apache License 2.0

java.lang.NoSuchFieldError: INSTANCE exception, caused by http client version mismatch #32

Open zzbennett opened 7 years ago

zzbennett commented 7 years ago

I'm trying to get the S3 connector working, but I keep running into this exception when I use S3AFileSystem (I would use NativeS3FileSystem, but I need S3A for other reasons).

[2017-01-24 01:50:41,622] INFO Couldn't start HdfsSinkConnector: (io.confluent.connect.hdfs.HdfsSinkTask:73)
org.apache.kafka.connect.errors.ConnectException: java.lang.reflect.InvocationTargetException
    at io.confluent.connect.hdfs.storage.StorageFactory.createStorage(StorageFactory.java:40)
    at io.confluent.connect.hdfs.DataWriter.<init>(DataWriter.java:171)
    at io.confluent.connect.hdfs.HdfsSinkTask.start(HdfsSinkTask.java:65)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:221)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:140)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at io.confluent.connect.hdfs.storage.StorageFactory.createStorage(StorageFactory.java:33)
    ... 11 more
Caused by: java.lang.NoSuchFieldError: INSTANCE
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.<clinit>(SSLConnectionSocketFactory.java:144)
    at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.getPreferredSocketFactory(ApacheConnectionManagerFactory.java:87)
    at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.create(ApacheConnectionManagerFactory.java:65)
    at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.create(ApacheConnectionManagerFactory.java:58)
    at com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory.create(ApacheHttpClientFactory.java:51)
    at com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory.create(ApacheHttpClientFactory.java:39)
    at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:319)
    at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:303)
    at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:164)
    at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:564)
    at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:544)
    at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:526)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:235)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:2675)
    at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:418)
    at com.qubole.streamx.s3.S3Storage.<init>(S3Storage.java:49)
    ... 16 more

Some cursory googling indicates this is related to a version conflict in the Apache HttpCore library. In /home/ec2-user/streamx/target/streamx-0.1.0-SNAPSHOT-development/share/java/streamx/* there is httpcore-4.2.4.jar, while /usr/bin/../share/java/kafka, which is also on the classpath, contains httpcore-4.4.3.jar. I can take a stab at fixing this, but I figured I'd file an issue in case it is a known issue and/or there is an established workaround.

zzbennett commented 7 years ago

This is actually an issue with the httpclient version. After sleuthing around the classpath and the Maven dependency tree, it appears that the aws-java-sdk-s3 dependency, currently pinned to 1.11.69 in streamx, pulls in httpclient 4.5.1. It seems aws-java-sdk-s3 needs to be downgraded, though I'm not sure how this is working for other folks. Downgrading aws-java-sdk-s3 to 1.10.77 pulls in httpclient 4.3.6, which resolves the java.lang.NoSuchFieldError: INSTANCE error; however, a new error appears:
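For anyone else debugging this, the conflicting versions can be listed with Maven's dependency plugin. This is a diagnostic sketch, run from the streamx checkout; the `includes` filter pattern is `groupId[:artifactId]`:

```shell
# List every artifact that pulls in Apache HttpComponents, with versions
mvn dependency:tree -Dincludes=org.apache.httpcomponents

# Verbose mode also shows versions Maven omitted due to conflict resolution,
# which is where mismatches like httpclient 4.5.1 vs 4.3.x become visible
mvn dependency:tree -Dverbose -Dincludes=org.apache.httpcomponents:httpclient
```

Note that even a consistent Maven tree can still break at runtime here, because Kafka Connect also puts /usr/share/java/kafka on the classpath, and whichever httpclient/httpcore jar loads first wins.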

Caused by: java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager.<init>(Lcom/amazonaws/services/s3/AmazonS3;Ljava/util/concurrent/ThreadPoolExecutor;)V
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:287)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:2675)
    at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:418)
    at com.qubole.streamx.s3.S3Storage.<init>(S3Storage.java:49)
    ... 16 more

This is apparently a known issue caused by an incompatibility between Hadoop 2.7 and aws-java-sdk versions newer than 1.7. After trying a few different versions of aws-java-sdk-s3, I ended up deleting the dependency entirely, which resolved the issue.
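The fix described above amounts to letting hadoop-aws supply the AWS SDK version it was compiled against, rather than declaring a newer aws-java-sdk-s3 that drags in an incompatible httpclient. A hedged sketch of what the pom.xml ends up looking like (the hadoop-aws version shown is an assumption; match it to your Hadoop version):

```xml
<!-- hadoop-aws transitively provides the aws-java-sdk version that
     Hadoop 2.7.x's S3AFileSystem actually links against -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-aws</artifactId>
  <version>2.7.3</version>
</dependency>

<!-- The change reported in this thread: remove (not just downgrade)
     the explicit SDK dependency, i.e. delete this block entirely:

<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk-s3</artifactId>
  <version>1.11.69</version>
</dependency>
-->
```

Deleting rather than downgrading works because Hadoop 2.7's S3AFileSystem was built against the old TransferManager constructor, so any SDK other than the one hadoop-aws pins will hit a NoSuchMethodError like the one above.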

PraveenSeluka commented 7 years ago

Hi @zzbennett, sorry for the late response. Yes, we have found multiple issues with S3A (a thread leak and httpclient-related issues). So far, the experience with NativeS3FileSystem has been very stable. Can you try that instead?

zzbennett commented 7 years ago

Thanks for your reply @PraveenSeluka. I'm not able to use NativeS3FileSystem because it doesn't support AWS's temporary security credentials, which is what I'm using. There is an open ticket in the Hadoop community to add that support, but they decided not to implement it since S3A already supports it, and (according to the third comment on that thread) they are not planning any further enhancements to the s3n connector. So, sadly, s3n will never support temporary security tokens, but I cannot get S3A to work with streamx. The dependency issue appears to be resolved now that I've deleted the aws-java-sdk-s3 dependency, so I'm unblocked on that for now. I'm still not able to connect to S3 due to a 403 Access Denied error, so hopefully once that is resolved, things will start working.
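For reference, S3A's temporary (STS session) credential support is configured roughly as below. These are Hadoop's property names, not streamx's, and the session-token provider only exists in Hadoop 2.8+; Hadoop 2.7's S3A reads only the static key/secret properties:

```xml
<!-- core-site.xml / hdfs-site.xml sketch for S3A with STS session
     credentials (Hadoop 2.8+ only). Placeholder values are illustrative. -->
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider</value>
</property>
<property>
  <name>fs.s3a.access.key</name>
  <value>TEMPORARY_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>TEMPORARY_SECRET_KEY</value>
</property>
<property>
  <name>fs.s3a.session.token</name>
  <value>SESSION_TOKEN</value>
</property>
```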

PraveenSeluka commented 7 years ago

@zzbennett You are right. They are not going to add Roles (temp creds) support to S3N; S3A is the way forward. I will look into this issue and get back soon.

zzbennett commented 7 years ago

Regarding the S3 403 error, I resolved it by deleting the access_key and secret_key configs from the Hadoop hdfs-site.xml config file. streamx seems to be working smoothly now. Really, the only thing I ended up doing was deleting the aws-java-sdk-s3 dependency from the streamx pom.xml file.
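The 403 fix above comes down to making sure no stale static credentials shadow the instance's IAM role. In hdfs-site.xml terms, that means deleting entries like the following (property names are the standard S3A ones and are an assumption about this setup; s3n uses fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey instead):

```xml
<!-- Delete these so S3A falls back to the EC2 instance profile (IAM role)
     instead of sending the stale static keys. Values are placeholders. -->
<property>
  <name>fs.s3a.access.key</name>
  <value>STALE_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>STALE_SECRET_KEY</value>
</property>
```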

PraveenSeluka commented 7 years ago

Yeah, that's right: you need to remove the keys (otherwise it won't use roles, and the keys are invalid). I will add a note about this.

PraveenSeluka commented 7 years ago

@zzbennett Please look at https://github.com/qubole/streamx/issues/30 for issues related to S3A.