scylladb / scylla-migrator

Migrate data extract using Spark to Scylla, normally from Cassandra
Apache License 2.0

Connection refused error #131

Closed by anand-chandrashekar 2 months ago

anand-chandrashekar commented 2 months ago

I tried to run the migrator with the config below. Spark version: 2.4.4. OS: Ubuntu (AWS).

source:
  type: dynamodb
  table: anand-mig-1
  endpoint:
    host: dynamodb.us-east-1.amazonaws.com
    port: 8000
  credentials:
    accessKey: abc
    secretKey: xxx
  maxMapTasks: 1

target:
  type: dynamodb
  table: anand-mig-1
  endpoint:
    host: xxx
    port: 8000
  credentials:
    accessKey: none
    secretKey: none
  maxMapTasks: 1
  streamChanges: false

renames: []

# Below are unused but mandatory settings
savepoints:
  path: /app/savepoints
  intervalSeconds: 300
skipTokenRanges: []
validation:
  compareTimestamps: true
  ttlToleranceMillis: 60000
  writetimeToleranceMillis: 1000
  failuresToFetch: 100
  floatingPointTolerance: 0.001
  timestampMsTolerance: 0

Error message:

ubuntu@ip-10-0-0-129:~/install/spark/spark-2.4.4-bin-hadoop2.7/bin$ ./spark-submit --class com.scylladb.migrator.Migrator --master spark://ip-10-0-0-129.ec2.internal:7077 --conf spark.driver.host=ip-10-0-0-129.ec2.internal --conf spark.scylla.config=/home/ubuntu/altws/dynamodb-to-alternator-basic.yaml /home/ubuntu/install/migrator/scylla-migrator/migrator/target/scala-2.11/scylla-migrator-assembly-0.0.1.jar
24/04/29 10:52:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
24/04/29 10:52:24 INFO SparkContext: Running Spark version 2.4.4
24/04/29 10:52:24 INFO SparkContext: Submitted application: scylla-migrator
24/04/29 10:52:24 INFO SecurityManager: Changing view acls to: ubuntu
24/04/29 10:52:24 INFO SecurityManager: Changing modify acls to: ubuntu
24/04/29 10:52:24 INFO SecurityManager: Changing view acls groups to:
24/04/29 10:52:24 INFO SecurityManager: Changing modify acls groups to:
24/04/29 10:52:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(ubuntu); groups with view permissions: Set(); users  with modify permissions: Set(ubuntu); groups with modify permissions: Set()
24/04/29 10:52:24 INFO Utils: Successfully started service 'sparkDriver' on port 41351.
24/04/29 10:52:24 INFO SparkEnv: Registering MapOutputTracker
24/04/29 10:52:24 INFO SparkEnv: Registering BlockManagerMaster
24/04/29 10:52:24 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
24/04/29 10:52:24 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
24/04/29 10:52:24 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-df0c3f25-bc98-4cf9-baa1-8b61e406912f
24/04/29 10:52:24 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
24/04/29 10:52:24 INFO SparkEnv: Registering OutputCommitCoordinator
24/04/29 10:52:24 INFO Utils: Successfully started service 'SparkUI' on port 4040.
24/04/29 10:52:24 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://ip-10-0-0-129.ec2.internal:4040
24/04/29 10:52:24 INFO SparkContext: Added JAR file:/home/ubuntu/install/migrator/scylla-migrator/migrator/target/scala-2.11/scylla-migrator-assembly-0.0.1.jar at spark://ip-10-0-0-129.ec2.internal:41351/jars/scylla-migrator-assembly-0.0.1.jar with timestamp 1714387944952
24/04/29 10:52:24 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://ip-10-0-0-129.ec2.internal:7077...
24/04/29 10:52:25 INFO TransportClientFactory: Successfully created connection to ip-10-0-0-129.ec2.internal/10.0.0.129:7077 after 19 ms (0 ms spent in bootstraps)
24/04/29 10:52:25 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20240429105225-0017
24/04/29 10:52:25 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20240429105225-0017/0 on worker-20240428032644-10.0.0.129-45665 (10.0.0.129:45665) with 2 core(s)
24/04/29 10:52:25 INFO StandaloneSchedulerBackend: Granted executor ID app-20240429105225-0017/0 on hostPort 10.0.0.129:45665 with 2 core(s), 1024.0 MB RAM
24/04/29 10:52:25 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20240429105225-0017/1 on worker-20240428032659-10.0.0.129-42885 (10.0.0.129:42885) with 2 core(s)
24/04/29 10:52:25 INFO StandaloneSchedulerBackend: Granted executor ID app-20240429105225-0017/1 on hostPort 10.0.0.129:42885 with 2 core(s), 1024.0 MB RAM
24/04/29 10:52:25 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20240429105225-0017/2 on worker-20240428032649-10.0.0.129-36371 (10.0.0.129:36371) with 2 core(s)
24/04/29 10:52:25 INFO StandaloneSchedulerBackend: Granted executor ID app-20240429105225-0017/2 on hostPort 10.0.0.129:36371 with 2 core(s), 1024.0 MB RAM
24/04/29 10:52:25 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20240429105225-0017/3 on worker-20240428032656-10.0.0.129-46715 (10.0.0.129:46715) with 2 core(s)
24/04/29 10:52:25 INFO StandaloneSchedulerBackend: Granted executor ID app-20240429105225-0017/3 on hostPort 10.0.0.129:46715 with 2 core(s), 1024.0 MB RAM
24/04/29 10:52:25 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20240429105225-0017/4 on worker-20240428032646-10.0.0.129-36563 (10.0.0.129:36563) with 2 core(s)
24/04/29 10:52:25 INFO StandaloneSchedulerBackend: Granted executor ID app-20240429105225-0017/4 on hostPort 10.0.0.129:36563 with 2 core(s), 1024.0 MB RAM
24/04/29 10:52:25 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20240429105225-0017/5 on worker-20240428032651-10.0.0.129-44873 (10.0.0.129:44873) with 2 core(s)
24/04/29 10:52:25 INFO StandaloneSchedulerBackend: Granted executor ID app-20240429105225-0017/5 on hostPort 10.0.0.129:44873 with 2 core(s), 1024.0 MB RAM
24/04/29 10:52:25 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20240429105225-0017/6 on worker-20240428032702-10.0.0.129-38949 (10.0.0.129:38949) with 2 core(s)
24/04/29 10:52:25 INFO StandaloneSchedulerBackend: Granted executor ID app-20240429105225-0017/6 on hostPort 10.0.0.129:38949 with 2 core(s), 1024.0 MB RAM
24/04/29 10:52:25 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20240429105225-0017/7 on worker-20240428032654-10.0.0.129-43929 (10.0.0.129:43929) with 2 core(s)
24/04/29 10:52:25 INFO StandaloneSchedulerBackend: Granted executor ID app-20240429105225-0017/7 on hostPort 10.0.0.129:43929 with 2 core(s), 1024.0 MB RAM
24/04/29 10:52:25 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36277.
24/04/29 10:52:25 INFO NettyBlockTransferService: Server created on ip-10-0-0-129.ec2.internal:36277
24/04/29 10:52:25 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20240429105225-0017/1 is now RUNNING
24/04/29 10:52:25 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20240429105225-0017/6 is now RUNNING
24/04/29 10:52:25 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
24/04/29 10:52:25 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20240429105225-0017/2 is now RUNNING
24/04/29 10:52:25 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20240429105225-0017/5 is now RUNNING
24/04/29 10:52:25 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20240429105225-0017/0 is now RUNNING
24/04/29 10:52:25 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20240429105225-0017/7 is now RUNNING
24/04/29 10:52:25 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20240429105225-0017/4 is now RUNNING
24/04/29 10:52:25 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20240429105225-0017/3 is now RUNNING
24/04/29 10:52:25 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, ip-10-0-0-129.ec2.internal, 36277, None)
24/04/29 10:52:25 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-0-0-129.ec2.internal:36277 with 366.3 MB RAM, BlockManagerId(driver, ip-10-0-0-129.ec2.internal, 36277, None)
24/04/29 10:52:25 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, ip-10-0-0-129.ec2.internal, 36277, None)
24/04/29 10:52:25 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, ip-10-0-0-129.ec2.internal, 36277, None)
24/04/29 10:52:25 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
24/04/29 10:52:26 INFO migrator: Loaded config: MigratorConfig(DynamoDB(Some(DynamoDBEndpoint(dynamodb.us-east-1.amazonaws.com,8000)),None,Some(AWSCredentials(ASI..., <redacted>)),anand-mig-1,None,None,None,Some(1)),DynamoDB(Some(DynamoDBEndpoint(35.227.81.47,8000)),None,Some(AWSCredentials(non..., <redacted>)),anand-mig-1,None,None,None,Some(1),false,None),List(),Savepoints(300,/app/savepoints),Set(),Validation(true,60000,1000,100,0.001,0))
24/04/29 10:52:27 WARN ApacheUtils: NoSuchMethodException was thrown when disabling normalizeUri. This indicates you are using an old version (< 4.5.8) of Apache http client. It is recommended to use http client version >= 4.5.9 to avoid the breaking change introduced in apache client 4.5.7 and the latency in exception handling. See https://github.com/aws/aws-sdk-java/issues/1919 for more information
Exception in thread "main" com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to dynamodb.us-east-1.amazonaws.com:8000 [dynamodb.us-east-1.amazonaws.com/52.119.234.84] failed: Connection refused (Connection refused)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1201)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1147)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:796)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:764)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:738)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:698)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:680)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:544)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:524)
        at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:5110)
        at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:5077)
        at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.executeDescribeTable(AmazonDynamoDBClient.java:1981)
        at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.describeTable(AmazonDynamoDBClient.java:1947)
        at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.describeTable(AmazonDynamoDBClient.java:1993)
        at com.scylladb.migrator.readers.DynamoDB$.readRDD(DynamoDB.scala:52)
        at com.scylladb.migrator.readers.DynamoDB$.readRDD(DynamoDB.scala:19)
        at com.scylladb.migrator.alternator.AlternatorMigrator$.migrate(AlternatorMigrator.scala:20)
        at com.scylladb.migrator.Migrator$.main(Migrator.scala:43)
        at com.scylladb.migrator.Migrator.main(Migrator.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.http.conn.HttpHostConnectException: Connect to dynamodb.us-east-1.amazonaws.com:8000 [dynamodb.us-east-1.amazonaws.com/52.119.234.84] failed: Connection refused (Connection refused)
        at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:159)
        at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
        at com.amazonaws.http.conn.$Proxy13.connect(Unknown Source)
        at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:394)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
        at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1323)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1139)
        ... 29 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:607)
        at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:339)
        at com.amazonaws.http.conn.ssl.SdkTLSSocketFactory.connectSocket(SdkTLSSocketFactory.java:142)
        at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
        ... 45 more
julienrf commented 2 months ago

Hey @anand-chandrashekar, could you please try replacing the endpoint with just the region in the config, as follows?

 source:
   type: dynamodb
   table: anand-mig-1
-  endpoint:
-    host: dynamodb.us-east-1.amazonaws.com
-    port: 8000
+  region: us-east-1
   credentials:
     …
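For context: port 8000 is the default port of DynamoDB Local, while the public AWS endpoint only serves HTTPS on port 443, which would explain the "Connection refused". With the region in place of the endpoint, the full source section would look like this (a sketch reusing the values from your report, everything else unchanged):

source:
  type: dynamodb
  table: anand-mig-1
  region: us-east-1
  credentials:
    accessKey: abc
    secretKey: xxx
  maxMapTasks: 1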
anand-chandrashekar commented 2 months ago

Hi @julienrf, I got a different error. cc: @gcarmin

Exception in thread "main" com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: The security token included in the request is invalid. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: UnrecognizedClientException; Request ID: HQS4CFKMN28L2FSUKQQCVUQG4VVV4KQNSO5AEMVJF66Q9ASUAAJG)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1799)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1383)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1359)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1139)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:796)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:764)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:738)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:698)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:680)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:544)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:524)
        at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:5110)
        at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:5077)
        at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.executeDescribeTable(AmazonDynamoDBClient.java:1981)
        at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.describeTable(AmazonDynamoDBClient.java:1947)
        at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.describeTable(AmazonDynamoDBClient.java:1993)
        at com.scylladb.migrator.readers.DynamoDB$.readRDD(DynamoDB.scala:52)
        at com.scylladb.migrator.readers.DynamoDB$.readRDD(DynamoDB.scala:19)
        at com.scylladb.migrator.alternator.AlternatorMigrator$.migrate(AlternatorMigrator.scala:20)
        at com.scylladb.migrator.Migrator$.main(Migrator.scala:43)
        at com.scylladb.migrator.Migrator.main(Migrator.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
julienrf commented 2 months ago

Could you please confirm that your credentials work for the region us-east-1? Do they work in the AWS console?

anand-chandrashekar commented 2 months ago

(screenshot of the AWS console attached) Yes, my credentials work.

julienrf commented 2 months ago

I still suspect there is something wrong with the credentials (see this discussion). Could you please double-check the permissions associated with the actual credentials (see the documentation)?
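As an illustration only (the linked documentation is authoritative), reading the source table requires at least describe and scan permissions, along the lines of this hypothetical policy (123456789012 stands in for the account ID):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:DescribeTable",
        "dynamodb:Scan"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/anand-mig-1"
    }
  ]
}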

anand-chandrashekar commented 2 months ago

I've set up AWS properly and I can run list-tables.

Exception in thread "main" com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: The security token included in the request is invalid. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ...
ubuntu@ip-10-0-0-129:~/install/spark/spark-2.4.4-bin-hadoop2.7/bin$ aws dynamodb list-tables
{
    "TableNames": [
        "anand-mig-1"
    ]
}
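As a further sanity check, the following prints which identity the CLI credentials resolve to; note that the CLI may be picking up different credentials (e.g. from ~/.aws/credentials) than the keys written in the YAML config:

aws sts get-caller-identity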
anand-chandrashekar commented 2 months ago

Is it possible for the migrator to connect via HTTPS (port 443)? The aws dynamodb list-tables command goes over that port.
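For what it's worth, a quick way to check which ports the endpoint actually accepts, assuming netcat is available:

nc -vz dynamodb.us-east-1.amazonaws.com 443    # should succeed
nc -vz dynamodb.us-east-1.amazonaws.com 8000   # expected: connection refused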

anand-chandrashekar commented 2 months ago

It was indeed a permission issue. I was able to get past it and get the tool working. Closing this. Thank you @julienrf