grassiale opened 2 years ago
Hello, do you have any updates on this issue?
We are currently working around this by manually substituting the mappings on nodes.
What essentially happens is that the nodes of the Cassandra cluster (all except one, actually) will have a mapping in /var/lib/cassandra/.restore_mappings that contains IPs instead of pod names:
{
"host_map":
{
"10.71.16.133":
{
"seed": false,
"source":
[
"k8ssandra-rack-a-sts-0"
]
},
"10.71.86.153":
{
"seed": false,
"source":
[
"10.71.47.236"
]
},
"localhost":
{
"seed": false,
"source":
[
"10.71.47.233"
]
}
},
"in_place": false
}
What we do is: scale the operator down to 0 pods, scale the statefulsets to 0, substitute the IP under the localhost key of the JSON with the respective pod name on every volume of the cluster, and restart the operator. This makes restores work, but it is obviously not ideal.
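For reference, the substitution we do on each volume is roughly the following (a minimal sketch; the path is the one mentioned above and the pod name is just an example, it has to be the right one for each node):

```python
import json

# Path of the mapping file on each Cassandra data volume (see above).
MAPPING_PATH = "/var/lib/cassandra/.restore_mappings"

# Example only: use the pod name for the node whose volume this is.
POD_NAME = "k8ssandra-rack-a-sts-0"

with open(MAPPING_PATH) as f:
    mapping = json.load(f)

# Replace the IP listed under the "localhost" key with the pod name.
mapping["host_map"]["localhost"]["source"] = [POD_NAME]

with open(MAPPING_PATH, "w") as f:
    json.dump(mapping, f)
```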
If you think I should open an issue against https://github.com/thelastpickle/cassandra-medusa because this is not inherent to k8ssandra, please let me know.
Hi @grassiale,
if the source in the mapping looks like what you're seeing (one with the pod name, the others with IP addresses), then it means the pods aren't managing to resolve the other pods' IPs to their hostnames.
You can verify this by checking the topology*.json files in the backup metadata in the S3 bucket, which should show the same pattern (one pod name, two IP addresses).
Could you check which version of cass-operator is running? We made the necessary changes to fix pod IP address resolution in v1.11.0, and that's what k8ssandra-operator should install when running v1.1.0 and v1.1.1.
We're going to need the output of kubectl get cassdc/dc-backup-test -o yaml, and the same for the StatefulSet: kubectl get statefulset/k8ssandra-backup-test-dc-backup-test-rack-a-sts -o yaml. I'd like to check the serviceName value there.
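(If it's quicker, something like kubectl get statefulset/k8ssandra-backup-test-dc-backup-test-rack-a-sts -o jsonpath='{.spec.serviceName}' should print just that field.)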
Hi @grassiale, any update on this issue?
Sorry, I was away for my PTO and it slipped my mind. Will have a look at it tomorrow.
OK, sorry for the huge delay. I did a fresh test with fresh resources.
I'm using k8ssandra-operator:v1.2.0 and the image for cass-operator is k8ssandra/cass-operator:v1.12.0
The output of getting the CassandraDatacenter is:
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
annotations:
k8ssandra.io/resource-hash: 6ZKMF06HT8XN93IFEk45NnP4BTzRUOMB8jYeiuVa9JU=
creationTimestamp: "2022-10-07T10:16:45Z"
finalizers:
- finalizer.cassandra.datastax.com
generation: 4
labels:
app.kubernetes.io/component: cassandra
app.kubernetes.io/created-by: k8ssandracluster-controller
app.kubernetes.io/name: k8ssandra-operator
app.kubernetes.io/part-of: k8ssandra
k8ssandra.io/cluster-name: dc-backup-test
k8ssandra.io/cluster-namespace: k8ssandra
name: dc-backup-test
namespace: k8ssandra
resourceVersion: "48656953"
uid: b550c8e1-1649-4def-9f3b-61ecba8d0add
spec:
additionalServiceConfig:
additionalSeedService: {}
allpodsService: {}
dcService: {}
nodePortService: {}
seedService: {}
clusterName: dc-backup-test
config:
cassandra-env-sh:
additional-jvm-opts:
- -Dcassandra.system_distributed_replication=dc-backup-test:3
- -Dcom.sun.management.jmxremote.authenticate=false
- -Dcassandra.system_distributed_replication_dc_names=dc-backup-test
- -Dcassandra.system_distributed_replication_per_dc=3
cassandra-yaml:
allocate_tokens_for_local_replication_factor: 3
authenticator: AllowAllAuthenticator
authorizer: AllowAllAuthorizer
compaction_throughput_mb_per_sec: 200
concurrent_compactors: 32
num_tokens: 16
read_request_timeout_in_ms: 8000
request_timeout_in_ms: 12000
role_manager: CassandraRoleManager
stream_entire_sstables: true
stream_throughput_outbound_megabits_per_sec: 61440
streaming_connections_per_host: 6
write_request_timeout_in_ms: 6000
jvm-server-options:
initial_heap_size: 1000000000
max_heap_size: 1000000000
configBuilderResources: {}
managementApiAuth: {}
networking:
nodePort:
internode: 30007
native: 30006
podTemplateSpec:
metadata: {}
spec:
containers:
- env:
- name: MEDUSA_MODE
value: GRPC
- name: MEDUSA_TMP_DIR
value: /var/lib/cassandra
- name: CQL_USERNAME
valueFrom:
secretKeyRef:
key: username
name: dc-backup-test-medusa
- name: CQL_PASSWORD
valueFrom:
secretKeyRef:
key: password
name: dc-backup-test-medusa
image: docker.io/k8ssandra/medusa:0.13.4
imagePullPolicy: IfNotPresent
name: medusa
ports:
- containerPort: 50051
name: grpc
protocol: TCP
resources:
limits:
memory: 8Gi
requests:
cpu: 100m
memory: 100Mi
volumeMounts:
- mountPath: /etc/cassandra
name: server-config
- mountPath: /var/lib/cassandra
name: server-data
- mountPath: /etc/medusa
name: dc-backup-test-medusa
- mountPath: /etc/podinfo
name: podinfo
- mountPath: /etc/medusa-secrets
name: medusa-bucket-key
- env:
- name: METRIC_FILTERS
value: deny:org.apache.cassandra.metrics.Table deny:org.apache.cassandra.metrics.table
allow:org.apache.cassandra.metrics.table.live_ss_table_count allow:org.apache.cassandra.metrics.Table.LiveSSTableCount
allow:org.apache.cassandra.metrics.table.live_disk_space_used allow:org.apache.cassandra.metrics.table.LiveDiskSpaceUsed
allow:org.apache.cassandra.metrics.Table.Pending allow:org.apache.cassandra.metrics.Table.Memtable
allow:org.apache.cassandra.metrics.Table.Compaction allow:org.apache.cassandra.metrics.table.read
allow:org.apache.cassandra.metrics.table.write allow:org.apache.cassandra.metrics.table.range
allow:org.apache.cassandra.metrics.table.coordinator allow:org.apache.cassandra.metrics.table.dropped_mutations
name: cassandra
resources: {}
initContainers:
- name: server-config-init
resources: {}
- env:
- name: MEDUSA_MODE
value: RESTORE
- name: MEDUSA_TMP_DIR
value: /var/lib/cassandra
- name: CQL_USERNAME
valueFrom:
secretKeyRef:
key: username
name: dc-backup-test-medusa
- name: CQL_PASSWORD
valueFrom:
secretKeyRef:
key: password
name: dc-backup-test-medusa
- name: BACKUP_NAME
value: dc-eu-west-1-07102022
- name: RESTORE_KEY
value: fd84d0e4-6ca1-49ec-8fe2-0616ffa07e46
image: docker.io/k8ssandra/medusa:0.13.4
imagePullPolicy: IfNotPresent
name: medusa-restore
resources:
limits:
memory: 8Gi
requests:
cpu: 100m
memory: 100Mi
volumeMounts:
- mountPath: /etc/cassandra
name: server-config
- mountPath: /var/lib/cassandra
name: server-data
- mountPath: /etc/medusa
name: dc-backup-test-medusa
- mountPath: /etc/podinfo
name: podinfo
- mountPath: /etc/medusa-secrets
name: medusa-bucket-key
volumes:
- configMap:
name: dc-backup-test-medusa
name: dc-backup-test-medusa
- name: medusa-bucket-key
secret:
secretName: medusa-bucket-key
- downwardAPI:
items:
- fieldRef:
fieldPath: metadata.labels
path: labels
name: podinfo
racks:
- name: rack-a
nodeAffinityLabels:
topology.kubernetes.io/zone: eu-west-1a
- name: rack-b
nodeAffinityLabels:
topology.kubernetes.io/zone: eu-west-1b
- name: rack-c
nodeAffinityLabels:
topology.kubernetes.io/zone: eu-west-1c
resources:
requests:
cpu: "1"
memory: 2Gi
serverImage: k8ssandra/cass-management-api:4.0.4-v0.1.40
serverType: cassandra
serverVersion: 4.0.4
size: 3
storageConfig:
cassandraDataVolumeClaimSpec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 2Gi
storageClassName: gp2
superuserSecretName: dc-backup-test-superuser
systemLoggerResources: {}
tolerations:
- effect: NoSchedule
key: datanode
operator: Equal
value: "true"
users:
- secretName: dc-backup-test-medusa
superuser: true
status:
cassandraOperatorProgress: Updating
conditions:
- lastTransitionTime: "2022-10-07T10:21:53Z"
message: ""
reason: ""
status: "True"
type: Healthy
- lastTransitionTime: "2022-10-07T10:30:57Z"
message: ""
reason: ""
status: "False"
type: Stopped
- lastTransitionTime: "2022-10-07T10:21:55Z"
message: ""
reason: ""
status: "False"
type: ReplacingNodes
- lastTransitionTime: "2022-10-07T10:30:42Z"
message: ""
reason: ""
status: "False"
type: Updating
- lastTransitionTime: "2022-10-07T10:21:55Z"
message: ""
reason: ""
status: "False"
type: RollingRestart
- lastTransitionTime: "2022-10-07T10:30:57Z"
message: ""
reason: ""
status: "True"
type: Resuming
- lastTransitionTime: "2022-10-07T10:21:55Z"
message: ""
reason: ""
status: "False"
type: ScalingDown
- lastTransitionTime: "2022-10-07T10:21:55Z"
message: ""
reason: ""
status: "True"
type: Valid
- lastTransitionTime: "2022-10-07T10:21:55Z"
message: ""
reason: ""
status: "True"
type: Initialized
- lastTransitionTime: "2022-10-07T10:28:27Z"
message: ""
reason: ""
status: "False"
type: Ready
lastServerNodeStarted: "2022-10-07T10:31:47Z"
nodeStatuses:
dc-backup-test-dc-backup-test-rack-a-sts-0:
hostID: a483d9d7-a262-4f34-949a-e406973cbade
dc-backup-test-dc-backup-test-rack-b-sts-0:
hostID: e271f210-c3ff-425d-9730-f7a8cc2d398c
dc-backup-test-dc-backup-test-rack-c-sts-0:
hostID: 38e165a8-c858-490a-8fb2-a393514803e8
observedGeneration: 3
quietPeriod: "2022-10-07T10:30:48Z"
superUserUpserted: "2022-10-07T10:21:56Z"
usersUpserted: "2022-10-07T10:21:56Z"
And the statefulsets are attached: statefulsets.yaml.zip
I can't find the topology file on S3. The list of files is attached: filelist.txt
FYI, the test on fresh resources gave the results I expected: created the k8ssandra cluster, performed a backup, which succeeded, then tried a restore; the restore procedure started, only one pod managed to restore successfully, and the other pods are crashlooping in the medusa-restore container:
dc-backup-test-dc-backup-test-rack-a-sts-0 3/3 Running 0 13m
dc-backup-test-dc-backup-test-rack-b-sts-0 0/3 Init:CrashLoopBackOff 7 13m
dc-backup-test-dc-backup-test-rack-c-sts-0 0/3 Init:CrashLoopBackOff 7 13m
The log (from rack c) is:
ERROR:root:No such backup
[2022-10-07 10:42:32,443] ERROR: No such backup
Mapping: {'in_place': True, 'host_map': {'10.71.28.95': {'source': ['dc-backup-test-dc-backup-test-rack-a-sts-0'], 'seed': False}, 'localhost': {'source': ['10.71.57.183'], 'seed': False}, '10.71.71.44': {'source': ['10.71.71.44'], 'seed': False}}}
My bad, the files we're interested in are tokenmap*.json, not topology*.json. They appear here in your file list:
2022-10-07 12:23:33 1316 cassandra-backup-test/index/backup_index/dc-eu-west-1-07102022/tokenmap_dc-backup-test-dc-backup-test-rack-a-sts-0.json
2022-10-07 12:23:33 1315 cassandra-backup-test/index/backup_index/dc-eu-west-1-07102022/tokenmap_dc-backup-test-dc-backup-test-rack-b-sts-0.json
2022-10-07 12:23:33 1316 cassandra-backup-test/index/backup_index/dc-eu-west-1-07102022/tokenmap_dc-backup-test-dc-backup-test-rack-c-sts-0.json
tokenmap_dc-backup-test-dc-backup-test-rack-a-sts-0.json:
{"dc-backup-test-dc-backup-test-rack-a-sts-0": {"tokens": [-2120304533898230137, -3997368863411216165, -5817373164391162564, -6881714154164225960, -8140134745307878994, -951349610719215758, 1033687362401490749, 1449325021973130629, 2661005368645111774, 3058328220993824565, 4316748936939267653, 5351049583614672020, 5777336084895801410, 7127537911470479293, 8184914428399437510, 97774189609975329], "is_up": true, "rack": "rack-a", "dc": "dc-backup-test"}, "10.71.57.183": {"tokens": [-1701626850987540579, -2780999658435142236, -3516915351146035876, -458106845306948606, -4592671232371615416, -5410343164983966210, -6428194569726393226, -7499695186066426038, -8939963415969009278, 2099261950285485530, 3718285245982528827, 4696430698498642903, 509257397771586892, 6475064999322230031, 7571106899557168320, 8662404010249380579], "is_up": true, "rack": "rack-b", "dc": "dc-backup-test"}, "10.71.71.44": {"tokens": [-1254055511089660192, -2371293961961357785, -3085589426359018778, -4917505155314101461, -6055631694703996453, -7117984094374347992, -7780115600676589228, -8444624302853847303, 1817668213599905947, 3450149177604169998, 4083133001077304121, 5061023857470792785, 6208347565826469493, 6875278314747410329, 7932834999437458887, 9175273314117591040], "is_up": true, "rack": "rack-c", "dc": "dc-backup-test"}}
tokenmap_dc-backup-test-dc-backup-test-rack-b-sts-0.json:
{"10.71.28.95": {"tokens": [-2120304533898230137, -3997368863411216165, -5817373164391162564, -6881714154164225960, -8140134745307878994, -951349610719215758, 1033687362401490749, 1449325021973130629, 2661005368645111774, 3058328220993824565, 4316748936939267653, 5351049583614672020, 5777336084895801410, 7127537911470479293, 8184914428399437510, 97774189609975329], "is_up": true, "rack": "rack-a", "dc": "dc-backup-test"}, "dc-backup-test-dc-backup-test-rack-b-sts-0": {"tokens": [-1701626850987540579, -2780999658435142236, -3516915351146035876, -458106845306948606, -4592671232371615416, -5410343164983966210, -6428194569726393226, -7499695186066426038, -8939963415969009278, 2099261950285485530, 3718285245982528827, 4696430698498642903, 509257397771586892, 6475064999322230031, 7571106899557168320, 8662404010249380579], "is_up": true, "rack": "rack-b", "dc": "dc-backup-test"}, "10.71.71.44": {"tokens": [-1254055511089660192, -2371293961961357785, -3085589426359018778, -4917505155314101461, -6055631694703996453, -7117984094374347992, -7780115600676589228, -8444624302853847303, 1817668213599905947, 3450149177604169998, 4083133001077304121, 5061023857470792785, 6208347565826469493, 6875278314747410329, 7932834999437458887, 9175273314117591040], "is_up": true, "rack": "rack-c", "dc": "dc-backup-test"}}
tokenmap_dc-backup-test-dc-backup-test-rack-c-sts-0.json:
{"10.71.28.95": {"tokens": [-2120304533898230137, -3997368863411216165, -5817373164391162564, -6881714154164225960, -8140134745307878994, -951349610719215758, 1033687362401490749, 1449325021973130629, 2661005368645111774, 3058328220993824565, 4316748936939267653, 5351049583614672020, 5777336084895801410, 7127537911470479293, 8184914428399437510, 97774189609975329], "is_up": true, "rack": "rack-a", "dc": "dc-backup-test"}, "10.71.57.183": {"tokens": [-1701626850987540579, -2780999658435142236, -3516915351146035876, -458106845306948606, -4592671232371615416, -5410343164983966210, -6428194569726393226, -7499695186066426038, -8939963415969009278, 2099261950285485530, 3718285245982528827, 4696430698498642903, 509257397771586892, 6475064999322230031, 7571106899557168320, 8662404010249380579], "is_up": true, "rack": "rack-b", "dc": "dc-backup-test"}, "dc-backup-test-dc-backup-test-rack-c-sts-0": {"tokens": [-1254055511089660192, -2371293961961357785, -3085589426359018778, -4917505155314101461, -6055631694703996453, -7117984094374347992, -7780115600676589228, -8444624302853847303, 1817668213599905947, 3450149177604169998, 4083133001077304121, 5061023857470792785, 6208347565826469493, 6875278314747410329, 7932834999437458887, 9175273314117591040], "is_up": true, "rack": "rack-c", "dc": "dc-backup-test"}}%
Yep, that's what I thought. In each of them, you have one hostname and two IP addresses. This means the pods manage to resolve their own IP to a hostname, but they can't resolve the other pods' IPs to their respective hostnames 🤔
I've checked the StatefulSet definitions and they have the expected serviceName, so I'm a little puzzled...
DNS issues like this aren't easy to debug.
Could you check one of the medusa container logs for lines like this? [2022-10-07 01:30:15,938] DEBUG: Resolved 10.64.1.2 to dogfood-dc2-default-sts-2
To do some debugging, you'd need to ssh into one of the medusa containers, run python3, and then try to resolve the IP of another Cassandra pod:
import dns.resolver
import dns.reversename

# Example: replace with the IP of another Cassandra pod in the cluster.
ip_address = "10.71.57.183"

reverse_name = dns.reversename.from_address(ip_address).to_text()
fqdns = dns.resolver.resolve(reverse_name, 'PTR')
for fqdn in fqdns:
    print(fqdn.to_text())
That should show us which hostnames correspond to the pods' IP addresses.
FTR, the prepare restore error in the logs is a red herring, don't worry about it.
Here are the lines when starting a backup from the pod in rack a:
[2022-10-07 13:55:31,152] DEBUG: Checking placement using dc and rack...
INFO:root:Resolving ip address 10.71.12.222
[2022-10-07 13:55:31,152] INFO: Resolving ip address 10.71.12.222
INFO:root:ip address to resolve 10.71.12.222
[2022-10-07 13:55:31,152] INFO: ip address to resolve 10.71.12.222
DEBUG:cassandra.connection:Sending initial options message for new connection (140436372562552) to 10.71.57.183:30006
[2022-10-07 13:55:31,154] DEBUG: Sending initial options message for new connection (140436372562552) to 10.71.57.183:30006
DEBUG:cassandra.connection:Received options response on new connection (140436372562552) from 10.71.57.183:30006
[2022-10-07 13:55:31,156] DEBUG: Received options response on new connection (140436372562552) from 10.71.57.183:30006
DEBUG:cassandra.connection:No available compression types supported on both ends. locally supported: odict_keys([]). remotely supported: ['snappy', 'lz4']
[2022-10-07 13:55:31,157] DEBUG: No available compression types supported on both ends. locally supported: odict_keys([]). remotely supported: ['snappy', 'lz4']
DEBUG:cassandra.connection:Sending StartupMessage on <LibevConnection(140436372562552) 10.71.57.183:30006>
[2022-10-07 13:55:31,157] DEBUG: Sending StartupMessage on <LibevConnection(140436372562552) 10.71.57.183:30006>
DEBUG:cassandra.connection:Sent StartupMessage on <LibevConnection(140436372562552) 10.71.57.183:30006>
[2022-10-07 13:55:31,157] DEBUG: Sent StartupMessage on <LibevConnection(140436372562552) 10.71.57.183:30006>
DEBUG:root:Resolved 10.71.12.222 to dc-backup-test-dc-backup-test-rack-a-sts-0
[2022-10-07 13:55:31,158] DEBUG: Resolved 10.71.12.222 to dc-backup-test-dc-backup-test-rack-a-sts-0
WARNING:cassandra.connection:An authentication challenge was not sent, this is suspicious because the driver expects authentication (configured authenticator = PlainTextAuthenticator)
[2022-10-07 13:55:31,159] WARNING: An authentication challenge was not sent, this is suspicious because the driver expects authentication (configured authenticator = PlainTextAuthenticator)
DEBUG:cassandra.connection:Got ReadyMessage on new connection (140436372562552) from 10.71.57.183:30006
[2022-10-07 13:55:31,159] DEBUG: Got ReadyMessage on new connection (140436372562552) from 10.71.57.183:30006
DEBUG:cassandra.connection:Enabling protocol checksumming on connection (140436372562552).
[2022-10-07 13:55:31,159] DEBUG: Enabling protocol checksumming on connection (140436372562552).
DEBUG:cassandra.pool:Finished initializing connection for host 10.71.57.183:30006
[2022-10-07 13:55:31,159] DEBUG: Finished initializing connection for host 10.71.57.183:30006
DEBUG:cassandra.cluster:Added pool for host 10.71.57.183:30006 to session
[2022-10-07 13:55:31,160] DEBUG: Added pool for host 10.71.57.183:30006 to session
DEBUG:root:Checking host 10.71.12.222 against 10.71.12.222/dc-backup-test-dc-backup-test-rack-a-sts-0
[2022-10-07 13:55:31,159] DEBUG: Checking host 10.71.12.222 against 10.71.12.222/dc-backup-test-dc-backup-test-rack-a-sts-0
INFO:root:Resolving ip address 10.71.12.222
[2022-10-07 13:55:31,160] INFO: Resolving ip address 10.71.12.222
INFO:root:ip address to resolve 10.71.12.222
[2022-10-07 13:55:31,160] INFO: ip address to resolve 10.71.12.222
DEBUG:root:Resolved 10.71.12.222 to dc-backup-test-dc-backup-test-rack-a-sts-0
[2022-10-07 13:55:31,165] DEBUG: Resolved 10.71.12.222 to dc-backup-test-dc-backup-test-rack-a-sts-0
INFO:root:Resolving ip address 10.71.57.183
[2022-10-07 13:55:31,165] INFO: Resolving ip address 10.71.57.183
INFO:root:ip address to resolve 10.71.57.183
[2022-10-07 13:55:31,166] INFO: ip address to resolve 10.71.57.183
DEBUG:root:Resolved 10.71.57.183 to 10.71.57.183
[2022-10-07 13:55:31,169] DEBUG: Resolved 10.71.57.183 to 10.71.57.183
INFO:root:Resolving ip address 10.71.71.44
[2022-10-07 13:55:31,169] INFO: Resolving ip address 10.71.71.44
INFO:root:ip address to resolve 10.71.71.44
[2022-10-07 13:55:31,169] INFO: ip address to resolve 10.71.71.44
DEBUG:root:Resolved 10.71.71.44 to 10.71.71.44
[2022-10-07 13:55:31,172] DEBUG: Resolved 10.71.71.44 to 10.71.71.44
And from the pod in rack c:
INFO:root:Resolving ip address 10.71.93.1
[2022-10-07 13:55:31,052] INFO: Resolving ip address 10.71.93.1
DEBUG:cassandra.connection:No available compression types supported on both ends. locally supported: odict_keys([]). remotely supported: ['snappy', 'lz4']
[2022-10-07 13:55:31,052] DEBUG: No available compression types supported on both ends. locally supported: odict_keys([]). remotely supported: ['snappy', 'lz4']
INFO:root:ip address to resolve 10.71.93.1
[2022-10-07 13:55:31,052] INFO: ip address to resolve 10.71.93.1
DEBUG:cassandra.connection:Sending StartupMessage on <LibevConnection(139904893611144) 10.71.57.183:30006>
[2022-10-07 13:55:31,053] DEBUG: Sending StartupMessage on <LibevConnection(139904893611144) 10.71.57.183:30006>
DEBUG:cassandra.connection:Sent StartupMessage on <LibevConnection(139904893611144) 10.71.57.183:30006>
[2022-10-07 13:55:31,053] DEBUG: Sent StartupMessage on <LibevConnection(139904893611144) 10.71.57.183:30006>
DEBUG:root:Resolved 10.71.93.1 to dc-backup-test-dc-backup-test-rack-c-sts-0
[2022-10-07 13:55:31,057] DEBUG: Resolved 10.71.93.1 to dc-backup-test-dc-backup-test-rack-c-sts-0
DEBUG:root:Checking host 10.71.93.1 against 10.71.93.1/dc-backup-test-dc-backup-test-rack-c-sts-0
[2022-10-07 13:55:31,057] DEBUG: Checking host 10.71.93.1 against 10.71.93.1/dc-backup-test-dc-backup-test-rack-c-sts-0
INFO:root:Resolving ip address 10.71.28.95
[2022-10-07 13:55:31,057] INFO: Resolving ip address 10.71.28.95
INFO:root:ip address to resolve 10.71.28.95
[2022-10-07 13:55:31,057] INFO: ip address to resolve 10.71.28.95
DEBUG:root:Resolved 10.71.28.95 to 10.71.28.95
[2022-10-07 13:55:31,059] DEBUG: Resolved 10.71.28.95 to 10.71.28.95
INFO:root:Resolving ip address 10.71.57.183
[2022-10-07 13:55:31,060] INFO: Resolving ip address 10.71.57.183
INFO:root:ip address to resolve 10.71.57.183
[2022-10-07 13:55:31,060] INFO: ip address to resolve 10.71.57.183
WARNING:cassandra.connection:An authentication challenge was not sent, this is suspicious because the driver expects authentication (configured authenticator = PlainTextAuthenticator)
[2022-10-07 13:55:31,062] WARNING: An authentication challenge was not sent, this is suspicious because the driver expects authentication (configured authenticator = PlainTextAuthenticator)
DEBUG:cassandra.connection:Got ReadyMessage on new connection (139904893611144) from 10.71.57.183:30006
[2022-10-07 13:55:31,062] DEBUG: Got ReadyMessage on new connection (139904893611144) from 10.71.57.183:30006
DEBUG:cassandra.connection:Enabling protocol checksumming on connection (139904893611144).
[2022-10-07 13:55:31,062] DEBUG: Enabling protocol checksumming on connection (139904893611144).
DEBUG:cassandra.pool:Finished initializing connection for host 10.71.57.183:30006
[2022-10-07 13:55:31,062] DEBUG: Finished initializing connection for host 10.71.57.183:30006
DEBUG:cassandra.cluster:Added pool for host 10.71.57.183:30006 to session
[2022-10-07 13:55:31,062] DEBUG: Added pool for host 10.71.57.183:30006 to session
DEBUG:root:Resolved 10.71.57.183 to 10.71.57.183
[2022-10-07 13:55:31,066] DEBUG: Resolved 10.71.57.183 to 10.71.57.183
INFO:root:Resolving ip address 10.71.93.1
[2022-10-07 13:55:31,066] INFO: Resolving ip address 10.71.93.1
INFO:root:ip address to resolve 10.71.93.1
[2022-10-07 13:55:31,066] INFO: ip address to resolve 10.71.93.1
DEBUG:root:Resolved 10.71.93.1 to dc-backup-test-dc-backup-test-rack-c-sts-0
[2022-10-07 13:55:31,072] DEBUG: Resolved 10.71.93.1 to dc-backup-test-dc-backup-test-rack-c-sts-0
And here's the resolution test, from the rack-c medusa container, of the pod in rack a:
Type "help", "copyright", "credits" or "license" for more information.
>>> import dns.resolver
>>> import dns.reversename
>>>
>>> reverse_name = dns.reversename.from_address("10.71.12.222").to_text()
>>> fqdns = dns.resolver.resolve(reverse_name, 'PTR')
>>> for fqdn in fqdns:
... print(fqdn.to_text())
...
10-71-12-222.dc-backup-test-dc-backup-test-service.k8ssandra.svc.cluster.local.
10-71-12-222.dc-backup-test-seed-service.k8ssandra.svc.cluster.local.
10-71-12-222.dc-backup-test-dc-backup-test-additional-seed-service.k8ssandra.svc.cluster.local.
dc-backup-test-dc-backup-test-rack-a-sts-0.dc-backup-test-dc-backup-test-all-pods-service.k8ssandra.svc.cluster.local.
10-71-12-222.dc-backup-test-dc-backup-test-node-port-service.k8ssandra.svc.cluster.local.
In case you need it, here's also a test from the node in rack a resolving the node in rack c:
10-71-93-1.dc-backup-test-dc-backup-test-service.k8ssandra.svc.cluster.local.
10-71-93-1.dc-backup-test-dc-backup-test-node-port-service.k8ssandra.svc.cluster.local.
dc-backup-test-dc-backup-test-rack-c-sts-0.dc-backup-test-dc-backup-test-all-pods-service.k8ssandra.svc.cluster.local.
10-71-93-1.dc-backup-test-seed-service.k8ssandra.svc.cluster.local.
10-71-93-1.dc-backup-test-dc-backup-test-additional-seed-service.k8ssandra.svc.cluster.local.
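As far as I can tell from those answers, only the *-all-pods-service PTR record actually carries the pod name; the other records keep the dashed-IP form. Just to illustrate what I mean (a minimal sketch with dnspython, not claiming this is what medusa does internally), picking the pod name out of such an answer would look something like this:

```python
import dns.resolver
import dns.reversename

def pod_name_from_ip(ip_address):
    """Return the pod-name-style PTR record for ip_address, if any.

    In the output above, only the *-all-pods-service record starts with the
    pod name; the others start with the dashed IP (e.g. 10-71-93-1...).
    """
    reverse_name = dns.reversename.from_address(ip_address).to_text()
    dashed_ip = ip_address.replace(".", "-")
    for fqdn in dns.resolver.resolve(reverse_name, "PTR"):
        hostname = fqdn.to_text().split(".")[0]
        if hostname != dashed_ip:
            return hostname
    return None

# Example from the answers above: should print dc-backup-test-dc-backup-test-rack-c-sts-0
print(pod_name_from_ip("10.71.93.1"))
```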
I'm still available for further tests if needed ;)
@grassiale hi, which version of CoreDNS is running?
602401143452.dkr.ecr.eu-west-1.amazonaws.com/eks/coredns:v1.8.4-eksbuild.1
Hello again k8ssandra community
What did you do? We are performing a Medusa backup of a Cassandra datacenter with 3 racks and trying to restore the data on the same cluster. We are not achieving that because medusa-restore is apparently trying to download data from folders that refer to IPs of the Cassandra nodes instead of their pod names. Actually, only one of the pods is able to restore the data correctly; the other ones are left by the operator in an Init:CrashLoopBackOff state. There are some errors presented below in the operator logs, but they don't tell me much. Fun fact: when I tried the same procedure on a 2-rack cluster without token customizations, it worked, but with the same error in the logs.
Did you expect to see something different? We were expecting medusa-restore to download data, on all nodes, from the same S3 paths it was uploaded to.
Environment
K8ssandra Operator version:
v1.1.1
Kubernetes version information: Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.12-eks-a64ea69", GitCommit:"d4336843ba36120e9ed1491fddff5f2fec33eb77", GitTreeState:"clean", BuildDate:"2022-05-12T18:29:27Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
Kubernetes cluster kind: EKS 1.21
Manifests:
Anything else we need to know?: Logs from the container: medusa-restore-logs.log