maxpoulain opened this issue 3 years ago
That looks pretty weird. So instead of the link pointing to an IP address or host name, it literally points to a block of HTML?
Can you share your Flintrock config? Do you see the same behavior if you launch a cluster on a public VPC?
Yes, it's pretty weird, and yes, it seems to point to a block of HTML instead of an IP address or host name.
Here is our Flintrock config:
```yaml
services:
  spark:
    version: 2.2.0
    download-source: s3://our-bucket/spark-related-packages/
  hdfs:
    version: 2.7.3
    download-source: s3://our-bucket/spark-related-packages/

provider: ec2

providers:
  ec2:
    key-name: key
    identity-file: key.pem
    instance-type: m5.2xlarge
    region: eu-west-1
    availability-zone: eu-west-1c
    ami: our-custom-ami  # Based on Amazon Linux 2 AMI
    user: ec2-user
    spot-price: 0.4
    vpc-id: our-vpc-id
    subnet-id: our-subnet-id
    instance-profile-name: our-role
    tags:
      - TEAM,DATA
    min-root-ebs-size-gb: 120
    tenancy: default
    ebs-optimized: no
    instance-initiated-shutdown-behavior: terminate
    authorize-access-from:
      - X.X.X.X/8
      - Y.Y.Y.Y/8

launch:
  num-slaves: 3
  install-hdfs: True
  install-spark: True
  java-version: 8
```
I just tried to launch a cluster on a public VPC and it works without any error! So the problem seems to be related to the private VPC.
Is it just the UI that's broken? I would expect something to be wrong with the cluster too.
Can you post the full contents of the files under `spark/conf` on the cluster master (in the case where the UI is broken)?
I think I just found the problem, inside `spark/conf/spark-env.sh`! There is a `curl` call that sets `SPARK_PUBLIC_DNS`, but it returns:
```html
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>404 - Not Found</title>
</head>
<body>
<h1>404 - Not Found</h1>
</body>
</html>
```
Here is the `spark/conf/spark-env.sh` file:
```sh
#!/usr/bin/env bash

export SPARK_LOCAL_DIRS="/media/root/spark"

# Standalone cluster options
export SPARK_EXECUTOR_INSTANCES="1"
export SPARK_EXECUTOR_CORES="$(($(nproc) / 1))"
export SPARK_WORKER_CORES="$(nproc)"

export SPARK_MASTER_HOST="<masked_master_hostname>"

# TODO: Make this dependent on HDFS install.
export HADOOP_CONF_DIR="$HOME/hadoop/conf"

# TODO: Make this non-EC2-specific.
# Bind Spark's web UIs to this machine's public EC2 hostname
export SPARK_PUBLIC_DNS="$(curl --silent http://169.254.169.254/latest/meta-data/public-hostname)"

# TODO: Set a high ulimit for large shuffles
# Need to find a way to do this, since "sudo ulimit..." doesn't fly.
# Probably need to edit some Linux config file.
# ulimit -n 1000000

# Should this be made part of a Python service somehow?
export PYSPARK_PYTHON="python3"
```
It seems that `http://169.254.169.254/latest/meta-data/public-hostname` is not working, no? Because when I run `curl http://169.254.169.254/latest/meta-data/`, I get:
```
ami-id
ami-launch-index
ami-manifest-path
block-device-mapping/
events/
hostname
iam/
identity-credentials/
instance-action
instance-id
instance-life-cycle
instance-type
local-hostname
local-ipv4
mac
metrics/
network/
placement/
profile
public-keys/
reservation-id
security-groups
```
There is no `public-hostname`!
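To confirm that diagnosis from the command line, a quick check like the following can tell the two cases apart (a sketch: `--fail` makes `curl` exit non-zero on the 404 shown above, and the connect timeout keeps it from hanging when run outside EC2):

```shell
# Probe the EC2 instance metadata service for a public hostname.
# --fail: exit non-zero on HTTP errors such as the 404 seen above.
# --connect-timeout: avoid a long hang when run outside EC2.
if curl --fail --silent --connect-timeout 2 \
    http://169.254.169.254/latest/meta-data/public-hostname > /dev/null; then
    echo "instance has a public hostname"
else
    echo "no public hostname (expected in a private subnet)"
fi
```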
OK, it sounds like we need to understand how to set `SPARK_PUBLIC_DNS` when launching into a private VPC. Do things work if it's just left unset?
I just tried to launch a new cluster into the private VPC with the `SPARK_PUBLIC_DNS` line commented out, like this:

```sh
# export SPARK_PUBLIC_DNS="$(curl --silent http://169.254.169.254/latest/meta-data/public-hostname)"
```

And it seems to work perfectly! There are no errors, and the previous problem is gone!
OK, great. Maybe we don't need this config at all anymore, or maybe we only need it when launching into a public VPC.
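One way to support both cases (a sketch, not tested against Flintrock itself) would be to guard the export in `spark-env.sh` so that `SPARK_PUBLIC_DNS` is only set when the metadata service actually returns a public hostname:

```shell
# Only export SPARK_PUBLIC_DNS when the instance has a public hostname.
# In a private subnet the metadata endpoint returns a 404, curl --fail
# exits non-zero, and the variable is simply left unset.
public_hostname="$(curl --fail --silent --connect-timeout 2 \
    http://169.254.169.254/latest/meta-data/public-hostname || true)"
if [ -n "$public_hostname" ]; then
    export SPARK_PUBLIC_DNS="$public_hostname"
fi
```

Leaving the variable unset lets Spark fall back to its default hostname resolution, which is exactly the behavior that worked in the private VPC above.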
Hi,
We are having issues with the Spark UI when using Flintrock. For some context on our use of Flintrock: we use `authorize-access-from` in the config file. When we go to the Spark UI on port 8080, the page is displayed correctly, but the links to the other pages are broken. Here is an extract from the HTML code of the page for a link to a worker.
I get a similar error message when I launch `spark-shell`, for example. I have the impression that the error comes from a problem resolving the IP address or something related. Maybe we made a mistake in our configuration, or maybe it's not related to Flintrock at all.
So, if you have any clues or ideas for solving this problem, it would be a great help for us.
Thank you in advance for your help,
Maxime