nchammas / flintrock

A command-line tool for launching Apache Spark clusters.
Apache License 2.0

flintrock constant failure: Exception: Failed to install Spark #290

Closed: shail-burman closed this issue 4 years ago

shail-burman commented 4 years ago

Here is the log file:

shailendra@ubuntu-server1:/usr/local/bin$ flintrock --debug launch boomi-platform-analysis --ec2-identity-file ~/Downloads/spark-cluster.pem --ec2-user ec2-user
2019-07-19 11:28:02,105 - flintrock.ec2       - INFO  - Launching 3 instances...
2019-07-19 11:28:15,239 - flintrock.ec2       - DEBUG - 3 instances not in state 'running': 'i-0b765794ecb34f060', 'i-0242cd797a72a2966', 'i-09fae853dc92916d3', ...
2019-07-19 11:28:18,766 - flintrock.ec2       - DEBUG - 3 instances not in state 'running': 'i-0b765794ecb34f060', 'i-09fae853dc92916d3', 'i-0242cd797a72a2966', ...
2019-07-19 11:28:22,067 - flintrock.ec2       - DEBUG - 1 instances not in state 'running': 'i-0242cd797a72a2966', ...
2019-07-19 11:28:28,371 - flintrock.ssh       - DEBUG - [54.87.10.29] SSH timeout.
2019-07-19 11:28:28,371 - flintrock.ssh       - DEBUG - [54.174.33.164] SSH timeout.
2019-07-19 11:28:28,375 - flintrock.ssh       - DEBUG - [3.84.223.206] SSH timeout.
2019-07-19 11:28:33,458 - flintrock.ssh       - DEBUG - [54.174.33.164] SSH exception: [Errno None] Unable to connect to port 22 on 54.174.33.164
2019-07-19 11:28:33,463 - flintrock.ssh       - DEBUG - [3.84.223.206] SSH exception: [Errno None] Unable to connect to port 22 on 3.84.223.206
2019-07-19 11:28:33,465 - flintrock.ssh       - DEBUG - [54.87.10.29] SSH exception: [Errno None] Unable to connect to port 22 on 54.87.10.29
/home/shailendra/.local/lib/python3.7/site-packages/paramiko/kex_ecdh_nist.py:39: CryptographyDeprecationWarning: encode_point has been deprecated on EllipticCurvePublicNumbers and will be removed in a future version. Please use EllipticCurvePublicKey.public_bytes to obtain both compressed and uncompressed point encoding.
  m.add_string(self.Q_C.public_numbers().encode_point())
/home/shailendra/.local/lib/python3.7/site-packages/paramiko/kex_ecdh_nist.py:96: CryptographyDeprecationWarning: Support for unsafe construction of public numbers from encoded data will be removed in a future version. Please use EllipticCurvePublicKey.from_encoded_point
  self.curve, Q_S_bytes
/home/shailendra/.local/lib/python3.7/site-packages/paramiko/kex_ecdh_nist.py:111: CryptographyDeprecationWarning: encode_point has been deprecated on EllipticCurvePublicNumbers and will be removed in a future version. Please use EllipticCurvePublicKey.public_bytes to obtain both compressed and uncompressed point encoding.
  hm.add_string(self.Q_C.public_numbers().encode_point())
2019-07-19 11:28:39,295 - flintrock.ssh       - INFO  - [54.87.10.29] SSH online.
2019-07-19 11:28:39,405 - flintrock.ssh       - INFO  - [3.84.223.206] SSH online.
2019-07-19 11:28:39,478 - flintrock.ssh       - INFO  - [54.174.33.164] SSH online.
2019-07-19 11:28:40,440 - flintrock.core      - INFO  - [54.87.10.29] Configuring ephemeral storage...
2019-07-19 11:28:40,702 - flintrock.core      - INFO  - [3.84.223.206] Configuring ephemeral storage...
2019-07-19 11:28:40,836 - flintrock.core      - INFO  - [54.174.33.164] Configuring ephemeral storage...
2019-07-19 11:28:44,294 - flintrock.core      - INFO  - [54.174.33.164] Installing Java 1.8...
2019-07-19 11:28:44,670 - flintrock.core      - INFO  - [54.87.10.29] Installing Java 1.8...
2019-07-19 11:28:44,919 - flintrock.core      - INFO  - [3.84.223.206] Installing Java 1.8...
2019-07-19 11:29:17,524 - flintrock.services  - INFO  - [54.87.10.29] Installing HDFS...
2019-07-19 11:29:17,783 - flintrock.services  - INFO  - [54.174.33.164] Installing HDFS...
2019-07-19 11:29:20,342 - flintrock.services  - INFO  - [3.84.223.206] Installing HDFS...
2019-07-19 11:29:43,652 - flintrock.services  - INFO  - [54.87.10.29] Installing Spark...
2019-07-19 11:29:44,619 - flintrock.services  - INFO  - [54.174.33.164] Installing Spark...
2019-07-19 11:29:46,084 - flintrock.services  - INFO  - [3.84.223.206] Installing Spark...
Do you want to terminate the 3 instances created by this operation? [Y/n]: y
Terminating instances...
Traceback (most recent call last):
  File "/home/shailendra/.local/lib/python3.7/site-packages/flintrock/core.py", line 599, in setup_node
    cluster=cluster,
  File "/home/shailendra/.local/lib/python3.7/site-packages/flintrock/services.py", line 333, in install
    """)
  File "/home/shailendra/.local/lib/python3.7/site-packages/flintrock/ssh.py", line 145, in ssh_check_output
    message=stdout_output + stderr_output)
flintrock.exceptions.SSHError: [3.84.223.206] ln: failed to create symbolic link ‘/usr/local/bin/beeline’: File exists

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/flintrock", line 11, in <module>
    sys.exit(main())
  File "/home/shailendra/.local/lib/python3.7/site-packages/flintrock/flintrock.py", line 1187, in main
    cli(obj={})
  File "/home/shailendra/.local/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/shailendra/.local/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/shailendra/.local/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/shailendra/.local/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/shailendra/.local/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/shailendra/.local/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/shailendra/.local/lib/python3.7/site-packages/flintrock/flintrock.py", line 456, in launch
    tags=ec2_tags)
  File "/home/shailendra/.local/lib/python3.7/site-packages/flintrock/ec2.py", line 53, in wrapper
    res = func(*args, **kwargs)
  File "/home/shailendra/.local/lib/python3.7/site-packages/flintrock/ec2.py", line 955, in launch
    identity_file=identity_file)
  File "/home/shailendra/.local/lib/python3.7/site-packages/flintrock/core.py", line 625, in provision_cluster
    run_against_hosts(partial_func=partial_func, hosts=hosts)
  File "/home/shailendra/.local/lib/python3.7/site-packages/flintrock/core.py", line 492, in run_against_hosts
    future.result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/shailendra/.local/lib/python3.7/site-packages/flintrock/core.py", line 681, in provision_node
    cluster=cluster)
  File "/home/shailendra/.local/lib/python3.7/site-packages/flintrock/core.py", line 605, in setup_node
    ) from e
Exception: Failed to install Spark.
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
    import apport.fileutils
  File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
    from apport.packaging_impl import impl as packaging
  File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 24, in <module>
    import apt
  File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
    import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'

Original exception was:
Traceback (most recent call last):
  File "/home/shailendra/.local/lib/python3.7/site-packages/flintrock/core.py", line 599, in setup_node
    cluster=cluster,
  File "/home/shailendra/.local/lib/python3.7/site-packages/flintrock/services.py", line 333, in install
    """)
  File "/home/shailendra/.local/lib/python3.7/site-packages/flintrock/ssh.py", line 145, in ssh_check_output
    message=stdout_output + stderr_output)
flintrock.exceptions.SSHError: [3.84.223.206] ln: failed to create symbolic link ‘/usr/local/bin/beeline’: File exists

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/flintrock", line 11, in <module>
    sys.exit(main())
  File "/home/shailendra/.local/lib/python3.7/site-packages/flintrock/flintrock.py", line 1187, in main
    cli(obj={})
  File "/home/shailendra/.local/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/shailendra/.local/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/shailendra/.local/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/shailendra/.local/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/shailendra/.local/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/shailendra/.local/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/shailendra/.local/lib/python3.7/site-packages/flintrock/flintrock.py", line 456, in launch
    tags=ec2_tags)
  File "/home/shailendra/.local/lib/python3.7/site-packages/flintrock/ec2.py", line 53, in wrapper
    res = func(*args, **kwargs)
  File "/home/shailendra/.local/lib/python3.7/site-packages/flintrock/ec2.py", line 955, in launch
    identity_file=identity_file)
  File "/home/shailendra/.local/lib/python3.7/site-packages/flintrock/core.py", line 625, in provision_cluster
    run_against_hosts(partial_func=partial_func, hosts=hosts)
  File "/home/shailendra/.local/lib/python3.7/site-packages/flintrock/core.py", line 492, in run_against_hosts
    future.result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/shailendra/.local/lib/python3.7/site-packages/flintrock/core.py", line 681, in provision_node
    cluster=cluster)
  File "/home/shailendra/.local/lib/python3.7/site-packages/flintrock/core.py", line 605, in setup_node
    ) from e
Exception: Failed to install Spark.
nchammas commented 4 years ago

ModuleNotFoundError: No module named 'apt_pkg'

What kind of AMI are you using in your cluster config? Flintrock only supports Amazon Linux and similar OSes (like CentOS).

If you want to use an apt-based distribution, for example, I can offer some general guidance, but you'll have to tweak Flintrock yourself to get it to work.
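A quick way to confirm what OS an AMI actually runs is to check /etc/os-release on a launched node. A minimal sketch, reusing the host, key, and user from the log above:

# Flintrock expects an Amazon Linux / CentOS-style (yum-based) OS here.
ssh -i ~/Downloads/spark-cluster.pem ec2-user@54.87.10.29 'cat /etc/os-release'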

flintrock.exceptions.SSHError: [3.84.223.206] ln: failed to create symbolic link ‘/usr/local/bin/beeline’: File exists

It also looks like you may be using an AMI that has Spark already installed on it. Is that the case?
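If something baked into the AMI (or an earlier install attempt) already put Spark on the node, the conflicting symlink should be visible there. A hedged check, reusing the failing host from the log:

# If this prints an existing file or link, something created beeline
# before Flintrock's Spark install ran.
ssh -i ~/Downloads/spark-cluster.pem ec2-user@3.84.223.206 'ls -l /usr/local/bin/beeline'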

shail-burman commented 4 years ago

I am using the default AMI that `flintrock configure` specified. I just changed the bare minimum fields in the YAML file: namely the Spark download link, HDFS link, pem file, EC2 user, and number of slaves.

shail-burman commented 4 years ago

Here is the config.yaml for Flintrock:

services:
  spark:
    version: 2.4.3
    # git-commit: latest  # if not 'latest', provide a full commit SHA; e.g. d6dc12ef0146ae409834c78737c116050961f350
    # git-repository:  # optional; defaults to https://github.com/apache/spark
    # optional; defaults to download from the official Spark S3 bucket
    #   - must contain a {v} template corresponding to the version
    #   - Spark must be pre-built
    #   - must be a tar.gz file
    #download-source: "https://www.example.com/files/spark/{v}/spark-{v}.tar.gz"
    download-source: "http://mirrors.advancedhosters.com/apache/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz"

    # executor-instances: 1
  hdfs:
    version: 2.7.6
    # optional; defaults to download from a dynamically selected Apache mirror
    #   - must contain a {v} template corresponding to the version
    #   - must be a .tar.gz file
    # download-source: "https://www.example.com/files/hadoop/{v}/hadoop-{v}.tar.gz"
    # download-source: "http://www-us.apache.org/dist/hadoop/common/hadoop-{v}/hadoop-{v}.tar.gz"
    # download-source: "http://www-us.apache.org/dist/hadoop/common/hadoop-{v}/hadoop-{v}.tar.gz"
    download-source: "http://mirrors.advancedhosters.com/apache/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz"

provider: ec2

providers:
  ec2:
    key-name: spark-cluster
    identity-file: /root/anaconda3/spark-cluster.pem
    instance-type: m3.medium
    region: us-east-1
    # availability-zone: <name>
    ami: ami-0b8d0d6ac70e5750c  # Amazon Linux 2, us-east-1
    user: ec2-user
    # ami: ami-61bbf104  # CentOS 7, us-east-1
    # user: centos
    # spot-price: <price>
    # vpc-id: <id>
    # subnet-id: <id>
    # placement-group: <name>
    # security-groups:
    #   - group-name1
    #   - group-name2
    # instance-profile-name:
    # tags:
    #   - key1,value1
    #   - key2, value2  # leading/trailing spaces are trimmed
    #   - key3,  # value will be empty
    # min-root-ebs-size-gb: <size-gb>
    tenancy: default  # default | dedicated
    ebs-optimized: no  # yes | no
    instance-initiated-shutdown-behavior: terminate  # terminate | stop
    # user-data: /path/to/userdata/script

launch:
  num-slaves: 2
  install-hdfs: True
  install-spark: True

debug: false
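Worth noting: in this config the hdfs download-source points at the Spark tarball rather than a Hadoop one, so the HDFS step would unpack Spark's binaries (including beeline) before the Spark step runs. A minimal sanity check of what the mirror serves:

# Both services above are configured to fetch the same .tgz; inspect the
# response headers before launching.
curl -sIL "http://mirrors.advancedhosters.com/apache/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz" | grep -iE '^(HTTP|content-(type|length))'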
nchammas commented 4 years ago

How did you install Flintrock? Can you try installing Flintrock using pip in a dedicated virtual environment, and then retrying the cluster launch?
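A minimal sketch of such a clean install (the environment path is illustrative):

python3 -m venv ~/venvs/flintrock          # dedicated environment
source ~/venvs/flintrock/bin/activate
pip install --upgrade pip
pip install flintrock 2>&1 | tee flintrock-install.log   # keep the log for debugging
flintrock --help                           # confirm the CLI resolves from the venv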

shail-burman commented 4 years ago

I have an Ubuntu virtual machine, and I installed Flintrock on it using pip3.

shail-burman commented 4 years ago

Can someone guide me as to what the issue could be? I have followed the instructions exactly, but it keeps on failing. The machines get created and then all get deleted. I sent the log files earlier. It seems like Spark is not getting installed properly for some reason (a symbolic link that is being created already exists). Please help me; I have been stuck for quite some time now.

nchammas commented 4 years ago

The errors about apt_pkg and beeline that I referred to earlier are strange, and I haven't seen them before. To eliminate the possibility that the errors are somehow caused by how you installed Flintrock, I suggested in my message just above that you try installing Flintrock into a dedicated virtual environment using pip.

Did you try that? Please show the console output from when you installed Flintrock.

shail-burman commented 4 years ago

Nick, I did try from a new virtual environment. I think I may have figured out the issue. The Spark tarball does include Hadoop, and in my config file I was installing both HDFS and Spark (which already includes Hadoop). I believe this was causing the issue. I opted to not install HDFS, and now the launch runs fine. I am new to Spark installation, so I do not know if this will cause issues, but the cluster is running now. Please let me know: if I do not install HDFS, will that cause any issues?
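For reference, the same toggle is available as launch flags (a sketch; flag names as of Flintrock 0.11). Skipping HDFS works for Spark standalone as long as jobs read and write an external store such as S3 rather than hdfs:// paths:

flintrock launch boomi-platform-analysis \
    --num-slaves 2 \
    --install-spark --no-install-hdfs \
    --ec2-identity-file ~/Downloads/spark-cluster.pem \
    --ec2-user ec2-user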

nchammas commented 4 years ago

There should be no conflict between Spark and HDFS. Are the tarballs hosted by mirrors.advancedhosters.com (which I see in your config above) the same as the official ones hosted by Apache?

Do you have the same issue if you use the official Apache mirrors instead of the host you currently have configured?
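One hedged way to compare the mirror's tarball against the official copy (the archive.apache.org URL assumes the usual Apache archive layout):

# The two digests should match if the mirror serves the official bits.
curl -sL http://mirrors.advancedhosters.com/apache/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz | sha512sum
curl -sL https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz | sha512sum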

shail-burman commented 4 years ago

I took that mirror from https://www.apache.org/dyn/closer.lua/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz. I did not do a comparison though. I will try and let you know.

nchammas commented 4 years ago

For the record, I just successfully launched a few clusters using Flintrock master, which I think is functionally identical to the 0.11 release.

I just noticed that you are using an older version of HDFS in your config. The configuration template specifies HDFS 2.8.5, and that's what I recommend.

Have you tried that? It might resolve the beeline error, though I don't think it would help with the ModuleNotFoundError you are also seeing.

If updating your configured version of HDFS has the effect I expect, then please reinstall Flintrock into a new virtual environment and show me the entire log of how you did it. Maybe there's a clue in there about the second error.
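A minimal way to make that version change in place, assuming the config lives at the default location (~/.config/flintrock/config.yaml on Linux):

# Bump the configured HDFS version from 2.7.6 to 2.8.5, keeping a backup.
sed -i.bak 's/version: 2.7.6/version: 2.8.5/' ~/.config/flintrock/config.yaml
grep -n 'version:' ~/.config/flintrock/config.yaml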

nchammas commented 4 years ago

Still having the same issue, @shail-burman? If so, please try the things I suggested in my previous message and let me know how they turned out.

shail-burman commented 4 years ago

Sorry, Nick, for the tardy response; I have been stuck on a problem involving loading millions of small files. I saw your note on that forum that there is no good way.

Anyway, thanks so much: Hadoop 2.8.5 solved the beeline issue. You may close this issue.

I also have a few questions around Flintrock.

A) Can we create regular instances instead of spot instances, so we could use them in a semi-production environment?

B) Can we use different configurations for the driver and workers?

Thanks, Shail

nchammas commented 4 years ago

Glad to hear the HDFS version fixed your issues.

A) Yes, Flintrock by default creates on-demand instances if you don't specify a spot price in your config.

B) Unfortunately, no. There's some discussion on this in #199 and several issues linked to or from there.
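For completeness on (A), spot is opt-in; leaving spot-price unset in the config (or on the command line) gives on-demand instances. A sketch of both modes (the spot flag name is as in Flintrock's launch options):

flintrock launch my-cluster                         # on-demand (the default)
flintrock launch my-cluster --ec2-spot-price 0.10   # spot, with a max bid in USD per instance-hour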