saj9191 opened this issue 6 years ago
Hi saj9191,
It seems like something changed on our side where we keep the Maven artifacts; we'll fix it and update you here. Thanks for trying it out, and sorry for the inconvenience.
I am also hitting the same issue (I tried adding the hadoop-lzo dependency manually to pom.xml with no success). Have there been any updates on resolving this?
We were also hitting this issue recently. I will get back with a fix soon and post it here. Thanks for taking the time to try it out.
I believe I have found a solution:
In spark-on-lambda/common/network-common/pom.xml, add the following dependency (as suggested previously):
<dependency>
  <groupId>com.hadoop.gplcompression</groupId>
  <artifactId>hadoop-lzo</artifactId>
  <version>0.4.19</version>
</dependency>
Then, in spark-on-lambda/pom.xml, add the following repository (which hosts hadoop-lzo):
<repository>
  <id>twitter</id>
  <name>Twitter Repository</name>
  <url>http://maven.twttr.com</url>
</repository>
After this, I ran the make-distribution.sh command from your README and was able to build it all the way through.
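If it helps anyone else, a quick way to confirm that the Twitter repository actually serves hadoop-lzo before kicking off a full build is Maven's dependency:get goal (the coordinates below just mirror the dependency added above; adjust them if you pinned a different version):

# ask Maven to fetch hadoop-lzo directly from the Twitter repository
mvn -U dependency:get \
  -Dartifact=com.hadoop.gplcompression:hadoop-lzo:0.4.19 \
  -DremoteRepositories=http://maven.twttr.com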
Nice workaround! Let me also try it and update here.
Also, may I ask what your use case is, or are you just trying it out?
Thanks for working to update it!
We are working on a research project on using Lambda for what we call "interactive massively parallel" applications, and wanted to compare Spark-on-Lambda to the current state of the art, as well as to our own work!
By the way, from your blog post, do you have the data available that you use for sorting 100GB in under 10 minutes?
Interesting! Can you please elaborate a bit more on that? By the way, the data is generated using the TeraGen utility from https://github.com/ehiggs/spark-terasort, which you can use to generate it yourself.
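Roughly, generating the 100 GB input with that tool looks something like the below (the class name and jar follow the spark-terasort README, so double-check them against that repo; the output path is just a placeholder):

# build spark-terasort with mvn package, then generate ~100 GB of input records
spark-submit --class com.github.ehiggs.spark.terasort.TeraGen \
  target/spark-terasort-*-jar-with-dependencies.jar \
  100g s3a://<your-bucket>/terasort-input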
You can view our work here: we call it gg, and while it was originally intended for compilation, it now supports general-purpose applications (as simple as sorting and as complex as video encoding). Let me know if you have any questions about it (it can be in a different forum instead of this issue thread).
I will try to run your sorting example and let you know if I have any issues!
Another, easier workaround is to remove the pom.xml additions, essentially reverting the commit "Fix pom.xml to have the other Qubole repository location having 2.6.0... (2ca6c68ed5)".
Build your package using this command - ./dev/make-distribution.sh --name spark-lambda-2.1.0 --tgz -Phive -Phadoop-2.7 -DskipTests
And finally, add the jars below to the classpath before starting spark-shell (see the example after the reference link below):
1. wget http://central.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar
2. wget http://central.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.3/hadoop-aws-2.7.3.jar
Refer here - https://markobigdata.com/2017/04/23/manipulating-files-from-s3-with-apache-spark/
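For example, one way to put those two jars on the spark-shell classpath is the --jars option (this assumes the wget commands above were run from the directory you launch spark-shell from):

# --jars ships the listed jars to the driver and executors
./bin/spark-shell --jars aws-java-sdk-1.7.4.jar,hadoop-aws-2.7.3.jar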
Hi venkata91, I wrote you an email. I'm looking for an advisor for my startup; it is a Spark-based web-scraping service. The idea is to use this serverless computation model, but I'm having problems. As soon as you have time, I would like to discuss it in more depth.
Hello, I'm trying to install Spark on Lambda. When I run
./dev/make-distribution.sh --name spark-lambda-2.1.0 --tgz -Phive -Phadoop-2.7 -Dhadoop.version=2.6.0-qds-0.4.13 -DskipTests
the Spark Project Launcher module fails and I get the following error.
[ERROR] Failed to execute goal on project spark-launcher_2.11: Could not resolve dependencies for project org.apache.spark:spark-launcher_2.11:jar:2.1.0: Failure to find com.hadoop.gplcompression:hadoop-lzo:jar:0.4.19 in https://repo1.maven.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced -> [Help 1]
I tried to explicitly add hadoop-lzo as a dependency in the launcher pom.xml, but I still get the same error. Is there something I need to download or change to get this to work?
Thanks!
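One note on that particular error: Maven has cached the failed lookup, so even after adding the missing repository or dependency it may not retry until updates are forced. A rough sketch of clearing the cached failure and forcing a re-check (assumes the default ~/.m2 location and the stock build/mvn wrapper that ships with Spark):

# drop the cached (failed) resolution for hadoop-lzo
rm -rf ~/.m2/repository/com/hadoop/gplcompression
# rebuild with -U so Maven re-checks the remote repositories instead of trusting the cache
./build/mvn -U -Phive -Phadoop-2.7 -DskipTests clean package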