khajaasmath786 closed this issue 6 years ago.
@khajaasmath786 Hmm, not sure. It looks like a bug at first glance; let me try testing these files out. Unfortunately I won't have time until tomorrow, but I will get to it first thing tomorrow.
@khajaasmath786 #169 fixes this. I have added tests to verify the fix, let me know if it works.
Hi Harsha,
I am going to use only the library version supported with Spark 2.1. Has this fix been applied to the 2.1 build, or is it only applicable to the 2.2 version of Spark?
Thanks, Asmath
@khajaasmath786 do you mean 2.0.0? magellan master should work with 2.1+. I don't know about Cloudera versions, but if this doesn't work on Apache Spark 2.1+, let me know.
Hi Harsha,
I was able to run magellan earlier by adding --jars to spark2-shell using the command below:
spark2-shell --jars /hoome/yyy251/harsha2010:magellan:1.0.4-s_2.11
I cannot run it with the --packages command as mentioned in the README instructions of magellan. Please find the error below. This forces me to download the latest version of the jar, push it to the cluster, and access it with spark2-shell --jars /hoome/yyy251/harsha2010:magellan:1.0.4-s_2.11 instead of spark2-shell --packages /hoome/yyy251/harsha2010:magellan:1.0.4-s_2.11
I cannot confirm whether the issue is resolved, as the packages are not downloaded directly. Can you please share the latest jar for magellan:1.0.4-s_2.11 that resolves this issue?
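For reference, the two flags take different kinds of arguments, which may be part of the problem here: --jars expects a local filesystem path to the jar itself, while --packages expects Maven coordinates (groupId:artifactId:version), not a path. A sketch with placeholder paths:

```shell
# --jars: pass the local path to the downloaded jar file
spark2-shell --jars /path/to/magellan-1.0.4-s_2.11.jar

# --packages: pass Maven coordinates; Spark resolves them
# from the spark-packages / Maven Central repositories
spark2-shell --packages harsha2010:magellan:1.0.4-s_2.11
```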
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 2 | 0 | 0 | 0 || 1 | 0 |
---------------------------------------------------------------------
:: problems summary ::
:::: WARNINGS
module not found: commons-io#commons-io;2.4
==== local-m2-cache: tried
file:/home/yyy1k78/.m2/repository/commons-io/commons-io/2.4/commons-io-2.4.pom
-- artifact commons-io#commons-io;2.4!commons-io.jar:
file:/home/yyy1k78/.m2/repository/commons-io/commons-io/2.4/commons-io-2.4.jar
==== local-ivy-cache: tried
/home/yyy1k78/.ivy2/local/commons-io/commons-io/2.4/ivys/ivy.xml
-- artifact commons-io#commons-io;2.4!commons-io.jar:
/home/yyy1k78/.ivy2/local/commons-io/commons-io/2.4/jars/commons-io.jar
==== central: tried
https://repo1.maven.org/maven2/commons-io/commons-io/2.4/commons-io-2.4.pom
-- artifact commons-io#commons-io;2.4!commons-io.jar:
https://repo1.maven.org/maven2/commons-io/commons-io/2.4/commons-io-2.4.jar
==== spark-packages: tried
http://dl.bintray.com/spark-packages/maven/commons-io/commons-io/2.4/commons-io-2.4.pom
-- artifact commons-io#commons-io;2.4!commons-io.jar:
http://dl.bintray.com/spark-packages/maven/commons-io/commons-io/2.4/commons-io-2.4.jar
::::::::::::::::::::::::::::::::::::::::::::::
:: UNRESOLVED DEPENDENCIES ::
::::::::::::::::::::::::::::::::::::::::::::::
:: commons-io#commons-io;2.4: not found
::::::::::::::::::::::::::::::::::::::::::::::
:::: ERRORS
Server access error at url https://repo1.maven.org/maven2/commons-io/commons-io/2.4/commons-io-2.4.pom (javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target)
Server access error at url https://repo1.maven.org/maven2/commons-io/commons-io/2.4/commons-io-2.4.jar (javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target)
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: commons-io#commons-io;2.4: not found]
at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1078)
at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:296)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:160)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[yyy1k78@brksvl168 ~]$
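A note on the PKIX error above: it usually means the JVM running spark2-shell does not trust the certificate presented for repo1.maven.org (common behind a corporate TLS-intercepting proxy). One possible workaround, untested here and with hypothetical file names (proxy-ca.crt is whatever certificate your proxy presents), is to import that certificate into a truststore and point the spark2-shell JVM at it:

```shell
# Import the proxy/CA certificate into a dedicated truststore
keytool -importcert -trustcacerts -noprompt \
  -alias proxy-ca \
  -file proxy-ca.crt \
  -keystore custom-truststore.jks \
  -storepass changeit

# Tell the JVM launched by spark2-shell to use that truststore
# when resolving --packages over HTTPS
spark2-shell \
  --driver-java-options "-Djavax.net.ssl.trustStore=/path/to/custom-truststore.jks" \
  --packages harsha2010:magellan:1.0.4-s_2.11
```

Whether --driver-java-options reaches the resolution step can depend on the deploy mode; if it does not, setting the same -Djavax.net.ssl.trustStore option via JAVA_OPTS or SPARK_SUBMIT_OPTS is another avenue to try.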
Hi Harsha,
I have downloaded the latest jar via the spark-shell command. Here is the output of my spark-shell.
I went inside the jar file to check whether DBReader.scala was updated for version 1.0.4.
C:\Users\yyy1k78\.ivy2\cache\harsha2010\magellan\jars
The fix has not been applied to the older version. Can you confirm whether I can use it only in newer versions of the magellan jar, or is there any other workaround to use it with the 1.0.4 version of magellan?
Thanks, Asmath
Can you use magellan 1.0.5? https://dl.bintray.com/spark-packages/maven
<dependency>
<groupId>harsha2010</groupId>
<artifactId>magellan</artifactId>
<version>1.0.5-s_2.11</version>
<type>jar</type>
</dependency>
Since you are using Spark 2.1, 1.0.5 should work. Why do you need 1.0.4?
Let me try that.
BTW, I changed the shape files to GeoJSON and it worked; I can see the output. Strange, but I want to try it out and resolve it.
I have a question: what is the maximum size of shapes that we can include in one GeoJSON file?
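For context, reading the GeoJSON data in the spark2-shell looks roughly like this (the path is a placeholder, and this assumes the magellan data source is on the classpath and `spark` is the shell's SparkSession):

```scala
// Sketch: load GeoJSON via the magellan data source.
// "/path/to/geojson/" is a placeholder for the actual input directory.
val geojson = spark.read
  .format("magellan")
  .option("type", "geojson")
  .load("/path/to/geojson/")

geojson.printSchema()
geojson.show(5)
```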
Hi Harsha,
I am trying to load a shape file which has around 22K shapes. It results in an exception. May I know whether there is any size limit when reading polygons within shape files?
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 51.0 failed 1 times, most recent failure: Lost task 0.0 in stage 51.0 (TID 411, localhost, executor driver): java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at org.apache.hadoop.io.Text.append(Text.java:237)
at magellan.mapreduce.DBReader.initialize(DBReader.scala:132)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:180)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.&lt;init&gt;(NewHadoopRDD.scala:177)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:134)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
I have also uploaded 3 shape files to this GitHub repo for your reference:
https://github.com/khajaasmath786/OozieSamples/tree/master/oozieProject/data/airawat-syslog
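To reproduce against these files, a minimal loading sketch (the path is a placeholder for a directory containing the .shp/.dbf pair; `spark` is the shell's SparkSession) would be:

```scala
// Sketch: load a directory of shapefiles with the magellan data source.
// The DBReader stack trace above surfaces when an action forces the read.
val shapes = spark.read
  .format("magellan")
  .load("/path/to/shapefiles/")

// Trigger the read; this is the point where the
// ArrayIndexOutOfBoundsException in DBReader.initialize appeared.
println(shapes.count())
```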
Thanks, Asmath