locationtech-labs / geopyspark

GeoTrellis for PySpark

Re-add Support for Accumulo and HBase #690

Open jbouffard opened 5 years ago

jbouffard commented 5 years ago

Overview

Although the docs claim support for them, GPS does not currently support Accumulo or HBase, because we removed those dependencies from the backend.

Background

Originally, the GPS backend depended on both geotrellis-accumulo and geotrellis-hbase in order to support those backends. However, at some point we removed those dependencies because we thought they weren't actually needed to interact with those backends. We now know that this is not the case: anyone trying to access Accumulo or HBase will receive the following error message:

Py4JJavaError: An error occurred while calling None.geopyspark.geotrellis.io.AttributeStoreWrapper.
: java.lang.RuntimeException: Unable to find AttributeStoreProvider for accumulo://user:password@zoo-keeper:2181/instance
    at geotrellis.spark.io.AttributeStore$$anonfun$apply$3.apply(AttributeStore.scala:102)
    at geotrellis.spark.io.AttributeStore$$anonfun$apply$3.apply(AttributeStore.scala:102)
    at scala.Option.getOrElse(Option.scala:121)
    at geotrellis.spark.io.AttributeStore$.apply(AttributeStore.scala:102)
    at geotrellis.spark.io.AttributeStore$.apply(AttributeStore.scala:106)
    at geopyspark.geotrellis.io.AttributeStoreWrapper.<init>(AttributeStoreWrapper.scala:25)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
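The "Unable to find AttributeStoreProvider" message means GeoTrellis found no registered provider able to handle the URI's scheme. As far as we understand it, these providers are discovered through Java's ServiceLoader mechanism, i.e. a META-INF/services file named after the provider interface. A quick way to check whether a given backend jar (for example, the one fetched by geopyspark install-jar) can handle accumulo:// or hbase:// URIs is therefore to list the providers it registers. The following is a minimal Python sketch under that assumption. The service-file name geotrellis.spark.io.AttributeStoreProvider is inferred from the package shown in the stack trace and may differ between GeoTrellis versions, and the helper names are hypothetical.

```python
import zipfile

# ASSUMPTION: ServiceLoader registrations live under
# META-INF/services/<fully qualified interface name>. The interface name
# below is inferred from the stack trace's package and may differ across
# GeoTrellis versions.
SERVICE_FILE = "META-INF/services/geotrellis.spark.io.AttributeStoreProvider"


def registered_providers(jar_path):
    """Return the provider class names listed in the jar's service file (hypothetical helper)."""
    with zipfile.ZipFile(jar_path) as jar:
        if SERVICE_FILE not in jar.namelist():
            # No service file at all: no providers are registered.
            return []
        text = jar.read(SERVICE_FILE).decode("utf-8")
    providers = []
    for line in text.splitlines():
        # ServiceLoader ignores everything after '#' plus surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if entry:
            providers.append(entry)
    return providers


def supports_backend(jar_path, backend):
    """Heuristic check: does any registered provider class name mention the backend?"""
    return any(
        backend.lower() in name.lower()
        for name in registered_providers(jar_path)
    )
```

For example, if supports_backend(path_to_jar, "accumulo") returns False, that would be consistent with the error above: the jar simply registers no Accumulo provider.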

Solutions

There are a few ways we can resolve this issue.

Solution 1: Re-Add the Dependencies

The simplest and most straightforward solution would be to re-add the geotrellis-accumulo and geotrellis-hbase dependencies to the backend. While this is the easiest and most reliable way to restore support, it also produces a needlessly large fat jar that is cumbersome to distribute and bundles features many users neither need nor want. This should not be our first choice.

Solution 2: Add Additional Jars that Contain These Dependencies

Another solution would be to create a set of jars for the user to pick from. These jars will have different levels of support: Base GPS, Base GPS + Accumulo, Base GPS + HBase, and Base GPS + Accumulo + HBase. These jars could be downloaded via the GPS CLI:

geopyspark install base-jar # GPS with no Accumulo/HBase support
geopyspark install-jar # GPS + Accumulo/HBase support
geopyspark install-accumulo-jar # GPS + Accumulo support
geopyspark install-hbase-jar # GPS + HBase support

We can also add new make commands as well for building the jar:

make build-base # GPS with no Accumulo/HBase support
make build # GPS with Accumulo/HBase support
make build-base-with-accumulo # GPS with Accumulo support
make build-base-with-hbase # GPS with HBase support
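Under this proposal, the four jar variants are simply the combinations of two independent choices: Accumulo yes/no and HBase yes/no. The Python sketch below encodes that 2x2 matrix so that the CLI command and make target for a given combination can be looked up in one place. The command and target names are taken from the proposal above and most of them do not exist yet. JarVariant, VARIANTS and variant_for are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class JarVariant(NamedTuple):
    cli_command: str  # proposed "geopyspark ..." subcommand (hypothetical)
    make_target: str  # proposed "make ..." target (hypothetical)


# Keyed by (accumulo, hbase); names follow the proposal in this issue.
VARIANTS = {
    (False, False): JarVariant("install base-jar", "build-base"),
    (True, True): JarVariant("install-jar", "build"),
    (True, False): JarVariant("install-accumulo-jar", "build-base-with-accumulo"),
    (False, True): JarVariant("install-hbase-jar", "build-base-with-hbase"),
}


def variant_for(accumulo: bool, hbase: bool) -> JarVariant:
    """Return the proposed jar variant covering exactly the requested backends."""
    return VARIANTS[(accumulo, hbase)]
```

For example, a user who only needs Accumulo would look up variant_for(accumulo=True, hbase=False) and get the install-accumulo-jar command and the build-base-with-accumulo target.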

The only issue with this solution would be maintaining the separate jars. However, it may be worth the cost, as users get to choose what they want in a straightforward way.

Other Solutions

The two methods above are just some of the ways we could resolve this issue. We should take the time to discuss other possible solutions and their pros and cons here.

javyxu commented 5 years ago

Hello, the current version of geopyspark is 0.4.3, and this error still occurs after running geopyspark install-jar. Is there a better solution?