soiltechproject / fsm-docs

https://www.soiltechproject.org/

[fsm-scala] Hadoop filesystem error #13

Open · KipCrossing opened this issue 3 years ago

KipCrossing commented 3 years ago

When trying to run the processors, I get:

[error] (run-main-0) java.lang.IllegalArgumentException: Expected authority at index 7: hdfs://
[error] java.lang.IllegalArgumentException: Expected authority at index 7: hdfs://
[error]     at java.net.URI.create(URI.java:852)
[error]     at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:236)
[error]     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:227)
[error]     at org.fsm.hfs.HFSUtils$.fileSystem(HFSUtils.scala:67)
[error]     at org.fsm.hfs.HFSUtils$.exisits(HFSUtils.scala:93)
[error]     at org.fsm.mapreduce.GridDataFolder.<init>(GridDataset.scala:41)
[error]     at org.fsm.Paddock.folder$lzycompute(Paddock.scala:19)
[error]     at org.fsm.Paddock.folder(Paddock.scala:19)
[error]     at org.fsm.Paddock.getAllDataAsRDD(Paddock.scala:23)
[error]     at org.fsm.processors.CLHCProcessor.$anonfun$build$11(CLHCProcessor.scala:40)
[error]     at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
[error]     at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
[error]     at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
[error]     at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:39)
[error]     at scala.collection.TraversableLike.map(TraversableLike.scala:237)
[error]     at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
[error]     at scala.collection.AbstractTraversable.map(Traversable.scala:108)
[error]     at org.fsm.processors.CLHCProcessor.build(CLHCProcessor.scala:38)
[error]     at RunProgram$.main(RunProgram.scala:57)
[error]     at RunProgram.main(RunProgram.scala)
[error]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error]     at java.lang.reflect.Method.invoke(Method.java:498)

My script:

import java.io.File
import java.net.URL

import geotrellis.vector.Polygon
import org.fsm.processors.{CLHCProcessor, LHCConfig, SampleAlignProcessor}
import org.fsm.{DataFormat, GridDataFile, NDVIFile, Paddock, PointDataset}
import org.geotools.geojson.feature.FeatureJSON
import org.locationtech.jts.geom.GeometryFactory
import org.opengis.feature.simple.SimpleFeature
import play.api.libs.json.Json

object RunProgram{
  def main(args: Array[String]): Unit = {
    println("It is running")
    val nirInput = new File(s"/home/kipling/Documents/fsm_sample_data/nir.tif").toURI().toURL()
    val nirData = GridDataFile(uid="nir", metricType="Average NDVI", file=nirInput, fileType=DataFormat.Tiff)

    val redInput = new File(s"/home/kipling/Documents/fsm_sample_data/red.tif").toURI().toURL()
    val redData = GridDataFile(uid="red", metricType="Average NDVI", file=redInput, fileType=DataFormat.Tiff)

    val elevationInput = new File(s"/home/kipling/Documents/fsm_sample_data/dem.tif").toURI().toURL()
    val elevationFile = GridDataFile(uid="dem_123", metricType="Elevation", file=elevationInput, fileType=DataFormat.Tiff)

    val ndviFile = NDVIFile(id="ndvi", red=redData, nir=nirData)

    val paddockJson = new FeatureJSON().readFeatureCollection(new File(s"/home/kipling/Documents/fsm_sample_data/paddocks.json").toURI().toURL().openStream())
    val samplePoints = Json.parse(new File(s"/home/kipling/Documents/fsm_sample_data/samples.json").toURI().toURL().openStream()).as[List[PointDataset]]

    val geomFactory = new GeometryFactory()

    val paddocks = paddockJson.toArray().flatMap{
      case sf:SimpleFeature if sf.getDefaultGeometry.isInstanceOf[Polygon] =>
        val bounds = sf.getDefaultGeometry.asInstanceOf[Polygon]
        println(bounds)
        val samplesInside = samplePoints.filter(s => bounds.contains(geomFactory.createPoint(s.location)))
        println(samplesInside)
        Some(
          Paddock(
            ndviFiles = List(ndviFile),
            otherGridFiles = List(elevationFile),
            bounds = bounds,
            soilPointDataArray = samplesInside,
            id = sf.getID
          )
        )
      case _ =>
        None
    }
    println(paddocks)
    println(paddocks.length)

    val lhcConfig = Option(LHCConfig(samples=4, metrics=Seq("Average NDVI", "Elevation"), perPaddock = false))
    val lhcRes = new CLHCProcessor().build(paddocks, lhcConfig)
    print(lhcRes)

  }

}

And my IntelliJ configs:

[Screenshot: IntelliJ run settings]

correllink commented 3 years ago

The sample code above does not set the configuration before running.

Add a src/main/resources/fsm.conf

and make sure the sample application contains the lines

import com.typesafe.config.ConfigFactory

org.fsm.config = ConfigFactory.parseResources("fsm.conf")

before calling any processing functions.

Variables like HADOOP_EMBEDED=true are only needed if they are present in the config file.

See the src/test unit tests for a sample config file; fsm.tests.BaseSoiltechTest line 17 is where the config is set.
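
For reference, here is a minimal sketch of that setup in Scala. The require check is an addition of mine, not part of fsm-scala; org.fsm.config and the fsm.conf resource name are taken from this thread:

import com.typesafe.config.ConfigFactory

object ConfigSetup {
  def init(): Unit = {
    // ConfigFactory.parseResources returns an empty Config rather than
    // throwing when the resource is missing, so verify that fsm.conf was
    // actually found on the classpath (i.e. under src/main/resources).
    val conf = ConfigFactory.parseResources("fsm.conf")
    require(!conf.isEmpty, "fsm.conf not found on the classpath")
    org.fsm.config = conf
  }
}

Calling ConfigSetup.init() at the top of main makes the failure mode explicit instead of surfacing later as the hdfs:// URI error above.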

correllink commented 3 years ago

Will add an appropriate runtime 'Config not initialised' error so that it's clear what is happening. It's best that a configuration be supplied rather than assuming default values for things like working directories and ranges of soil properties.
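
For illustration, a minimal sketch of what such a guard could look like; the names FsmConfig, set and get are assumptions for the sketch, not the actual fsm-scala internals:

import com.typesafe.config.Config

object FsmConfig {
  @volatile private var underlying: Option[Config] = None

  def set(c: Config): Unit = underlying = Some(c)

  // Fail fast with an explicit message instead of letting an empty
  // config surface later as "Expected authority at index 7: hdfs://".
  def get: Config = underlying.getOrElse(
    throw new IllegalStateException(
      "Config not initialised: set org.fsm.config before calling any processors"))
}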

KipCrossing commented 3 years ago

Good idea. We should also add instructions for the fsm.conf to the readme.

philmendel commented 3 years ago

Also, here is an updated RunProgram script to get it working. I inserted a line to run the DataGenerateProcessor before doing the CLHC:

import java.io.File
import java.net.URL
import geotrellis.vector.Polygon
import org.fsm.processors.{CLHCProcessor, DataGenerateProcessor, LHCConfig, SampleAlignProcessor}
import org.fsm.{DataFormat, GridDataFile, NDVIFile, Paddock, PointDataset}
import org.geotools.geojson.feature.FeatureJSON
import org.locationtech.jts.geom.GeometryFactory
import org.opengis.feature.simple.SimpleFeature
import play.api.libs.json.Json
import com.typesafe.config.ConfigFactory

object RunProgram{
  def main(args: Array[String]): Unit = {
    org.fsm.config = ConfigFactory.parseResources("fsm.conf")

    val BASE_URL = "https://staging.farmlab.com.au/sample"

    //val nirInput = new File(s"/home/phil/Documents/fsm_sample_data/nir.tif").toURI().toURL()
    val nirInput = new URL(s"$BASE_URL/nir.tif")
    val nirData = GridDataFile(uid="nir", metricType="Average NDVI", file=nirInput, fileType=DataFormat.Tiff)

    //val redInput = new File(s"/home/phil/Documents/fsm_sample_data/red.tif").toURI().toURL()
    val redInput = new URL(s"$BASE_URL/red.tif")
    val redData = GridDataFile(uid="red", metricType="Average NDVI", file=redInput, fileType=DataFormat.Tiff)

    //val elevationInput = new File(s"/home/phil/Documents/fsm_sample_data/dem.tif").toURI().toURL()
    val elevationInput = new URL(s"$BASE_URL/dem.tif")
    val elevationFile = GridDataFile(uid="dem_123", metricType="Elevation", file=elevationInput, fileType=DataFormat.Tiff)

    val ndviFile = NDVIFile(id="ndvi", red=redData, nir=nirData)

    //val paddockInput = new File(s"/home/phil/Documents/fsm_sample_data/paddocks.json")
    val paddockInput = new URL(s"$BASE_URL/paddocks.json")
    val paddockJson = new FeatureJSON().readFeatureCollection(paddockInput.openStream())

    //val samplesInput = new File(s"/home/phil/Documents/fsm_sample_data/samples.json")
    val samplesInput = new URL(s"$BASE_URL/samples.json")
    val samplePoints = Json.parse(samplesInput.openStream()).as[List[PointDataset]]

    val geomFactory = new GeometryFactory()

    val paddocks = paddockJson.toArray().flatMap{
      case sf:SimpleFeature if sf.getDefaultGeometry.isInstanceOf[Polygon] =>
        val bounds = sf.getDefaultGeometry.asInstanceOf[Polygon]
        println(bounds)
        val samplesInside = samplePoints.filter(s => bounds.contains(geomFactory.createPoint(s.location)))
        println(samplesInside)
        Some(
          Paddock(
            ndviFiles = List(ndviFile),
            otherGridFiles = List(elevationFile),
            bounds = bounds,
            soilPointDataArray = samplesInside,
            id = sf.getID
          )
        )
      case _ =>
        None
    }
    println(paddocks)
    println(paddocks.length)

    val gridRes = new DataGenerateProcessor().build(paddocks)

    val lhcConfig = Option(LHCConfig(samples=4, metrics=Seq("Average NDVI", "Elevation"), perPaddock = false))
    val lhcRes = new CLHCProcessor().build(paddocks, lhcConfig)
    print(lhcRes)

  }

}

philmendel commented 3 years ago

Readme has also been updated with build.sbt changes and instructions for the fsm.conf

KipCrossing commented 3 years ago

Thanks for that, @philmendel.

Would it be possible to get an example using local files?

For example:

    val elevationInput = new File(s"/home/kipling/Documents/fsm_sample_data/dem.tif").toURI().toURL()
    val elevationFile = GridDataFile(uid="dem_123", metricType="Elevation", file=elevationInput, fileType=DataFormat.Tiff)

On first run, I get:

[error] (run-main-0) java.lang.ExceptionInInitializerError
[error] java.lang.ExceptionInInitializerError
[error]     at org.apache.spark.SparkContext.withScope(SparkContext.scala:699)
[error]     at org.apache.spark.SparkContext.sequenceFile(SparkContext.scala:1275)
[error]     at org.fsm.mapreduce.GridDataFolder.$anonfun$getAnyDataAsRDD$4(GridDataset.scala:265)
[error]     at scala.collection.immutable.List.map(List.scala:286)
[error]     at org.fsm.mapreduce.GridDataFolder.getAnyDataAsRDD(GridDataset.scala:263)
[error]     at org.fsm.processors.DataGenerateProcessor.$anonfun$build$3(DataGenerateProcessor.scala:70)
[error]     at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:244)
[error]     at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
[error]     at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
[error]     at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:39)
[error]     at scala.collection.TraversableLike.flatMap(TraversableLike.scala:244)
[error]     at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:241)
[error]     at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
[error]     at org.fsm.processors.DataGenerateProcessor.build(DataGenerateProcessor.scala:32)
[error]     at RunProgram$.main(RunProgram.scala:87)
[error]     at RunProgram.main(RunProgram.scala)
[error]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error]     at java.lang.reflect.Method.invoke(Method.java:498)
[error] Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.10.0
[error]     at com.fasterxml.jackson.module.scala.JacksonModule.setupModule(JacksonModule.scala:64)
[error]     at com.fasterxml.jackson.module.scala.JacksonModule.setupModule$(JacksonModule.scala:51)
[error]     at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19)
[error]     at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:808)
[error]     at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
[error]     at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
[error]     at org.apache.spark.SparkContext.withScope(SparkContext.scala:699)
[error]     at org.apache.spark.SparkContext.sequenceFile(SparkContext.scala:1275)
[error]     at org.fsm.mapreduce.GridDataFolder.$anonfun$getAnyDataAsRDD$4(GridDataset.scala:265)
[error]     at scala.collection.immutable.List.map(List.scala:286)
[error]     at org.fsm.mapreduce.GridDataFolder.getAnyDataAsRDD(GridDataset.scala:263)
[error]     at org.fsm.processors.DataGenerateProcessor.$anonfun$build$3(DataGenerateProcessor.scala:70)
[error]     at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:244)
[error]     at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
[error]     at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
[error]     at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:39)
[error]     at scala.collection.TraversableLike.flatMap(TraversableLike.scala:244)
[error]     at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:241)
[error]     at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
[error]     at org.fsm.processors.DataGenerateProcessor.build(DataGenerateProcessor.scala:32)
[error]     at RunProgram$.main(RunProgram.scala:87)
[error]     at RunProgram.main(RunProgram.scala)
[error]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error]     at java.lang.reflect.Method.invoke(Method.java:498)
[error] stack trace is suppressed; run 'last Compile / bgRun' for the full output
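
For what it's worth, the Caused by section is the well-known Spark/Jackson binary clash: a newer jackson-module-scala (2.10.0) ends up on the classpath next to the older Jackson that Spark was built against. A common sbt-side workaround, not taken from this thread and with version numbers that are an assumption to be checked against the Spark release actually in use, is to pin the Jackson artifacts in build.sbt:

// build.sbt (sketch): force all Jackson artifacts onto one compatible
// version line. Spark 2.4.x expects the 2.6.7.x series; adjust to match
// whatever your Spark version bundles.
dependencyOverrides ++= Seq(
  "com.fasterxml.jackson.core"   %  "jackson-core"          % "2.6.7",
  "com.fasterxml.jackson.core"   %  "jackson-databind"      % "2.6.7.1",
  "com.fasterxml.jackson.module" %% "jackson-module-scala"  % "2.6.7.1"
)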