linkedin / isolation-forest

A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.

Issue writing in synapse spark 3.2 #43

Open siege089 opened 6 months ago

siege089 commented 6 months ago

I'm using Azure Synapse, and nothing I try lets me write models. I've explicitly included spark-avro in my pom file and loaded the spark-avro package into the Spark pool workspace.

    <properties>
        <spark.version>3.2.0</spark.version>
        <scala.version.major>2.12</scala.version.major>
        <scala.version.minor>15</scala.version.minor>
    </properties>
    <dependencies>
        <dependency>
            <groupId>com.linkedin.isolation-forest</groupId>
            <artifactId>isolation-forest_${spark.version}_${scala.version.major}</artifactId>
            <version>3.0.3</version>
        </dependency>

        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version.major}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version.major}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_${scala.version.major}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-avro_${scala.version.major}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>com.microsoft.azure.synapse</groupId>
            <artifactId>synapseutils_${scala.version.major}</artifactId>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.jmockit</groupId>
            <artifactId>jmockit</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.scalatest</groupId>
            <artifactId>scalatest_${scala.version.major}</artifactId>
        </dependency>
    </dependencies>

2024-01-30 01:31:47,163 INFO ApplicationMaster [shutdown-hook-0]: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: org.apache.spark.sql.AnalysisException:  Failed to find data source: com.databricks.spark.avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".
    at org.apache.spark.sql.errors.QueryCompilationErrors$.failedToFindAvroDataSourceError(QueryCompilationErrors.scala:1028)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:666)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:720)
    at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:876)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:275)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:241)
    at com.linkedin.relevance.isolationforest.IsolationForestModelReadWrite$IsolationForestModelWriter.saveImplHelper(IsolationForestModelReadWrite.scala:262)
    at com.linkedin.relevance.isolationforest.IsolationForestModelReadWrite$IsolationForestModelWriter.saveImpl(IsolationForestModelReadWrite.scala:241)
    at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168)

jverbus commented 5 months ago

I just created a fix.

https://github.com/linkedin/isolation-forest/pull/44

jverbus commented 5 months ago

Try this

    <dependency>
      <groupId>com.linkedin.isolation-forest</groupId>
      <artifactId>isolation-forest_3.2.4_2.12</artifactId>
      <version>3.0.4</version>
    </dependency>

siege089 commented 4 months ago

Still getting the same error with this new version.
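
Since the error is about Spark failing to resolve the Avro data source at runtime, one way to narrow this down is to check whether the spark-avro classes are actually visible to the running application, independent of what the pom declares. The following is only a minimal sketch, not part of the library or of the fix in the linked pull request. It assumes the Avro source class is `org.apache.spark.sql.avro.AvroFileFormat` (the class shipped in the spark-avro module); `AvroClasspathCheck` and `isLoadable` are hypothetical names introduced for illustration.

```scala
object AvroClasspathCheck {
  // Assumption: this is the data source class provided by the spark-avro module.
  val AvroSourceClass = "org.apache.spark.sql.avro.AvroFileFormat"

  /** Returns true if `className` can be loaded without initializing it. */
  def isLoadable(className: String): Boolean = {
    // Prefer the thread context class loader, falling back to this class's loader.
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    try {
      Class.forName(className, false, loader)
      true
    } catch {
      // Class missing, or present but one of its dependencies is missing.
      case _: ClassNotFoundException | _: LinkageError => false
    }
  }

  def main(args: Array[String]): Unit = {
    println(s"$AvroSourceClass loadable: ${isLoadable(AvroSourceClass)}")
  }
}
```

Running this inside the same Synapse job that saves the model would show whether the package loaded into the Spark pool reaches the driver's classpath. If it reports `false`, the problem is likely in how the package is deployed rather than in the pom.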