zio / zio-json

Fast, secure JSON library with tight ZIO integration.
https://zio.dev/zio-json
Apache License 2.0

Error when reading json array file? #1071

Open SimunKaracic opened 8 months ago

SimunKaracic commented 8 months ago

Specifically, I am trying to read this file: https://github.com/statsbomb/open-data/blob/master/data/competitions.json. The file is formatted as a JSON array, and I would like to read it in a streaming fashion.

When opening the file with:

      json.readJsonAs(path)
        .tap(foo => ZIO.logInfo(foo.asArray.toString))
        .runCount

The entire file is read into a single item, a JSON list, instead of producing a stream of the items in the list. It also throws this error, but seems to recover from it:

22:58:34.586 [zio-default-blocking-2] DEBUG zio.json.JsonDecoderPlatformSpecific -- timestamp=2024-02-22T22:58:34.583386+01:00 level=DEBUG thread=zio-fiber-7 message="Fiber zio-fiber-7 did not handle an error" cause=
zio.json.internal.UnexpectedEnd: if you see this a dev made a mistake using OneCharReader

When I try this instead, to read the file as a stream of Competition values:

      ZStream
        .fromPath(path.toPath)
        .via(
          ZPipeline.utf8Decode >>>
            stringToChars >>>
            JsonDecoder[Competition].decodeJsonPipeline(JsonStreamDelimiter.Array)
        )
        .runCount

I get a StackOverflowError

23:00:26.273 [ZScheduler-Worker-9] DEBUG foo.bar.Main.run -- timestamp=2024-02-22T23:00:26.271057+01:00 level=DEBUG thread=zio-fiber-5 message="Fiber zio-fiber-5 did not handle an error" cause=
java.lang.StackOverflowError: null

The stack points directly to the derived Competition class json codec.

Class and codec:

    case class Competition(
        competition_id: Option[Int],
        season_id: Option[Int],
        country_name: Option[String],
        competition_name: Option[String],
        competition_gender: Option[String],
        competition_youth: Option[Boolean],
        competition_international: Option[Boolean],
        season_name: Option[String],
        match_updated: Option[String],
        match_available: Option[String]
    )

    object Competition {
      implicit val decoder: JsonDecoder[Competition] = DeriveJsonDecoder.gen[Competition]
    }

ZIO-json version: 0.6.2

SimunKaracic commented 8 months ago

OK, so this was a weird one. I was running all the code inside a single file, Main.scala. The StackOverflowError disappears if I define the Competition class and decoder outside of the main object.
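
For reference, the layout that made the error go away looks roughly like the sketch below: the case class and its companion sit at the top level of the file rather than inside the application object. The field list is abbreviated to three of the original ten fields, and `TopLevelLayoutSketch` and the sample JSON are hypothetical names and data used only for illustration. Why the nested definition overflowed the stack is not established in this thread.

```scala
import zio._
import zio.json._

// Defined at the top level of the file, not nested inside the application object.
// Abbreviated: only three of the original ten fields are kept for this sketch.
final case class Competition(
  competition_id: Option[Int],
  country_name: Option[String],
  competition_name: Option[String]
)

object Competition {
  implicit val decoder: JsonDecoder[Competition] = DeriveJsonDecoder.gen[Competition]
}

// Hypothetical application object that only uses the codec defined above.
object TopLevelLayoutSketch extends ZIOAppDefault {
  // Made-up sample record, shaped like one entry of competitions.json.
  private val sample =
    """{"competition_id":9,"country_name":"Germany","competition_name":"1. Bundesliga"}"""

  override def run: ZIO[Any, Throwable, Unit] =
    ZIO
      .fromEither(sample.fromJson[Competition]) // Either[String, Competition]
      .mapError(msg => new RuntimeException(msg))
      .flatMap(c => Console.printLine(c.toString))
}
```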

Only this remains, but the result seems to be fine:

20:10:09.146 [zio-default-blocking-2] DEBUG zio.json.JsonDecoderPlatformSpecific -- timestamp=2024-02-23T20:10:09.144538+01:00 level=DEBUG thread=zio-fiber-7 message="Fiber zio-fiber-7 did not handle an error" cause=
zio.json.internal.UnexpectedEnd: if you see this a dev made a mistake using OneCharReader

jdegoes commented 6 months ago

/bounty $100

algora-pbc[bot] commented 6 months ago

💎 $100 bounty • ZIO

Steps to solve:

  1. Start working: Comment /attempt #1071 with your implementation plan
  2. Submit work: Create a pull request including /claim #1071 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to zio/zio-json!


Andrapyre commented 6 months ago

@SimunKaracic, I cannot reproduce this error, either in tests or in the main class as you mentioned. Could you share the whole Main.scala file, along with the relevant environment details: your platform, Scala version, and ZIO version? (Thanks for mentioning the zio-json version!)

SimunKaracic commented 4 months ago

I am also not able to reproduce the bug anymore, as I threw away the original exploratory code. I guess it would still be nice if we had something like this in zio-json, to support loading JSON arrays from files:

  def readJsonArrayAs[T: JsonDecoder](path: Path): ZStream[Any, Throwable, T] = {
    ZStream
      .fromPath(path)
      .via(
        ZPipeline.utf8Decode >>>
          stringToChars >>>
          JsonDecoder[T].decodeJsonPipeline(JsonStreamDelimiter.Array)
      )
  }
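
A small in-memory check of the same pipeline can confirm the element-by-element behaviour without touching the file system. This is only a sketch, assuming zio-json 0.6.x on the JVM with ZIO 2; `Item`, `ArrayPipelineSketch`, and the sample input are hypothetical and exist only for illustration.

```scala
import zio._
import zio.json._
import zio.stream._

// Hypothetical element type, used only for this sketch.
final case class Item(id: Int)

object Item {
  implicit val decoder: JsonDecoder[Item] = DeriveJsonDecoder.gen[Item]
}

object ArrayPipelineSketch extends ZIOAppDefault {
  // A single top-level JSON array, like competitions.json but much smaller.
  private val input = """[{"id":1},{"id":2},{"id":3}]"""

  // Feed the characters through the decoder, treating the input as one JSON array
  // whose elements are emitted one at a time.
  val items: ZStream[Any, Throwable, Item] =
    ZStream
      .fromChunk(Chunk.fromArray(input.toCharArray))
      .via(JsonDecoder[Item].decodeJsonPipeline(JsonStreamDelimiter.Array))

  override def run: ZIO[Any, Throwable, Unit] =
    items.runCollect.flatMap(found => Console.printLine(s"Decoded ${found.size} items"))
}
```

The helper above differs only in its source: bytes from `ZStream.fromPath`, decoded as UTF-8 and split into characters before reaching the decoder.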

My attempt at reproducing the bug (I also tried lowering the Scala version, but it didn't help):

build.sbt:

ThisBuild / version := "0.1.0-SNAPSHOT"

ThisBuild / scalaVersion := "3.4.2"

lazy val root = (project in file("."))
  .settings(
    name := "zio-json-reproduce",
    libraryDependencies ++= Seq(
      "dev.zio" %% "zio" % "2.1.5",
      "dev.zio" %% "zio-json" % "0.6.2"
    )
  )

Main.scala:

import zio.*
import zio.json.*
import zio.stream.*

import java.nio.file.{Path, Paths}

object Main extends ZIOAppDefault {

  case class Competition(
    competition_id: Int,
    season_id: Int,
    country_name: String,
    competition_name: String,
    competition_gender: String,
    competition_youth: Boolean,
    competition_international: Boolean,
    season_name: String,
    match_updated: Option[String],
    match_available: String
  )

  object Competition {
    implicit val decoder: JsonDecoder[Competition] = DeriveJsonDecoder.gen[Competition]
  }

  private def stringToChars: ZPipeline[Any, Nothing, String, Char] =
    ZPipeline.mapChunks[String, Char](_.flatMap(_.toCharArray))

  val path = "competitions.json"
  val loadsWholeFileIntoArray: ZIO[Any, Throwable, Long] = json.readJsonAs(path)
    .runCount

  def readJsonArrayAs[T: JsonDecoder](path: Path): ZStream[Any, Throwable, T] = {
    ZStream
      .fromPath(path)
      .via(
        ZPipeline.utf8Decode >>>
          stringToChars >>>
          JsonDecoder[T].decodeJsonPipeline(JsonStreamDelimiter.Array)
      )
  }

  // The element type is given explicitly, since nothing else in the call determines T.
  val iteratesThroughArrayOneByOne = readJsonArrayAs[Competition](Paths.get(path)).runCount

  override def run: ZIO[Any with ZIOAppArgs with Scope, Any, Any] = {
    loadsWholeFileIntoArray.flatMap { c =>
      ZIO.logInfo(s"Array count: ${c}")
    } *> iteratesThroughArrayOneByOne.flatMap { c => ZIO.logInfo(s"Items inside array count: ${c}") }
  }
}