mjakubowski84 / parquet4s

Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
https://mjakubowski84.github.io/parquet4s/
MIT License

Reading from gcs bucket #321

Closed by Nabil272 12 months ago

Nabil272 commented 1 year ago

Hi, I'm having problems reading a Parquet file from a GCS bucket. Here is my code:

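    import com.github.mjakubowski84.parquet4s.{ParquetReader, Path}
    import org.apache.hadoop.conf.Configuration
    import org.apache.kafka.clients.producer.ProducerRecord

    // IntegrationParam, integrationParamToJson, serKey and producer are defined elsewhere.

    // Register the GCS connector classes for the gs:// scheme
    val hadoopConf = new Configuration()
    hadoopConf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
    hadoopConf.set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")

    def sendParquet(path: String): Unit = {
      val options = ParquetReader.Options(hadoopConf = hadoopConf)
      val records = ParquetReader
        .as[IntegrationParam]
        .options(options)
        .read(Path(path))

      // Publish each record to Kafka, then release the underlying file handle
      try records.foreach { x =>
        val message: String = integrationParamToJson(x)
        val key: String     = serKey(x)

        producer.send(
          new ProducerRecord[String, String]("integration-param", key, message)
        )
      } finally records.close()
    }

    sendParquet("gs://poc_scraper/i/case_studies/integrationParams.parquet/part-00000-9d00bdec-d3c5-4aa9-b980-0376c5399a05-c000.snappy.parquet")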

I added my GOOGLE_CREDENTIALS as an environment variable.

mjakubowski84 commented 1 year ago
  1. You didn't describe the error you are getting, so it is impossible to help you.
  2. The GCS connector and the Hadoop client are not an intrinsic part of Parquet4s, so you should seek help in the respective places, such as https://stackoverflow.com/search?q=gcs+connector
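
Regarding point 1, one way to collect the missing error details is to wrap the failing call and capture the whole stack trace, including the "Caused by" chain, as text that can be pasted into a report. The following is only a minimal sketch using the Scala standard library; `ErrorReport`, `stackTraceOf` and `attempt` are hypothetical names chosen for illustration and are not part of Parquet4s, Hadoop or the GCS connector.

```scala
import java.io.{PrintWriter, StringWriter}
import scala.util.{Failure, Success, Try}

// Hypothetical helper, not part of any library mentioned in this thread.
object ErrorReport {

  // Render a throwable, including its "Caused by" chain, as one string.
  def stackTraceOf(t: Throwable): String = {
    val buffer = new StringWriter()
    t.printStackTrace(new PrintWriter(buffer, true))
    buffer.toString
  }

  // Run `body`; on a non-fatal exception return the labelled stack trace
  // instead of throwing, otherwise return the result.
  def attempt[A](label: String)(body: => A): Either[String, A] =
    Try(body) match {
      case Success(value) => Right(value)
      case Failure(error) => Left(s"$label failed:\n${stackTraceOf(error)}")
    }
}
```

Assuming the `sendParquet` function from the question, a call such as `ErrorReport.attempt("read from GCS")(sendParquet(path)).left.foreach(println)` would print the trace only when the read fails. The root cause at the bottom of that output is usually the most useful part to include when asking for help.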