kestra-io / plugin-serdes

https://kestra.io/plugins/plugin-serdes/
Apache License 2.0

ParquetWriter doesn't work with the Kestra worker isolation configuration #62

Closed loicmathieu closed 10 months ago

loicmathieu commented 11 months ago

Expected Behavior

After enabling Java Security for worker isolation in Kestra EE (see https://kestra.io/docs/administrator-guide/configuration/enterprise-edition/workers), ParquetWriter no longer works.

This can be reproduced with the following flow:

id: hello-parquet
namespace: company.team
tasks:
  - id: query-top-ten
    type: io.kestra.plugin.gcp.bigquery.Query
    sql: |
      SELECT DATETIME(datehour) as date, title, views FROM `bigquery-public-data.wikipedia.pageviews_2023` 
      WHERE DATE(datehour) = current_date() and wiki = 'fr' and title not in ('Cookie_(informatique)', 'Wikipédia:Accueil_principal', 'Spécial:Recherche')
      ORDER BY datehour desc, views desc
      LIMIT 10
    store: true
  - id: write-parquet
    type: io.kestra.plugin.serdes.parquet.ParquetWriter
    from: "{{outputs['query-top-ten'].uri}}"
    schema: |
      {
        "namespace": "example.avro",
        "type": "record",
        "name": "Wikipedia",
        "fields": [
          {"name": "date", "type": "string"},
          {"name": "title", "type": "string"},
          {"name": "views", "type": "string"}
        ]
      }
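
To rule out the schema itself when reproducing, a quick standalone check can confirm that the Avro schema string in the flow above is valid JSON and declares the three expected fields. This is only a sketch using Python's standard `json` module; it does not validate Avro semantics, and the `SCHEMA`, `field_names`, and `field_types` names are introduced here for illustration.

```python
import json

# Schema text copied from the `write-parquet` task in the flow above.
SCHEMA = """
{
  "namespace": "example.avro",
  "type": "record",
  "name": "Wikipedia",
  "fields": [
    {"name": "date", "type": "string"},
    {"name": "title", "type": "string"},
    {"name": "views", "type": "string"}
  ]
}
"""

# json.loads raises json.JSONDecodeError if the text is not valid JSON.
schema = json.loads(SCHEMA)

# Collect the declared field names and types for a quick visual check.
field_names = [field["name"] for field in schema["fields"]]
field_types = {field["name"]: field["type"] for field in schema["fields"]}

print(field_names)
print(field_types)
```

If this parses cleanly, the schema text is at least well-formed JSON, which points the investigation toward the worker isolation settings rather than the flow definition.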

With the following worker isolation configuration:

kestra:
  ee:
    java-security:
      enabled: true
      authorized-class-prefix:
        - io.kestra.plugin.serdes
        - io.kestra.plugin.gcp
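
As a rough illustration of what a prefix allowlist like the one above covers, the sketch below tests fully qualified class names against the two configured prefixes. It is a hypothetical model only: it assumes plain string-prefix matching, which this report does not confirm is how Kestra EE enforces `authorized-class-prefix`. The `is_authorized` helper is invented for the example, and `org.apache.parquet.hadoop.ParquetWriter` is used purely as an example of a third-party class name outside both prefixes; whether the plugin loads it is not established here.

```python
# Hypothetical illustration only: this is NOT Kestra's implementation.
# Prefixes copied from the worker isolation configuration above.
AUTHORIZED_PREFIXES = [
    "io.kestra.plugin.serdes",
    "io.kestra.plugin.gcp",
]


def is_authorized(class_name: str, prefixes=AUTHORIZED_PREFIXES) -> bool:
    """Return True if class_name starts with any allowlisted prefix."""
    return any(class_name.startswith(prefix) for prefix in prefixes)


candidates = [
    "io.kestra.plugin.serdes.parquet.ParquetWriter",  # task type used in the flow
    "io.kestra.plugin.gcp.bigquery.Query",            # task type used in the flow
    "org.apache.parquet.hadoop.ParquetWriter",        # example third-party class name
]

results = {name: is_authorized(name) for name in candidates}
for name, allowed in results.items():
    print(f"{name}: {'allowed' if allowed else 'not covered by any prefix'}")
```

Under that assumption, both task classes from the flow match a prefix, while a class from a third-party package does not.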

There are multiple issues:

User support ticket: https://support.kestra.io/a/tickets/51

Steps To Reproduce

No response

Environment Information

Example flow

No response