Open torvalds-dev-testbot[bot] opened 2 months ago
Yes. Hudi supports querying a dataset written with PySpark; the dataset can be queried through the Spark datasource like any other source unless Hive sync is enabled.
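As a rough illustration of the write-then-query flow described above, here is a minimal PySpark sketch. It assumes the Hudi Spark bundle is already on the Spark classpath. The table name, field names, and base path are hypothetical placeholders, and the helper functions are illustrative names that are not part of Hudi.

```python
def hudi_write_options(table_name, record_key, precombine_field, partition_field):
    """Build a minimal set of Hudi datasource write options.

    All argument values are supplied by the caller; the ones used in the
    usage note below are hypothetical.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.operation": "upsert",
    }


def write_hudi(df, base_path, options):
    """Write a Spark DataFrame to a Hudi table located at base_path."""
    df.write.format("hudi").options(**options).mode("append").save(base_path)


def read_hudi(spark, base_path):
    """Read the Hudi table at base_path back through the Spark datasource."""
    return spark.read.format("hudi").load(base_path)
```

With an active `SparkSession` named `spark` and a DataFrame `df`, usage might look like `write_hudi(df, base_path, hudi_write_options("trips", "uuid", "ts", "city"))` followed by `read_hudi(spark, base_path).show()`. This sketch does not cover BigQuery sync, which needs additional configuration.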
Yes, as per the documentation snippets provided, Hudi supports querying a dataset written via PySpark using the Spark datasource if Hive sync is not enabled.
Ticket created successfully. Here is the link to the GitHub issue: https://github.com/numberlabs-developers/hudi/issues/255
Tips before filing an issue
Have you gone through our FAQs?
Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.
If you have triaged this as a bug, then file an issue directly.
Describe the problem you faced
I am looking for resources on how to implement Hudi in PySpark for BigQuery sync. I cannot find any relevant information on this.
To Reproduce
Steps to reproduce the behavior:
1.
2.
3.
4.
Expected behavior
A clear and concise description of what you expected to happen.
Environment Description
Hudi version :
Spark version :
Hive version :
Hadoop version :
Storage (HDFS/S3/GCS..) :
Running on Docker? (yes/no) :
Additional context
Add any other context about the problem here.
Stacktrace
Add the stacktrace of the error.