Open HongW2019 opened 3 years ago
@zhztheplayer @weiting-chen @zhouyuan Please review, thanks a lot.
@HongW2019 If you have verified that the Spark TPC-DS benchmark runs on GS, please paste the script and results in this issue as a baseline.
This issue depends on upstream work in the Arrow community on GCS support: https://issues.apache.org/jira/browse/ARROW-1231
@zhixingheyi-tian
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
When we enabled Gazelle on Google Cloud Dataproc with gs (Google Cloud Storage buckets) as the storage layer instead of HDFS, we found that gs is not currently supported.
Describe the solution you'd like
ArrowDataSource currently supports S3, and Spark on Google Dataproc uses gs for cloud storage. To support cloud storage on Dataproc clusters, ArrowDataSource also needs gs support for Google Cloud Storage.