alembiewski closed this 5 years ago.
Thanks, @alembiewski. The Sorter and Generator look good and were battle-tested during scale tests. However, CI tests are failing, and it looks like the failures are caused by the `hadoop-aws` version bump, which deprecates `s3n` URLs. The culprit seems to be here: https://github.com/mesosphere/spark-build/blob/master/spark-testing/spark_s3.py#L9-L16. So we need to switch to `s3a` URLs and modify the tests as needed.
What changes were proposed in this pull request?
Resolves DCOS-59720 [DS] [Spark Operator] Create a better Spark Job for MWT
This PR introduces two Spark applications:
1) `DatasetGenerator`: creates a dataset with a specified record count and record size and writes the result to an S3 bucket.
2) `DatasetSort`: reads data from an S3 location and performs a sort operation on the obtained `DataFrame`.
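The data flow of the two jobs can be sketched in plain Python, assuming each record is a fixed-size string whose first few characters serve as a random sort key. This is a local, single-process stand-in rather than the PR's Spark code: `generate_dataset`, `sort_dataset`, and `KEY_LEN` are hypothetical names, and the S3 write and read steps are omitted.

```python
import random
import string

KEY_LEN = 10  # hypothetical: length of the random sort key at the start of each record


def generate_dataset(record_count, record_size, seed=0):
    """Hypothetical stand-in for DatasetGenerator.

    Returns record_count records, each exactly record_size characters long:
    a random KEY_LEN-character key followed by filler payload.
    """
    if record_size < KEY_LEN:
        raise ValueError("record_size must be at least KEY_LEN")
    rng = random.Random(seed)  # seeded so runs are reproducible
    alphabet = string.ascii_letters + string.digits
    records = []
    for _ in range(record_count):
        key = "".join(rng.choices(alphabet, k=KEY_LEN))
        payload = "x" * (record_size - KEY_LEN)
        records.append(key + payload)
    return records


def sort_dataset(records):
    """Hypothetical stand-in for DatasetSort: order records by their key prefix."""
    return sorted(records, key=lambda r: r[:KEY_LEN])


records = generate_dataset(record_count=1000, record_size=100)
sorted_records = sort_dataset(records)
print(len(sorted_records), len(sorted_records[0]))
```

In Spark the same shape would typically map to building a `DataFrame` (for example from `spark.range`), writing it to S3, then reading it back and calling `orderBy`; the PR description does not say which APIs the jobs actually use.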
How were these changes tested?
Release Notes
n/a