GeoffMontee opened this issue 6 days ago
During a recent DynamoDB migration, $SPARK_HOME/work on the worker nodes grew to 479 GB, which filled up a 485 GB disk.
To try to work around this, we set:
```shell
export SPARK_WORKER_OPTS='-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=600'
```
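One detail worth noting from the Spark standalone docs: the worker cleanup thread only removes directories of *stopped* applications, so it cannot reclaim space while a long-running job is still executing. A fuller sketch of the relevant settings (values here are illustrative, not recommendations) would be:

```shell
# conf/spark-env.sh on each worker node (sketch; values are examples)
# Note: cleanup only removes work directories of stopped applications,
# so it will not free space mid-job during a long migration.
export SPARK_WORKER_OPTS="\
 -Dspark.worker.cleanup.enabled=true \
 -Dspark.worker.cleanup.interval=600 \
 -Dspark.worker.cleanup.appDataTtl=3600"  # retain app data 1 hour, vs. the 7-day (604800 s) default
```

`spark.worker.cleanup.appDataTtl` controls how long finished applications' work directories are kept before the cleanup thread deletes them; lowering it from the default helps when many short jobs run back to back.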
For a description of the properties, see: https://spark.apache.org/docs/latest/spark-standalone.html#cluster-launch-scripts
Interestingly, when this was retested with a smaller table, the $SPARK_HOME/work directory was 44 GB, almost exactly the size of the table:
```shell
$ du -sh /opt/spark/work/
44G     /opt/spark/work/
$ aws dynamodb describe-table --table-name=MY_TABLE | grep "TableSizeBytes"
        "TableSizeBytes": 44174574774,
```
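To see *where* inside the work directory the space is going, the total can be broken down per application directory (the standalone worker creates one `app-*` directory per application, with one numbered subdirectory per executor). A sketch, assuming the same `/opt/spark/work` path as above:

```shell
# Per-application disk usage inside the worker's work directory.
# SPARK_WORKER_DIR defaults to $SPARK_HOME/work; /opt/spark/work here
# matches the paths in the question.
WORK_DIR="${SPARK_WORKER_DIR:-/opt/spark/work}"
if [ -d "$WORK_DIR" ]; then
  # Largest application directories first
  du -sh "$WORK_DIR"/app-*/ 2>/dev/null | sort -rh | head
  # Largest individual files (e.g. executor stdout/stderr, fetched files)
  find "$WORK_DIR" -type f -size +1G -exec ls -lh {} + 2>/dev/null
fi
```

Seeing whether the 44 GB sits in a handful of large files per executor, or in the executors' stderr/stdout logs, would narrow down what is actually being written.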
Is the entire table being written to disk?