uber / RemoteShuffleService

Remote shuffle service for Apache Spark to store shuffle data on remote servers.

Fixed misleading error messages #49

Closed mayurdb closed 3 years ago

mayurdb commented 3 years ago

com.uber.rss.exceptions.RssException: Failed to get node data for zookeeper node: /spark_rss/phx3/default/nodes/agent-dedicated1257-phx3.prod.uber.internal

The current error message is misleading because it suggests a failure on the ZooKeeper side. Although that can be the cause, these errors are mainly caused by a bad or lost RSS server node.

Also, at the start of the map task, the log message says that the task has started writing records, which is not accurate. This change updates it to "Started processing records in Shuffle Map Task".
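
To illustrate the first point, here is a minimal sketch of how such a message could name the affected RSS server and point at the more likely cause. It is not the code from this PR: the class `NodeDataErrorMessage`, the method `describeNodeDataFailure`, and the exact wording are hypothetical, and it assumes the server identifier is the last segment of the ZooKeeper node path, as in the example above.

```java
public class NodeDataErrorMessage {

    // Hypothetical helper, not part of the RSS codebase.
    // Builds an error message that names the RSS server and states the likely cause,
    // instead of implying that ZooKeeper itself failed.
    public static String describeNodeDataFailure(String zkNodePath) {
        // Assumption: the last path segment identifies the RSS server.
        String serverId = zkNodePath.substring(zkNodePath.lastIndexOf('/') + 1);
        return String.format(
            "Failed to get node data for RSS server %s (ZooKeeper node: %s). "
                + "The RSS server is likely bad or lost; ZooKeeper may still be healthy.",
            serverId, zkNodePath);
    }

    public static void main(String[] args) {
        // Path taken from the exception quoted in this PR description.
        String path =
            "/spark_rss/phx3/default/nodes/agent-dedicated1257-phx3.prod.uber.internal";
        System.out.println(describeNodeDataFailure(path));
    }
}
```

Keeping the ZooKeeper path in the message preserves the detail needed to debug a genuine ZooKeeper problem, while the leading text directs attention to the server node first.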