spotify / scio

A Scala API for Apache Beam and Google Cloud Dataflow.
https://spotify.github.io/scio
Apache License 2.0

Error message for partitionByKey makes debugging a bit difficult #5468

Open snallapa opened 2 months ago

snallapa commented 2 months ago

When using partitionByKey, if the pipeline encounters a key that is not in the input set, the error does not say which key it was. Right now you get:

org.apache.beam.sdk.util.UserCodeException: java.lang.IndexOutOfBoundsException: Partition function returned out of bounds index: -1 not in [0..2)

It would be great if the key that was not in the input set were printed, maybe even alongside the input set, like keyset: [], missing key: []

kellen commented 2 months ago

For partitionByKey specifically, we provide the partition function ourselves, so we should be able to capture the key that was not found when the index is returned as -1.
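
A minimal sketch of that idea, in plain Scala with no scio or Beam dependency. It assumes the partition index is computed by looking the key up in an indexed copy of the key set. `PartitionIndex` and `indexFor` are hypothetical names for illustration, not scio's actual code, and the message format simply follows the one suggested above.

```scala
// Hypothetical helper, not part of scio: illustrates failing early with
// the missing key instead of returning -1 to the runner.
object PartitionIndex {
  // Returns the position of `key` in `keys`, or throws an exception
  // naming both the key set and the missing key.
  def indexFor[K](key: K, keys: IndexedSeq[K]): Int = {
    val idx = keys.indexOf(key) // -1 when the key is absent
    if (idx < 0) {
      throw new NoSuchElementException(
        s"Partition key not found. keyset: ${keys.mkString("[", ", ", "]")}, missing key: [$key]"
      )
    }
    idx
  }
}
```

With this, `PartitionIndex.indexFor("c", Vector("a", "b"))` would throw with the message `Partition key not found. keyset: [a, b], missing key: [c]`, rather than surfacing as an out-of-bounds index. For a large key set, printing every key may be too noisy, so truncating the list is probably worth considering.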