Manage the Avro reader schema configuration when creating the format
bundle rather than in the input format's getSplits method. Otherwise, a
Crunch pipeline with multiple inputs (Kite views) that have different
schemas will not see the correct reader schemas.
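
A simplified, stdlib-only model of the failure mode may help reviewers.
It is a hypothetical sketch, not Crunch or Kite code. It assumes the
reader schema is written under a single key into the one job-wide
configuration while splits are planned, so the last input planned wins.
It contrasts this with per-input bundles that are applied only when that
input is read. The key name and schema names are made up for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReaderSchemaSketch {
    // Hypothetical key; NOT the real Avro or Kite configuration key.
    static final String READER_SCHEMA_KEY = "example.reader.schema";

    // Before (modelled): every input writes its schema into one shared
    // job config during split planning, so the last write wins for all.
    static Map<String, String> sharedConfig(List<String> inputs, Map<String, String> schemas) {
        Map<String, String> jobConf = new HashMap<>();
        for (String input : inputs) {
            jobConf.put(READER_SCHEMA_KEY, schemas.get(input));
        }
        Map<String, String> seen = new LinkedHashMap<>();
        for (String input : inputs) {
            seen.put(input, jobConf.get(READER_SCHEMA_KEY));
        }
        return seen;
    }

    // After (modelled): each input carries its own bundle, which is merged
    // into a fresh task-level config only when that input is read.
    static Map<String, String> perInputBundles(List<String> inputs, Map<String, String> schemas) {
        Map<String, Map<String, String>> bundles = new HashMap<>();
        for (String input : inputs) {
            Map<String, String> bundle = new HashMap<>();
            bundle.put(READER_SCHEMA_KEY, schemas.get(input));
            bundles.put(input, bundle);
        }
        Map<String, String> seen = new LinkedHashMap<>();
        for (String input : inputs) {
            Map<String, String> taskConf = new HashMap<>();
            taskConf.putAll(bundles.get(input));
            seen.put(input, taskConf.get(READER_SCHEMA_KEY));
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> inputs = List.of("users", "events");
        Map<String, String> schemas = Map.of("users", "UserSchema", "events", "EventSchema");
        // Prints {users=EventSchema, events=EventSchema}: the wrong schema for "users".
        System.out.println("shared:     " + sharedConfig(inputs, schemas));
        // Prints {users=UserSchema, events=EventSchema}: each input keeps its own.
        System.out.println("per-bundle: " + perInputBundles(inputs, schemas));
    }
}
```

In the model, only the per-input variant gives each view its own reader
schema, which mirrors the motivation for moving the setting into the
format bundle.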

Note that the test demonstrates the problem only when Crunch is also
upgraded to 0.13.0 (which is not part of this commit). This is because
of CRUNCH-551, which fixes a separate problem in Crunch that masks this
issue (at least in the test's scenario) in versions before 0.13.0.

A test was also added to verify that the behaviour with plain MapReduce
continues to work as expected.