kite-sdk / kite

Kite SDK
http://kitesdk.org/docs/current/
Apache License 2.0

Set avro schema configuration in format bundle #483

Open janvanbesien opened 5 years ago

janvanbesien commented 5 years ago

Rather than managing the Avro reader schema configuration in the input format's getSplits method, it needs to be managed when creating the format bundle. Otherwise, a Crunch pipeline that has multiple inputs (Kite views) with different schemas will not see the correct reader schema for each input.
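To illustrate why the location of the configuration matters, here is a simplified model in plain Java. It is not Kite or Crunch code: the class name, the method names, the input names, and the `example.reader.schema` key are all hypothetical, and the maps only stand in for a job configuration and for per-input bundles. It assumes that the underlying problem is several inputs writing the same key into one shared configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReaderSchemaDemo {
    // Hypothetical key, used only for this illustration.
    static final String READER_SCHEMA_KEY = "example.reader.schema";

    /** Models every input writing its schema into one shared job configuration. */
    static Map<String, String> resolveWithSharedConfig(Map<String, String> schemaByInput) {
        Map<String, String> jobConf = new LinkedHashMap<>();
        for (String schema : schemaByInput.values()) {
            // Same key for every input, so later inputs overwrite earlier ones.
            jobConf.put(READER_SCHEMA_KEY, schema);
        }
        Map<String, String> seen = new LinkedHashMap<>();
        for (String input : schemaByInput.keySet()) {
            seen.put(input, jobConf.get(READER_SCHEMA_KEY));
        }
        return seen;
    }

    /** Models each input carrying its own bundle of settings, consulted only for that input. */
    static Map<String, String> resolveWithPerInputBundles(Map<String, String> schemaByInput) {
        Map<String, Map<String, String>> bundles = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : schemaByInput.entrySet()) {
            Map<String, String> bundle = new LinkedHashMap<>();
            bundle.put(READER_SCHEMA_KEY, entry.getValue());
            bundles.put(entry.getKey(), bundle);
        }
        Map<String, String> seen = new LinkedHashMap<>();
        for (String input : schemaByInput.keySet()) {
            seen.put(input, bundles.get(input).get(READER_SCHEMA_KEY));
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, String> inputs = new LinkedHashMap<>();
        inputs.put("users", "UserSchema");
        inputs.put("events", "EventSchema");
        System.out.println(resolveWithSharedConfig(inputs));
        System.out.println(resolveWithPerInputBundles(inputs));
    }
}
```

In this model the shared configuration leaves both inputs with the schema of whichever input was configured last, while the per-input bundles keep each input paired with its own schema.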

Note that the test only demonstrates the problem when also upgrading to Crunch 0.13.0 (which is not part of this commit). In versions before Crunch 0.13.0, the problem fixed by CRUNCH-551 hides the current issue (at least in the scenario of the test).

A test was also added to verify the behaviour with plain MapReduce, to ensure that it continues to work as expected.