linkedin / brooklin

An extensible distributed system for reliable nearline data streaming at scale
BSD 2-Clause "Simplified" License

Use CONV to apply proper mysql chunking #983

Closed jogrogan closed 6 months ago

jogrogan commented 6 months ago

The MySQL [MD5()](https://www.w3resource.com/mysql/encryption-and-compression-functions/md5().php) function returns a string of 32 hexadecimal digits, while MySQL MOD() expects numeric arguments. When the hash is passed to MOD() directly, MySQL implicitly casts the string to a number using only its leading digits, so any hash that begins with a non-digit hex character (a-f) is evaluated as 0 and MOD() returns 0. This causes a large imbalance in data downstream, because all of these records are bucketed under the task reading from 'partition' 0.

The solution is to apply MySQL CONV() to the hash, converting it from hexadecimal (base 16) to decimal (base 10) before it is passed to MOD().
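
To illustrate the skew described above, here is a minimal Python sketch (not the code in this PR) that compares bucketing with and without a hex-to-decimal conversion. The partition count, the `row-N` keys, and all function names are hypothetical, and `leading_digits_as_int` is only a simplified model of MySQL's implicit string-to-number cast.

```python
import hashlib
import re
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for illustration


def md5_hex(key: str) -> str:
    """Return a 32-character hex digest, comparable to MySQL MD5()."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def leading_digits_as_int(text: str) -> int:
    """Simplified model of MySQL's implicit string-to-number cast:
    keep the leading run of decimal digits, or 0 if there is none.
    Exponent notation such as '1e5' is deliberately ignored here."""
    match = re.match(r"\d+", text)
    return int(match.group()) if match else 0


def bucket_without_conv(key: str) -> int:
    # Models MOD(MD5(key), N): the hex string is cast to a number as-is.
    return leading_digits_as_int(md5_hex(key)) % NUM_PARTITIONS


def bucket_with_hex_conversion(key: str) -> int:
    # Models the idea behind the fix: interpret the hash as base 16 first.
    return int(md5_hex(key), 16) % NUM_PARTITIONS


keys = [f"row-{i}" for i in range(10_000)]  # made-up keys
naive = Counter(bucket_without_conv(k) for k in keys)
converted = Counter(bucket_with_hex_conversion(k) for k in keys)

print("without conversion:", sorted(naive.items()))
print("with conversion:   ", sorted(converted.items()))
```

In this model, partition 0 receives far more than its share of keys without the conversion, while the converted hashes spread roughly evenly. One caveat: Python integers have arbitrary precision, whereas the MySQL documentation states that CONV() works with 64-bit precision, so the actual SQL may need to convert only part of the 128-bit hash. That detail is not covered by this description.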

This was all tested locally in MySQL Workbench.