Closed karims closed 6 years ago
You're looking for a reducer (groupBy is a particular reducer that also fills a map). The third of PFA's three method options is a reducer mode. It still gives a score for each datum, one by one, but it does so in a way that accumulates. Search for references to tally. It also forces you to write a combine function that merges partial tallies, in case you're running Hadrian independently on many batches and need to combine the partial results.
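To make the pattern concrete, here is a rough sketch in plain Python. It is not PFA or Hadrian code: the names zero, action, combine, and run_batch, and the "category"/"amount" fields, are made up for illustration. It only shows the shape of a groupBy-style reducer that is fed one datum at a time and whose partial tallies can be merged afterwards.

```python
from functools import reduce

# Hypothetical names throughout; this mimics the reducer pattern, not the PFA API.

def zero():
    """Starting tally: an empty map from group key to running sum."""
    return {}

def action(tally, datum):
    """Process one datum and return the updated tally (groupBy + sum)."""
    key = datum["category"]
    updated = dict(tally)  # copy so the previous tally is left untouched
    updated[key] = updated.get(key, 0.0) + datum["amount"]
    return updated

def combine(tally_one, tally_two):
    """Merge two partial tallies produced by independent runs."""
    merged = dict(tally_one)
    for key, value in tally_two.items():
        merged[key] = merged.get(key, 0.0) + value
    return merged

def run_batch(batch):
    """Feed a batch through the reducer one datum at a time."""
    return reduce(action, batch, zero())

# Two batches scored independently, then combined.
batch_a = [{"category": "x", "amount": 1.0}, {"category": "y", "amount": 2.0}]
batch_b = [{"category": "x", "amount": 3.0}]

partials = [run_batch(batch_a), run_batch(batch_b)]
total = reduce(combine, partials, zero())
print(total)
```

The point of the separate combine step is that running the batches independently and merging gives the same answer as a single pass over all the data, which is what makes the reducer safe to parallelize.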
I was unsure where to post this question, so I am asking it here. If there is a better forum, let me know, as I could not join Slack.
I know PFA modelling is tied to an individual datum. Is there a way to model on batch data? For example, one of my use cases is loading a CSV into a Spark DataFrame and doing a sort or groupBy. Is such an operation possible here?