ukwa / ukwa-heritrix

The UKWA Heritrix3 custom modules and Docker builder.

Ensure partition offsets are being recorded properly #21

Closed by anjackson 5 years ago

anjackson commented 5 years ago

Having just paused and restarted a large crawl, the partition offsets have all been reset. The KAFKA_SEEK_TO_BEGINNING flag is set to false, so this should not have occurred (and even if it were true, it should not have occurred on pausing/unpausing the crawl).

Need to verify that the offsets are being committed - as we are now manually managing assignment, maybe this needs to be handled differently.

anjackson commented 5 years ago

Looking at the documentation, we are using the 'Standalone Consumer' pattern, and it is implied that we must commit offsets manually rather than being able to rely on the auto-commit (enable.auto.commit) behaviour. Not 100% clear though.

anjackson commented 5 years ago

Hmm, locally, couldn't force re-processing the queue on just pause/unpause, but stop and restart did do it because KAFKA_SEEK_TO_BEGINNING was true. Odd. Inspecting logs from the live system.
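For reference, here is a minimal sketch of the resume behaviour being checked in this thread: committed offsets should survive a restart, and only a seek-to-beginning flag should send a partition back to the start. This is a hypothetical in-memory model, not the Kafka client API and not the ukwa-heritrix implementation. The class name OffsetResumeSketch, its methods, and the choice to start from offset 0 when nothing has been committed are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of per-partition offset tracking.
// It does not use the Kafka client API and is not the ukwa-heritrix code.
class OffsetResumeSketch {

    // Offsets that are meant to survive a stop/restart, keyed by partition number.
    private final Map<Integer, Long> committed = new HashMap<>();

    // Record progress: store the offset of the NEXT record to read,
    // i.e. last processed offset + 1 (the convention Kafka also uses).
    void commit(int partition, long lastProcessedOffset) {
        committed.put(partition, lastProcessedOffset + 1);
    }

    // Decide where a partition resumes after a restart.
    // Assumption: with no committed offset we start from 0.
    long startOffset(int partition, boolean seekToBeginning) {
        if (seekToBeginning) {
            return 0L; // flag set: deliberately re-process everything
        }
        return committed.getOrDefault(partition, 0L);
    }

    public static void main(String[] args) {
        OffsetResumeSketch offsets = new OffsetResumeSketch();
        offsets.commit(0, 41L); // partition 0 processed up to offset 41

        System.out.println(offsets.startOffset(0, false)); // resume from committed position
        System.out.println(offsets.startOffset(0, true));  // forced back to the beginning
        System.out.println(offsets.startOffset(1, false)); // nothing committed yet
    }
}
```

In this model, a reset on restart with the flag set to false can only happen if commit was never called for that partition, which is the case the earlier comments set out to rule out.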

anjackson commented 5 years ago

I think I was misinterpreting the way they were being reported. The behaviour appears to be as expected now.