vlofgren closed this 11 months ago
Oops. Thanks for reporting this! The same empty MessageBody instance was being reused, which of course breaks once it gets closed.
There might be a secondary issue here: jwarc currently assumes that all revisit records have no payload, but perhaps that's a false assumption.
Fix released as v0.28.5. It should sync to maven central in a couple of hours.
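For anyone reading along later, here is a minimal sketch of the failure mode described above. It is not jwarc's actual code: `Body`, `SharedSource`, and `FreshSource` are hypothetical stand-ins, used only to show why handing out one closeable "empty body" object to every caller breaks after the first close, and why creating a new instance per call avoids it.

```java
// Hypothetical sketch, not jwarc's implementation.
// Body stands in for a closeable, empty message body.
final class Body implements AutoCloseable {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    // An empty body has nothing to read, so this signals end of stream (-1).
    // Reading after close is an error.
    int read() {
        if (!open) {
            throw new IllegalStateException("body is closed");
        }
        return -1;
    }

    @Override
    public void close() {
        open = false;
    }
}

// Buggy pattern: every caller receives the same instance, so the first
// close() is visible to all later callers. In a real library the shared
// object might be a static constant, which makes the state global.
final class SharedSource {
    private final Body empty = new Body();

    Body emptyBody() {
        return empty;
    }
}

// Fixed pattern: each call returns its own instance.
final class FreshSource {
    Body emptyBody() {
        return new Body();
    }
}

final class SharedBodyDemo {
    public static void main(String[] args) {
        SharedSource shared = new SharedSource();
        try (Body first = shared.emptyBody()) {
            System.out.println("shared, pass 1 open: " + first.isOpen());
        }
        // The body from pass 1 was closed by try-with-resources above.
        System.out.println("shared, pass 2 open: " + shared.emptyBody().isOpen());

        FreshSource fresh = new FreshSource();
        try (Body first = fresh.emptyBody()) {
            System.out.println("fresh, pass 1 open: " + first.isOpen());
        }
        System.out.println("fresh, pass 2 open: " + fresh.emptyBody().isOpen());
    }
}
```

With the shared source, the second pass sees a body that is already closed, which matches the symptom in the report. With the fresh source, both passes see an open body.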
Thanks for the quick fix :D
I'll work around the revisit limitations now that I know they exist.
I'm running into an issue where WarcReader behaves differently when reading the same file twice. It looks like some global state is not being cleaned up between passes. In the first pass, http().body().isOpen() returns true; in the second pass it returns false, and attempting to read the data fails.
I'm on jwarc 0.28.4.
This is the smallest test case I've been able to produce: