Closed vbanos closed 5 years ago
When debugging this improvement, I printed the values of `self.f.tell()` and `os.path.getsize(self.path)` and saw that they are close but not exactly equal: in every sample below, `tell()` is 10 bytes larger than the size on disk.
This may have something to do with the way the file offset and the stat call calculate file size.
This doesn't affect the correctness of this improvement, as the difference is minimal.
SIZE 1966
TELL 1976
SIZE 2332
TELL 2342
SIZE 2995
TELL 3005
SIZE 3360
TELL 3370
SIZE 4025
TELL 4035
SIZE 4391
TELL 4401
...
...
SIZE 37605
TELL 37615
SIZE 38161
TELL 38171
SIZE 38668
TELL 38678
SIZE 47901
TELL 47911
SIZE 48422
TELL 48432
SIZE 49040
TELL 49050
SIZE 49536
TELL 49546
SIZE 75077
TELL 75087
SIZE 75589
TELL 75599
SIZE 145951
TELL 145961
SIZE 146466
TELL 146476
Thanks!
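For reference, one general way `tell()` and `os.path.getsize()` can disagree in Python is write buffering: `tell()` on a buffered file counts bytes that are still in the in-process buffer, while a stat only sees what has reached the file. The sketch below illustrates that effect in isolation. It is not taken from this codebase, `size_and_tell_gap` is a hypothetical helper, and the output above does not establish that buffering is what causes the gap seen here.

```python
import os
import tempfile


def size_and_tell_gap(payload, flush):
    """Write payload to a fresh temp file and return (getsize, tell).

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    try:
        # os.fdopen(..., "wb") returns a buffered writer by default.
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            if flush:
                # Push buffered bytes to the OS so a stat can see them.
                f.flush()
            # getsize() stats the path; tell() includes buffered bytes.
            return os.path.getsize(path), f.tell()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(size_and_tell_gap(b"x" * 10, flush=False))
    print(size_and_tell_gap(b"x" * 10, flush=True))
```

With a small payload and no flush, the stat-based size lags behind the offset; after a flush the two agree.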
Every time we write WARC records to file, we call `maybe_size_rollover()` to check whether the current WARC file size is over the rollover threshold. We use `os.path.getsize`, which does a disk stat, to do that.
We already know the current WARC file size from the WARC record offset (`self.f.tell()`). There is no need to call `os.path.getsize`; we can just reuse the offset info.
This way, we do one less disk stat every time we write to the WARC, which is a nice improvement.
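A minimal sketch of the idea, assuming a much simplified writer: only `maybe_size_rollover()`, `self.f`, and `self.path` come from the description above. The class name `WarcWriterSketch`, the `rollover_size` parameter, the file naming scheme, and the other methods are hypothetical stand-ins and do not reflect the real implementation.

```python
import os
import tempfile


class WarcWriterSketch:
    """Hypothetical, simplified stand-in for a WARC file writer."""

    def __init__(self, directory, rollover_size):
        self.directory = directory
        self.rollover_size = rollover_size  # threshold in bytes (assumed name)
        self.serial = 0
        self.f = None
        self.path = None

    def _open(self):
        # Assumed naming scheme, for illustration only.
        name = "sketch-%05d.warc" % self.serial
        self.path = os.path.join(self.directory, name)
        self.serial += 1
        self.f = open(self.path, "wb")

    def write_record(self, payload):
        """Append payload and return the offset at which it starts."""
        if self.f is None:
            self._open()
        offset = self.f.tell()
        self.f.write(payload)
        self.maybe_size_rollover()
        return offset

    def maybe_size_rollover(self):
        # Reuse the current file offset instead of calling
        # os.path.getsize(self.path), which would stat the file.
        if self.f is not None and self.f.tell() > self.rollover_size:
            self.close()

    def close(self):
        if self.f is not None:
            self.f.close()
            self.f = None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        w = WarcWriterSketch(d, rollover_size=100)
        print([w.write_record(b"x" * 60) for _ in range(3)])
        w.close()
```

The check runs after every write, so the offset is always current and the writer never needs to ask the filesystem for the size.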