Unblob does not support sparse tar archives because of the way the tarfile library handles them.
Our assumption is that the end of the archive corresponds to the last member's data offset plus the last member's size (rounded up to the block size), like so:
    try:
        tf = tarfile.TarFile(mode="r", fileobj=file)
    except tarfile.TarError:
        return -1
    last_member = None
    try:
        for member in tf:
            last_member = member
    except (tarfile.TarError, SeekError):
        # recover what's already been parsed
        pass
    if last_member is None:
        return -1
    last_file_size = round_up(last_member.size, BLOCK_SIZE)
    end_of_last_tar_entry = last_member.offset_data + last_file_size
This assumption does not hold for sparse archives, because tarfile sets the TarInfo size to the original (expanded) file size in _proc_sparse:
    def _proc_sparse(self, tarfile):
        """Process a GNU sparse header plus extra headers.
        """
        # We already collected some sparse structures in frombuf().
        structs, isextended, origsize = self._sparse_structs
        del self._sparse_structs
        # Collect sparse structures from extended header blocks.
        while isextended:
            buf = tarfile.fileobj.read(BLOCKSIZE)
            pos = 0
            for i in range(21):
                try:
                    offset = nti(buf[pos:pos + 12])
                    numbytes = nti(buf[pos + 12:pos + 24])
                except ValueError:
                    break
                if offset and numbytes:
                    structs.append((offset, numbytes))
                pos += 24
            isextended = bool(buf[504])
        self.sparse = structs
        self.offset_data = tarfile.fileobj.tell()
        tarfile.offset = self.offset_data + self._block(self.size)
        self.size = origsize  # <---- here
        return self
We therefore need a way to know the size actually stored in the archive when a member is sparse, not the original file size.
We can check whether any member of the archive is sparse by calling issparse() on it. I suppose supporting sparse archives would mean browsing through every entry in the archive, not just the last member.
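One possible direction, as a rough sketch only: the `_proc_sparse` excerpt above stores the sparse map as `(offset, numbytes)` pairs in `self.sparse`, so the stored size of a sparse member could perhaps be recovered by summing the `numbytes` values. This assumes the old GNU sparse format, where the data area holds just the concatenated data segments; it has not been checked against the pax-based sparse formats. `stored_size` and `end_of_entry` are hypothetical helper names, and `round_up` and `BLOCK_SIZE` are redefined here as stand-ins for unblob's own.

```python
import tarfile

BLOCK_SIZE = 512  # tar block size; stand-in for unblob's constant


def round_up(size, alignment):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def stored_size(member):
    """Return the number of data bytes the member occupies in the archive.

    Assumption: for a sparse member, the stored data is the sum of the
    segment lengths in member.sparse, not member.size (the expanded size).
    """
    if member.issparse():
        return sum(numbytes for _offset, numbytes in member.sparse)
    return member.size


def end_of_entry(member):
    """Return the offset just past the member's block-padded data."""
    return member.offset_data + round_up(stored_size(member), BLOCK_SIZE)
```

With helpers like these, the last two lines of the snippet above would reduce to `end_of_last_tar_entry = end_of_entry(last_member)`, leaving the behaviour for non-sparse members unchanged.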