Open jon-zu opened 2 months ago
Just to check, is this timing coming from a --release
build?
Just to check, is this timing coming from a
--release
build?
Yes, I can see later today If I can make a small, reproducible example. I'm making a small tool to extract InstallShield installer, where I extract the .msi from and exe.
I've built a small example here. 7z can extract this in a few seconds, It seems the StreamReader
is slower than 1mb/s.
MSI file(~1gb): https://drive.google.com/file/d/1TqfK1UDhhv1pbfKZe0yWDAHZ-_azVsoT/view?usp=drive_link
Flamegraph for the example(I've disabled printing for that one):
It seems the svg is not really working here, I've also added It to the repo.
Hmm...it looks like the current implementation re-creates the Chain
struct every time it needs to refill the Stream
's internal buffer, and then throws the Chain
away again. And Chain::new
walks the entire FAT chain to the end, even if the caller only wants to read some piece of data from the beginning or middle of the chain. So we end up with this quadratic behavior where, to read all of a very long stream, we have to walk the same chain sectors over and over and over again.
Fixing this might not be straightforward; IIRC part of the reason the Stream
doesn't persist the Chain
is that sectors could potentially be added/removed while the Stream
is still open, and it should be robust against that.
One possibility to experiment with would be that perhaps the Chain
shouldn't be keeping a list of sector_ids
at all; it should just walk the FAT chain on demand when instructed to seek to a specific position in the chain.
I'm using the msi crate, but after some profiling, I think this crate might be the better place to report that.
Code:
Reading 50 Mb of that cab stream takes roughly one minute on my pc. I've also created a flame graph, but I'm not really sure where the problem lies. Is the loop in
Chain::new
taking a lot of time maybe?