wagoodman / dive

A tool for exploring each layer in a docker image
MIT License

Provide details on inefficient/wasted space #238

Open vertex-github opened 5 years ago

vertex-github commented 5 years ago

It would be really helpful if dive could provide more insight into its classification of wasted/inefficient space. I have images that dive claims have 250 MB of wasted space, and I'd love it if dive could tell me why it thinks the space is wasted (the files and/or source layers and their respective sizes).

wagoodman commented 4 years ago

The lower left pane in the UI has a bit more detail than just the final count. Here's an example dive output, only the lower left pane:

[Image Details]─────────────────────────────────────────────
Total Image size: 363 MB                                    
Potential wasted space: 6.6 MB                              
Image efficiency score: 98 %                                

Count   Total Space  Path                                   
    4        2.3 MB  /var/cache/debconf/templates.dat       
    2        1.0 MB  /var/cache/debconf/templates.dat-old   
    4        815 kB  /var/log/dpkg.log                      
    2        562 kB  /var/cache/apt/pkgcache.bin            
    4        452 kB  /var/lib/dpkg/status                   
    2        425 kB  /var/cache/apt/srcpkgcache.bin         
    3        365 kB  /var/lib/dpkg/status-old               
    2        322 kB  /var/log/lastlog                       
    2         70 kB  /var/log/tallylog                      
...

In the above example, of the 6.6 MB of wasted space, dive claims that /var/cache/debconf/templates.dat is the worst offending file, showing up in 4 layers and accounting for 2.3 MB of the total wasted space. The second worst offending file is /var/cache/debconf/templates.dat-old... (and so on).
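To make the Count and Total Space columns concrete, here is a minimal Python sketch of that kind of tally. It is a simplified model, not dive's actual implementation: it assumes each layer is just a mapping of path to size in bytes, and the function name `summarize_repeated_paths` and the sample layers are hypothetical.

```python
from collections import defaultdict


def summarize_repeated_paths(layers):
    """Tally paths that appear in more than one layer.

    `layers` is a list of dicts (one per layer) mapping path -> size in bytes.
    Returns (count, total_bytes, path) rows, largest total first.
    This is an illustrative model only, not dive's real algorithm.
    """
    count = defaultdict(int)
    total = defaultdict(int)
    for layer in layers:
        for path, size in layer.items():
            count[path] += 1
            total[path] += size
    # A path present in a single layer is not repeated, so it is skipped.
    rows = [(count[p], total[p], p) for p in count if count[p] > 1]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Hypothetical layer contents, made up for illustration.
layers = [
    {"/var/lib/dpkg/status": 100_000, "/bin/sh": 120_000},
    {"/var/lib/dpkg/status": 110_000},
    {"/var/lib/dpkg/status": 115_000, "/var/log/dpkg.log": 20_000},
    {"/var/log/dpkg.log": 30_000},
]

for c, t, p in summarize_repeated_paths(layers):
    print(f"{c:5d}  {t:>10d} B  {p}")
```

In this model, a file rewritten by several layers (for example by repeated package installs) is counted once per layer it appears in, which is how a single path ends up with a count of 4.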

Unfortunately, the UI currently has no way to show output that has scrolled off the pane. However, if you run your dive command with --json output.json, the full list of files and details is written to an exported output.json file.
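As a rough sketch of reading that list back out of the export in Python: the key names used here (`image`, `fileReference`, `count`, `sizeBytes`, `file`) are assumptions about the JSON layout and may differ between dive versions, so check them against your own output.json. The helper names and the sample data are hypothetical.

```python
import json


def load_report(path):
    """Load an exported report from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def inefficient_files(report, limit=None):
    """Return the repeated-file entries, largest first.

    Assumes (unverified) that entries live under report["image"]["fileReference"]
    and carry "count", "sizeBytes" and "file" keys. Missing keys yield an
    empty list rather than an error.
    """
    refs = report.get("image", {}).get("fileReference", [])
    rows = sorted(refs, key=lambda r: r.get("sizeBytes", 0), reverse=True)
    return rows[:limit]  # limit=None keeps every row


# Hypothetical stand-in for json.load() on a real export.
sample = {
    "image": {
        "fileReference": [
            {"count": 2, "sizeBytes": 1000, "file": "/var/log/lastlog"},
            {"count": 4, "sizeBytes": 2300, "file": "/var/cache/debconf/templates.dat"},
        ]
    }
}

for row in inefficient_files(sample, limit=20):
    print(f'{row["count"]:5d}  {row["sizeBytes"]:>10d} B  {row["file"]}')
```

With a real export, replace `sample` with `load_report("output.json")`. The `limit` argument keeps the listing short when an image has a very long list of repeated files.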

There is plenty of room for improvement for the UI, especially the image details pane. I'm open to suggestions for improvement!

twirrim commented 4 years ago

Maybe just a simple header to indicate what the file list is below? I didn't realise what that file listing was. At first glance it appeared to be "biggest files in the image", maybe?

hansbogert commented 3 years ago

How does it know these are needless files? Is this based on heuristics, i.e., a set of file patterns known to be cruft (or probably not needed in a final build/image)?

KUGA2 commented 2 years ago

How does it know these are needless files? Is this based on heuristics, i.e., a set of file patterns known to be cruft (or probably not needed in a final build/image)?

@wagoodman Could you please elaborate on this? How do I see this list in the JSON file?

Suggestion for UI improvement: Use SHIFT+TAB (or CTRL) to tab into this pane and then scroll with Arrow keys. TAB should bring you back to the normal two panes.

HildaHay commented 1 year ago

I'd also like to see more info on this - the metrics are pretty ambiguous and their names aren't clear. I also suspect that they may be mis-counting some files as "wasted space" when they're not, but that's unavoidable. Knowing how the metrics are counted will help users figure out what they should set the metric thresholds at to account for inaccuracy, and also where to look for inefficiencies that might be missed by the scan.

GongT commented 1 month ago

The "Inefficient Files" section in CI mode may be too large: it generates more than 200 kB of output text in my case (using an official Arch Linux base image). It should limit the maximum number of lines shown.