zz85 / space-radar

Disk And Memory Space Visualization App built with Electron & d3.js
https://github.com/zz85/space-radar-electron/releases

Offline file read #19

Closed joerg closed 8 years ago

joerg commented 8 years ago

A friend of mine recently mentioned that he needed to analyze the disk contents of a remote (headless) server and that there are practically no tools out there [1] to do this. As it happens, I also need to do this regularly, and when I stumbled upon space-radar I decided to try and get this working. So, with this feature you can create a disk map with du -ab /usr /var | gzip -c > /tmp/sizes.txt.gz on your remote server, transfer the file to your computer and analyze it there. No messy and most probably impossible sshfs or anything of the like needed. What do you think?

[1] There are xdu and xdiskusage, but both are unmaintained, old and, if you ask me, ugly.

zz85 commented 8 years ago

This looks good! Let me give this a spin :)

zz85 commented 8 years ago

I actually like this idea :+1:

I'm also curious to know whether you prefer using the du or the find command, and how long that usually takes?

Also, since I'm not on Linux, these commands seem to work for me instead.

# du on mac doesn't support -b
du -a path | gzip -c > sizes.gz
# find on mac doesn't support -printf
find path -type f -exec stat -f "%z %N" {} \;

joerg commented 8 years ago

Initially I preferred find because I thought it is more likely to be installed on all Linux/Unix servers, but a friend of mine gave me two reasons why du might be better:

I did not test it on Mac, but reading the man pages now it seems that mac's du by default returns the file size in bytes which makes the "-b" option unnecessary. You are right, this has to be documented. Since I don't have a mac (or actually I have one, but it is running linux) I think it would be best if you test and document the commands. IIRC Mac mostly uses BSD tools/packages, so if Mac is documented, this pretty much also covers other Unixes. @sometimesfood: Maybe you want to shed some light concerning "du vs. find" and maybe also BSDs etc.

sometimesfood commented 8 years ago

@joerg Sure.

As @zz85 mentioned, -printf is a GNU extension to find(1). Of course the same thing can be emulated using -exec stat, but I prefer to use du(1) instead for the following reasons:

reading the man pages now it seems that mac's du by default returns the file size in bytes which makes the "-b" option unnecessary

Unfortunately, things are a little bit more complicated than that. On old Unix systems, du used to return the number of used file system blocks. On modern operating systems, du does not report the actual number of file system blocks, but instead uses a fixed block size of 512 bytes (according to POSIX). However, on GNU systems, this block size defaults to 1024 bytes. The block size used can be set to any value larger than 512 by setting the environment variable BLOCKSIZE. Also, all du implementations known to me support the -k parameter, which sets the block size to 1024.

I think the best course of action would probably be to rely on du's -k parameter and assume that input files use a block size of 1024. Also, it would be a good idea to use the disk usage numbers reported by du for directories instead of manually summing up disk usage for all child nodes in order to prevent the block size from impeding the accuracy of reported disk usage numbers for directories that contain a lot of files.
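To make the two suggestions above concrete, here is a minimal Python sketch. It assumes input lines in the shape produced by du -ak (a block count, whitespace, then the path) and a block size of 1024. The names parse_du_line and BLOCK_SIZE and the sample numbers are made up for illustration and are not taken from space-radar.

```python
# Illustrative sketch; parse_du_line and BLOCK_SIZE are hypothetical names,
# not part of space-radar.
BLOCK_SIZE = 1024  # bytes per block when du is run with -k


def parse_du_line(line):
    """Split one `du -ak` output line into (size_in_bytes, path)."""
    blocks, path = line.rstrip("\n").split(None, 1)
    return int(blocks) * BLOCK_SIZE, path


# Made-up sample in du's order: children first, then the directory itself.
sample = [
    "1\t./docs/a.txt",
    "1\t./docs/b.txt",
    "4\t./docs",
]

sizes = {path: size for size, path in map(parse_du_line, sample)}
reported = sizes["./docs"]
summed = sum(s for p, s in sizes.items() if p.startswith("./docs/"))
print("reported by du:", reported, "sum of children:", summed)
```

In this made-up sample the directory total reported by du differs from the sum of its children, which is why taking the directory line as-is may be preferable to re-adding the rounded per-file numbers.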

sometimesfood commented 8 years ago

Btw, I really appreciate you guys working on this. I use xdu quite a lot for this type of task and space-radar looks like a great alternative. :+1:

In case you would prefer to keep the user interface lean, you could also just read the disk usage file from standard input instead of adding another UI element.

My current workflow looks something like this:

# on a headless remote system behind a firewall
~ % sudo du -ak /home | gzip > home.du.gz

# on my workstation at home
~ % zcat home.du.gz | xdu
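
For comparison, reading the same data either from standard input or from a (possibly gzipped) file could look roughly like the Python sketch below. The function name open_du_lines and the convention of passing "-" for standard input are assumptions made for this example; space-radar itself is an Electron app and would do this differently.

```python
import gzip
import sys


def open_du_lines(source="-"):
    """Return a text stream of du output lines.

    "-" means standard input (e.g. `zcat home.du.gz | tool`); a path ending
    in ".gz" is decompressed on the fly; anything else is read as plain text.
    """
    if source == "-":
        return sys.stdin
    if source.endswith(".gz"):
        return gzip.open(source, "rt", encoding="utf-8", errors="replace")
    return open(source, "r", encoding="utf-8", errors="replace")


# Possible usage (not executed here):
#   for line in open_du_lines("home.du.gz"):
#       ...
```

Detecting gzip by file extension keeps the sketch short; a more robust version might inspect the first two bytes of the stream instead.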

zz85 commented 8 years ago

Thanks @sometimesfood for the inputs!

Just for interest's sake, I tried this on my Mac, and the find command does seem to take much longer.

$ time du -ak .  | gzip -c > sizes.gz
du -ah .  1.93s user 33.66s system 67% cpu 52.574 total
gzip -c > sizes.gz  1.59s user 0.09s system 3% cpu 52.575 total

$ time find . -type f -exec stat -f "%z %N" {} \; | gzip -c > sizes2.gz
find . -type f -exec stat -f "%z %N" {} \;  385.12s user 813.68s system 92% cpu 21:32.84 total
gzip -c > sizes2.gz  7.91s user 4.88s system 0% cpu 21:32.84 total

I noticed that du -a reports sizes in blocks instead of bytes too. We can multiply by the 1024 block size inside SpaceRadar if we agree that it would take in a file with lines of the format <block size> <file>\n?

joerg commented 8 years ago

Yes, I think using du -ak by default and doing the multiplication in space radar would be best for now. If we document the file format with <block size> <path> then anyone who feels like using any other tool is also welcome. For the issue with the directory sums: Currently reading the du output delivers a json object of this format

{
  "name": "/",
  "size": 123,
  "children": [ ... ]
}

So basically it should be possible to simply skip calculating the directory size (I think it is the sum variable) if "size" and "children" exist.
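
A rough Python sketch of that idea, assuming du -ak style input and a block size of 1024: the tree uses the same "name", "size" and "children" keys as above, and a directory's size is only summed from its children when du did not report one. The function names build_tree and total_size are made up for this example and do not refer to the actual space-radar code.

```python
BLOCK_SIZE = 1024  # assumed: input was produced with `du -ak`


def build_tree(lines):
    """Build a nested {"name", "size", "children"} tree from du lines."""
    root = {"name": "/", "children": []}
    index = {(): root}  # tuple of path components -> node

    def node_for(parts):
        # Look up a node, creating it (and any missing parents) on demand.
        node = index.get(parts)
        if node is None:
            parent = node_for(parts[:-1])
            node = {"name": parts[-1]}
            parent.setdefault("children", []).append(node)
            index[parts] = node
        return node

    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        blocks, path = line.split(None, 1)
        parts = tuple(p for p in path.split("/") if p and p != ".")
        node_for(parts)["size"] = int(blocks) * BLOCK_SIZE
    return root


def total_size(node):
    """Prefer the size du reported; sum children only when it is missing."""
    if "size" in node:
        return node["size"]
    return sum(total_size(child) for child in node.get("children", []))
```

With input such as du -ak /usr /var there is no line for "/" itself, so total_size falls back to summing the top-level entries for the root while keeping the du-reported totals everywhere else.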

zz85 commented 8 years ago

Yes, I think using du -ak by default and doing the multiplication in space radar would be best for now

sounds good to me!

So basically it should be possible to simply skip calculating the directory size

Sometime in the future I might decide to change the JSON structure (either use FlatBuffers or some other data structure), since big JSON takes a lot of memory in V8. For now, I think I'm fine with it.

In case you would prefer to keep the user interface lean, you could also just read the disk usage file from standard input instead of adding another UI element.

I like this suggestion too. Piping from a file or stdin seems to be more for sysadmins and advanced users, whereas most users wouldn't understand what "Read File" means. I would propose accepting both: reading from stdin, or opening the file through the application's context menu.

@joerg Let me know when you feel this Pull Request is ready and I'll merge this :)

joerg commented 8 years ago

Hi,

I just updated the docs and set the block size to 1024 as suggested. From my perspective, everything that should be in this particular pull request is now ready. Skipping the directory size calculation and things like that are another topic and not urgent for this.

Btw., I also tried packaging for Linux and it worked as expected, except for the memory reading, but I did not look into that more closely. Maybe you want to have a Linux release for the next version. This is the command I used (a copy of build:win with platform and version changed):

electron-packager . SpaceRadar --platform=linux --arch=x64 --version=0.36.10 --ignore=experiments --ignore=node_modules/electron-packager --ignore=node_modules/electron-prebuilt --ignore=node_modules/standard --ignore=node_modules/publish-release --version-string.CompanyName=zz85 --version-string.ProductName=SpaceRadar --icon=Icon.icns

zz85 commented 8 years ago

Hi @joerg - thanks for the update. There's a couple more things as mentioned in this PR (and probably disabling file manipulation when in file read mode).

I'll merge first and follow up in a separate PR. If you are interested in contributing more, I'll be happy to add you as a collaborator on this project too!