zevv / duc

Dude, where are my bytes: Duc, a library and suite of tools for inspecting disk usage
GNU Lesser General Public License v3.0
596 stars 79 forks

faster indexes - what cpu ram do i need #305

Closed robina80 closed 1 year ago

robina80 commented 1 year ago

hi all,

when indexing my storage my database is quite big (roughly 2TB), is there any way to speed up the indexing, like adding more RAM or CPU to my VM?

thanks, rob

l8gravely commented 1 year ago

"robina80" == robina80 writes:

> when indexing my storage my database is quite big (roughly 2TB), is there any way to speed up the indexing, like adding more RAM or CPU to my VM?

The big limits are the speed of your storage and how fast it can handle someone stat()ing every single file and directory on there. I would suggest you split your index into multiple sub-indexes. I've done this, and it made a big difference since I could run three or four indexes in parallel without major impact on my backend storage system.

I then use a simple index.cgi script which dynamically builds a table of all the index DBs it finds in a directory to present to the user. It might be in the examples directory, or let me know and I'll send you a copy.
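As a rough illustration (not John's actual script), such a wrapper could be as simple as the sketch below. The directory, file layout, and `db=` query parameter are all assumptions for illustration, not duc's actual CGI interface:

```shell
#!/bin/sh
# Hypothetical sketch of an index.cgi wrapper: emit an HTML list with one
# link per duc database found in a directory.  Paths and the query
# parameter are assumptions, not duc's real interface.
render_index() {
    dir=$1
    echo "<ul>"
    for db in "$dir"/*.db; do
        [ -e "$db" ] || continue                 # glob matched nothing
        name=$(basename "$db" .db)
        echo "<li><a href=\"duc.cgi?db=$db\">$name</a></li>"
    done
    echo "</ul>"
}

# CGI entry point: headers first, then the body.
echo "Content-Type: text/html"
echo ""
render_index "${DB_DIR:-/var/lib/duc}"
```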

Do you have any rough numbers from your index? 2TB is a big index. How long does a full index take? How many files? How much data?

Unfortunately, parallel indexing into a single DB isn't supported; it's a hard problem to make sure loops and such don't happen. It's much simpler to just run indexes in parallel on sub-directories.
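Running sub-indexes in parallel then amounts to launching one `duc index` per sub-tree, each writing its own DB. A dry-run sketch (volume names and DB paths here are assumptions, not anything from this thread):

```shell
#!/bin/sh
# Dry-run sketch: print one "duc index" invocation per volume.  Each run
# writes its own DB, so they can safely go in parallel; drop the echo and
# keep the trailing "&" to actually launch them.  Names are assumptions.
cmds=""
for vol in one two three; do
    cmd="nice -20 ionice -c 3 duc index -d /db/$vol.db /volumes/$vol &"
    echo "$cmd"
    cmds="$cmds$cmd
"
done
# a real runner would end with:  wait   (block until all indexers finish)
```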

John

robina80 commented 1 year ago

ok, so you suggest splitting my database files when I'm making them

atm I have all 3 of my storage systems mounted under /volumes, and in total there's roughly 2PB across all 3

here's the script I made for it

```
#!/bin/bash

while true; do
    # pie
    mkdir /pie
    touch /pie/.duc.db
    chmod -R 777 /pie
    nice -20 ionice -c 3 /usr/local/bin/duc index -d /pie/.duc.db -p /volumes/
    sed -i -e 's/graph/pie/g' /var/www/cgi-bin/duc.cgi
    rm -rf /graph

    sleep 4h

    # graph
    mkdir /graph
    touch /graph/.duc.db
    chmod -R 777 /graph
    nice -20 ionice -c 3 /usr/local/bin/duc index -d /graph/.duc.db -p /volumes/
    sed -i -e 's/pie/graph/g' /var/www/cgi-bin/duc.cgi
    rm -rf /pie

    sleep 4h
done
```

i will kick it off now and let you know how big the index file gets

l8gravely commented 1 year ago

"robina80" == robina80 writes:

> ok, so you suggest splitting my database files when I'm making them

Yes, you should make one DB for each large directory. If you have three volumes under /volumes, then you should be doing:

```
for vol in one two three; do
    nice -20 ionice -c 3 duc index -d /db/new-$vol.db -q -x /volumes/$vol
    if [ $? -eq 0 ]; then
        mv /db/$vol.db /db/old-$vol.db
        mv /db/new-$vol.db /db/$vol.db
    fi
done
```

This loops over the filesystems one at a time and only moves the current one out of the way if the new index works properly. This way people browsing via the web interface always see a working DB.

And with 2PB of data, your scanning is the biggest time sink, so doing your indexes at night or outside of busy hours is a good thing to do.
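One way to keep indexing outside busy hours is to drive the swap loop above from cron instead of a `while true`/`sleep` loop. A hypothetical /etc/crontab entry (the script name and log path are made up for illustration):

```
# /etc/crontab: re-index every night at 02:00 as root
0 2 * * * root /usr/local/sbin/duc-reindex.sh >> /var/log/duc-reindex.log 2>&1
```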

> atm i have mounted all my 3 storages under /volumes and in total, theres roughly 2PB on all 3
>
> heres my script i made for it

> ```
> #!/bin/bash
> while true; do
>     # pie
>     mkdir /pie
>     touch /pie/.duc.db
>     chmod -R 777 /pie
>     nice -20 ionice -c 3 /usr/local/bin/duc index -d /pie/.duc.db -p /volumes/
>     sed -i -e 's/graph/pie/g' /var/www/cgi-bin/duc.cgi
>     rm -rf /graph
> ```

You are not doing the right thing here, you keep over-writing the same DB with new data, and I'm not sure what your sed command is trying to do.

> ```
>     sleep 4h
>
>     # graph
>     mkdir /graph
>     touch /graph/.duc.db
>     chmod -R 777 /graph
>     nice -20 ionice -c 3 /usr/local/bin/duc index -d /graph/.duc.db -p /volumes/
>     sed -i -e 's/pie/graph/g' /var/www/cgi-bin/duc.cgi
>     rm -rf /pie
>
>     sleep 4h
> done
> ```
