Closed robina80 closed 1 year ago
"robina80" == robina80 @.***> writes:
when indexing my storage my database is quite big (roughly 2TB), is there any way to speed up the indexing, like adding more RAM/CPU to my VM?
The big limits here are the speed of your storage and how fast it can handle someone stat()ing every single file and directory on it. I would suggest you split your index into multiple sub-indexes. I've done this and it makes a big difference, since I could run three or four indexers in parallel without major impact on my backend storage system.
I then use a simple index.cgi script which dynamically builds a table of all the index DBs it finds in a directory to present to the user. It might be in the examples directory, or let me know and I'll send you a copy.
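For reference, a wrapper of the kind described above might look roughly like this. This is a minimal sketch, not the actual script from the examples directory; the `/db` location and the `duc.cgi?db=...` link format are assumptions.

```sh
#!/bin/sh
# index.cgi -- hypothetical sketch: list every duc DB found under DB_DIR
# as one row of an HTML table, linking each to the duc CGI viewer.

render_table() {
    dir=$1
    printf '<table border="1">\n<tr><th>Index</th><th>Last updated</th></tr>\n'
    for db in "$dir"/*.db; do
        [ -e "$db" ] || continue                  # glob matched nothing
        name=$(basename "$db" .db)
        mtime=$(date -r "$db" '+%Y-%m-%d %H:%M')  # DB mtime = last index run
        printf '<tr><td><a href="duc.cgi?db=%s">%s</a></td><td>%s</td></tr>\n' \
            "$name" "$name" "$mtime"
    done
    printf '</table>\n'
}

printf 'Content-Type: text/html\r\n\r\n'
printf '<html><body>\n'
render_table "${DB_DIR:-/db}"
printf '</body></html>\n'
```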
Do you have any rough numbers from your index? 2TB is a big index. How long does it take to do a full index? How many files? How much data?
Unfortunately, doing parallel indexing into a single DB isn't supported; it's a hard problem to make sure loops and such don't happen. It's much simpler to just run the indexers in parallel on sub-directories.
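The "run them in parallel on sub-directories" idea can be sketched like this: one duc process per volume, each writing its own DB, started concurrently. The `/db` and `/volumes/<name>` paths and the volume names are assumptions, and the sketch falls back to a no-op stub if duc is not installed so it stays runnable.

```sh
#!/bin/sh
# One background duc indexer per volume, each with its own DB file.
DUC=${DUC:-duc}
command -v "$DUC" >/dev/null 2>&1 || DUC=true   # stub when duc is absent

index_one() {
    # low CPU priority; on Linux, adding "ionice -c 3" as well keeps the
    # scan out of the way of real I/O load
    nice -n 19 "$DUC" index -d "/db/$1.db" -x "/volumes/$1"
}

for vol in one two three; do
    index_one "$vol" &     # kick off each scan in the background
done
wait                       # block until every indexer has finished
echo "all sub-indexes done"
```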
John
ok, so you suggest splitting my database files when I'm making it
atm I have mounted all 3 of my storages under /volumes, and in total there's roughly 2PB across all 3
here's the script I made for it
```sh
#!/bin/bash
while true; do
    # pie
    mkdir /pie
    touch /pie/.duc.db
    chmod -R 777 /pie
    nice -20 ionice -c 3 /usr/local/bin/duc index -d /pie/.duc.db -p /volumes/
    sed -i -e 's/graph/pie/g' /var/www/cgi-bin/duc.cgi
    rm -rf /graph
    sleep 4h

    # graph
    mkdir /graph
    touch /graph/.duc.db
    chmod -R 777 /graph
    nice -20 ionice -c 3 /usr/local/bin/duc index -d /graph/.duc.db -p /volumes/
    sed -i -e 's/pie/graph/g' /var/www/cgi-bin/duc.cgi
    rm -rf /pie
    sleep 4h
done
```
I will kick it off now and let you know how big the index file gets
"robina80" == robina80 @.***> writes:
ok, so you suggest splitting my database files when I'm making it
Yes, you should make one DB for each large directory. If you have three volumes under /volumes, then you should be doing:
```sh
for vol in one two three; do
    nice -20 ionice -c 3 duc index -d /db/new-$vol.db -q -x /volumes/$vol
    if [ $? = 0 ]; then
        mv /db/$vol.db /db/old-$vol.db
        mv /db/new-$vol.db /db/$vol.db
    fi
done
```
This loops over the filesystems one at a time and only moves the current one out of the way if the new index works properly. This way people browsing via the web interface always see a working DB.
And with 2PB of data, your scanning is the biggest time sink, so doing your indexes at night or outside of busy hours is a good thing to do.
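A hypothetical cron wrapper for that advice: run the per-volume indexing outside busy hours, with a lock so a slow scan never overlaps the next scheduled run. The crontab line, the `/db` and `/volumes` paths, and the volume names are all assumptions, and duc is stubbed out if it is not installed so the sketch stays runnable.

```sh
#!/bin/sh
# nightly-duc-index.sh -- example crontab entry (path is an assumption):
#
#   0 2 * * * /usr/local/sbin/nightly-duc-index.sh
#
LOCK=${LOCK:-/tmp/duc-index.lock}
DUC=${DUC:-duc}
command -v "$DUC" >/dev/null 2>&1 || DUC=true   # stub when duc is absent

exec 9> "$LOCK" || exit 1
if ! flock -n 9; then
    echo "previous index run still going; skipping tonight" >&2
    exit 0
fi

for vol in one two three; do
    "$DUC" index -d "/db/new-$vol.db" -q -x "/volumes/$vol" \
        && mv -f "/db/new-$vol.db" "/db/$vol.db" 2>/dev/null \
        || echo "no new DB for $vol this run" >&2
done
```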
atm I have mounted all 3 of my storages under /volumes, and in total there's roughly 2PB across all 3
here's the script I made for it
```sh
#!/bin/bash
while true; do
    # pie
    mkdir /pie
    touch /pie/.duc.db
    chmod -R 777 /pie
    nice -20 ionice -c 3 /usr/local/bin/duc index -d /pie/.duc.db -p /volumes/
    sed -i -e 's/graph/pie/g' /var/www/cgi-bin/duc.cgi
    rm -rf /graph
```
You are not doing the right thing here: you keep overwriting the same DB with new data, and I'm not sure what your sed command is trying to do.
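One hedged alternative to rewriting duc.cgi with sed: keep the CGI pointed at a single fixed path and repoint a symlink at each freshly built DB, so browsers never see a half-written index. All paths here are assumptions, and duc is stubbed out if it is not installed so the sketch runs.

```sh
#!/bin/sh
# Build each index into a new timestamped file, then flip a symlink at it.
DB_DIR=${DB_DIR:-/tmp/ducdb}
DUC=${DUC:-duc}
command -v "$DUC" >/dev/null 2>&1 || DUC=true   # stub when duc is absent

mkdir -p "$DB_DIR"
new="$DB_DIR/index-$(date +%Y%m%d%H%M%S).db"

if "$DUC" index -d "$new" -x /volumes; then
    # repoint the symlink only after a successful index; duc.cgi would
    # always open $DB_DIR/current.db and so always see a complete DB
    ln -sfn "$new" "$DB_DIR/current.db"
fi
```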
```sh
    sleep 4h

    # graph
    mkdir /graph
    touch /graph/.duc.db
    chmod -R 777 /graph
    nice -20 ionice -c 3 /usr/local/bin/duc index -d /graph/.duc.db -p /volumes/
    sed -i -e 's/pie/graph/g' /var/www/cgi-bin/duc.cgi
    rm -rf /pie
    sleep 4h
done
```
hi all,
when indexing my storage my database is quite big (roughly 2TB), is there any way to speed up the indexing, like adding more RAM/CPU to my VM?
thanks, rob