Closed: robina80 closed this issue 1 year ago
"robina80" == robina80 @.***> writes:
I'm getting this error while trying to index:
Error statting file/dir goes here : No such file or directory
I'm getting quite a lot of these errors; what do they mean?
Can you give us more details on your system, please? What version of duc are you using? Can you run duc with debugging and index only the directory holding the file(s) or directories showing the problem? Is this file or directory viewable when you do 'ls -l', or does it show errors? Is this a local filesystem, or a remote filesystem mounted over NFS or some other protocol?
Please give us more details so we can help:
duc --version
duc -v index -d /tmp/duc.db /path/to/problem/directory
And if you like, you can send it to me directly if you don't want to share details.
John
thanks @l8gravely, atm it's still indexing, so I'll let you know once done
under my /volumes I have mounted 3 storages:
/volumes/pixit /volumes/DDN /volumes/hitachi
when I indexed /volumes/DDN it worked fine, I got no errors at all, and it returned a new prompt once the index completed (i.e. no statting errors)
I could go on the cgi-bin website and see /volumes/DDN
bear in mind the pixit storage is the biggest of them all (the one I'm getting statting errors on): it's 1.53PB and the index is at 2.4PB and growing
the DDN is only 170TB and I think the index completed and said 183TB
they're all CIFS shares, i.e. SMB, and duc is version 1.4.4, installed via apt install duc on Ubuntu 22.04 server LTS
"robina80" == robina80 @.***> writes:
thanks @l8gravely, atm it's still indexing, so I'll let you know once done; under my /volumes I have mounted 3 storages:
/volumes/pixit /volumes/DDN /volumes/hitachi
when I indexed /volumes/DDN it worked fine, I got no errors at all, and it returned a new prompt once the index completed (i.e. no statting errors)
I could go on the cgi-bin website and see /volumes/DDN
bear in mind the pixit storage is the biggest of them all (the one I'm getting statting errors on): it's 1.53PB and the index is at 2.4PB and growing
I think you mean the index for pixit is 2.4 Gigabytes, or maybe Terabytes? If so... then I'd strongly suggest you split it up even more.
the DDN is only 170TB and I think the index completed and said 183TB
Now the DDN is probably a Data Domain box, right? If so, it's also deduped, so I'm not sure if duc is the right tool here.
they're all CIFS shares, i.e. SMB, and duc is version 1.4.4, installed via apt install duc on Ubuntu 22.04 server LTS
Can you give me the output of 'ls -lh /path/to/dbs' as well please?
sorry @l8gravely, I need to get back to you once it's actually completed the indexing for pixit; atm it's not, it just keeps growing in size and never stops, so I can't test out the database (as it's giving those errors). what I may do is index one folder at a time on that share and see where the problem arises
I think you mean the index for pixit is 2.4 Gigabytes, or maybe Terabytes? If so... then I'd strongly suggest you split it up even more.
it def said 2.4PB
Now the DDN is probably a Data Domain box, right? If so, it's also deduped, so I'm not sure if duc is the right tool here.
what would be the correct tool please?
Can you give me the output of 'ls -lh /path/to/dbs' as well please?
I'll index the other good ones and let you know, but as said the pixit one just doesn't complete the index, so I can't show you
"robina80" == robina80 @.***> writes:
sorry @l8gravely, I need to get back to you once it's actually completed the indexing for pixit; atm it's not, it just keeps growing in size and never stops, so I can't test out the database (as it's giving those errors). what I may do is index one folder at a time on that share and see where the problem arises
John> I think you mean the index for pixit is 2.4 Gigabytes, or maybe Terabytes? If so... then I'd strongly suggest you split it up even more.
it def said 2.4PB
Ouch! That's big. How many files? You can get an idea with:
df -i /volumes/pixit
and see what it says, but since you don't give us any real details of your setup, it's hard to say.
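If it helps to script the same check, inode usage can be pulled out directly; a minimal sketch, where the helper name and path are illustrative, and `df -i` only approximates the file count for the whole filesystem:

```shell
#!/bin/sh
# Rough file count for a mount from inode usage: instant, unlike
# walking the tree. Point it at your own mount point.
inodes_used() {
    # Second line of 'df -i' output, third column is IUsed
    df -i "$1" | awk 'NR==2 {print $3}'
}
# e.g. inodes_used /volumes/pixit
```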
John> Now the DDN is probably a Data Domain box, right? If so, it's also deduped, so I'm not sure if duc is the right tool here.
what would be the correct tool please?
Not sure honestly, because the Data Domain does compression and de-duplication. I'd login to the management page of the device and see what it says.
Duc will be at least a decent indication of where your largest chunks of data are stored.
Can you give me the output of 'ls -lh /path/to/dbs' as well please?
I'll index the other good ones and let you know, but as said the pixit one just doesn't complete the index, so I can't show you
Whenever you get a chance.
What type of system are you using to index? And how fast is the network connection to your backing storage?
I strongly suspect that you might be better off splitting /volumes/pixit/ down another level, and running multiple 'duc index ...' in parallel, assuming that your storage can handle the load, and that your indexing system can also handle the load.
Since duc (like most filesystem-crawling tools) is single-threaded, and really limited by how fast your backing filesystem is and how well it handles lots and lots of stat() calls, it's hard to speed things up unless you manually split things up.
With a 2+ Petabyte filesystem, I think you're going to have to do some splitting by hand.
John
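The manual split John describes could be sketched like this, assuming one DB per top-level subdirectory; the helper name, paths, and `xargs` fan-out are illustrative, not from the thread:

```shell
#!/bin/sh
# Emit one 'duc index' command per top-level subdirectory, one DB each.
# The output can be fed to 'xargs -P' to run several indexers in parallel,
# assuming the storage and indexing host can handle the load.
emit_index_jobs() {
    top=$1
    db_dir=$2
    for d in "$top"/*/; do
        name=$(basename "$d")
        printf 'duc index -d %s/%s.db -p %s\n' "$db_dir" "$name" "$d"
    done
}
# e.g. emit_index_jobs /volumes/pixit /pie/dbs | xargs -P 4 -I CMD sh -c CMD
```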
ok this is interesting, I made another folder and mounted both mount points under it, like so:
/volumes/SAN/DDN /volumes/SAN/hitachi
when I indexed both there were no errors, and it completed:
root@duc01:~# nice -20 ionice -c 2 -n 0 duc index -d /pie/.duc.db -p /volumes/SAN/
[--#-----] Indexed 560.8Tb in 5.7M files and 393.1K directories
root@duc01:~#
and I could open the web gui and see the pie chart, so no problem
and when I did one folder from pixit, i.e.
/volumes/pixit/robs_test
root@duc01:~# nice -20 ionice -c 2 -n 0 duc index -d /graph/.duc.db -p /volumes/pixit/robs_test/
[#-------] Indexed 1.2Gb in 46 files and 27 directories
root@duc01:~#
I then changed the cgi to point to graph instead of pie and I could see the pie chart on the web
I will do more digging and I think I will get to the bottom of this
thanks @l8gravely
"robina80" == robina80 @.***> writes:
ok this is interesting, I made another folder and mounted both mount points under it, like so: /volumes/SAN/DDN /volumes/SAN/hitachi
when I indexed both there were no errors, and it completed:
root@duc01:~# nice -20 ionice -c 2 -n 0 duc index -d /pie/.duc.db -p /volumes/SAN/
[--#-----] Indexed 560.8Tb in 5.7M files and 393.1K directories
root@duc01:~#
So that's a lot of large files spread across a bunch of directories. How large did the index get?
Also, I would suggest you use a separate index for each directory tree you index. This makes it simpler to update down the line, since you only need to index a sub-tree to a new DB name, then move that into the default name once it completes successfully.
So I would do:
duc index -d /pie/dbs/tmp-DDN.db -p /volumes/SAN/DDN &
duc index -d /pie/dbs/tmp-hitachi -p /volumes/SAN/hitachi &
Then once a job finishes and it has a good DB, just move it into /pie/dbs/DDN.db and present the info using the index.cgi script that comes with the distribution.
and I could open the web gui and see the pie chart, so no problem
Great!
and when I did one folder from pixit, i.e.
/volumes/pixit/robs_test
root@duc01:~# nice -20 ionice -c 2 -n 0 duc index -d /graph/.duc.db -p /volumes/pixit/robs_test/
[#-------] Indexed 1.2Gb in 46 files and 27 directories
root@duc01:~#
I then changed the cgi to point to graph instead of pie and I could see the pie chart on the web
Great!
I will do more digging and I think I will get to the bottom of this
I think you're almost there.
thanks @l8gravely, it looks like I'm getting there
the trick is, as you said, to index all the pixit sub dirs (the biggest NAS of the 3) separately, like so, and it works, as I can see all the pixit sub dirs on the cgi web:
root@duc01:~# nice -20 ionice -c 2 -n 0 duc index -d /graph/.duc.db -p /volumes/pixit/_dc/
Error statting : No such file or directory
Error statting : No such file or directory
Error statting : No such file or directory
Error statting : No such file or directory
Error statting : No such file or directory
Error statting : No such file or directory
[--#-----] Indexed 579.3Tb in 5.7M files and 280.2K directories
root@duc01:~# nice -20 ionice -c 2 -n 0 duc index -d /graph/.duc.db -p /volumes/pixit/_source/
[------#-] Indexed 122.8Tb in 4.1M files and 33.1K directories
root@duc01:~# nice -20 ionice -c 2 -n 0 duc index -d /graph/.duc.db -p /volumes/pixit/_audio_services/
[-#------] Indexed 25.7Tb in 246.6K files and 8.4K directories
root@duc01:~# nice -20 ionice -c 2 -n 0 duc index -d /graph/.duc.db -p /volumes/pixit/robs_test/
[#-------] Indexed 1.2Gb in 46 files and 27 directories
root@duc01:~#
and when I look at the cgi web interface I see them:
Path                            Size    Files    Directories  Date        Time
/volumes/pixit/_dc              579.3T  5704945  280231       2023-01-27  17:20:35
/volumes/pixit/_source          122.8T  4130084  33148        2023-01-28  11:26:06
/volumes/pixit/_audio_services  25.7T   246649   8443         2023-01-28  15:04:40
/volumes/pixit/robs_test        1.2G    46       27           2023-01-28  20:20:57
and the index sizes for all of them are:
root@duc01:~# du -sh /graph/.duc.db
79M  /graph/.duc.db
root@duc01:~# du -sh /pie/.duc.db
61M  /pie/.duc.db
root@duc01:~#
bear in mind the graph index is going to get bigger, as I have a lot more sub dirs to do
"robina80" == robina80 @.***> writes:
thanks @l8gravely, it looks like I'm getting there; the trick is, as you said, to index all the pixit sub dirs (the biggest NAS of the 3) separately, like so, and it works, as I can see all the pixit sub dirs on the cgi web
root@duc01:~# nice -20 ionice -c 2 -n 0 duc index -d /graph/.duc.db -p /volumes/pixit/_dc/
Error statting : No such file or directory
Error statting : No such file or directory
Error statting : No such file or directory
Error statting : No such file or directory
Error statting : No such file or directory
Error statting : No such file or directory
Did you edit this message? Because duc will show the name of the file/directory giving you problems. I'd look at them in detail because this is an error message from an lstat() call. So if it's a broken sym-link it's not really the end of the world... but is something you want to look into.
One thing I do on Netapp NFS filesystems is the '-e .snapshot' to exclude the snapshots from the indexing.
[--#-----] Indexed 579.3Tb in 5.7M files and 280.2K directories
root@duc01:~# nice -20 ionice -c 2 -n 0 duc index -d /graph/.duc.db -p /volumes/pixit/_source/
[------#-] Indexed 122.8Tb in 4.1M files and 33.1K directories
root@duc01:~# nice -20 ionice -c 2 -n 0 duc index -d /graph/.duc.db -p /volumes/pixit/_audio_services/
[-#------] Indexed 25.7Tb in 246.6K files and 8.4K directories
root@duc01:~# nice -20 ionice -c 2 -n 0 duc index -d /graph/.duc.db -p /volumes/pixit/robs_test/
[#-------] Indexed 1.2Gb in 46 files and 27 directories
root@duc01:~#
Are you really putting them all into the same DB?
and when i look at the cgi web interface i see them
Path                            Size    Files    Directories  Date        Time
/volumes/pixit/_dc              579.3T  5704945  280231       2023-01-27  17:20:35
/volumes/pixit/_source          122.8T  4130084  33148        2023-01-28  11:26:06
/volumes/pixit/_audio_services  25.7T   246649   8443         2023-01-28  15:04:40
/volumes/pixit/robs_test        1.2G    46       27           2023-01-28  20:20:57
and the index sizes for all of them are:
root@duc01:~# du -sh /graph/.duc.db
79M  /graph/.duc.db
root@duc01:~# du -sh /pie/.duc.db
61M  /pie/.duc.db
root@duc01:~#
bear in mind the graph index is going to get bigger as i have alot more sub dirs to do
Did you edit this message? Because duc will show the name of the file/directory giving you problems. I'd look at them in detail because this is an error message from an lstat() call. So if it's a broken sym-link it's not really the end of the world... but is something you want to look into.
yes, sorry; as I work in media we work on pre-release material that's not even out yet, so that's why, sorry
One thing I do on Netapp NFS filesystems is the '-e .snapshot' to exclude the snapshots from the indexing.
that's interesting; I'm going to add this to my script, I'll show you once done
find /volumes/pixit/ -mindepth 1 -maxdepth 1 -type d | grep -v 'pxl-pfs01|.snapshots' > /scripts/pixit_dirs.txt
basically it will put all the dir names in the file, and then it will run the indexes on each line of the file, excluding the names in grep
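One caveat worth flagging here: plain grep uses basic regexps, where '|' is a literal character, so grep -v 'pxl-pfs01|.snapshots' won't actually exclude either name on its own. Extended regexps (grep -Ev) give the intended alternation, and the leading dot should be escaped. A sketch of the listing step (the helper name is illustrative):

```shell
#!/bin/sh
# List top-level subdirectories of a tree, excluding the named entries.
# -E enables alternation with '|'; '\.' matches a literal dot.
list_dirs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d | grep -Ev 'pxl-pfs01|\.snapshots'
}
# e.g. list_dirs /volumes/pixit > /scripts/pixit_dirs.txt
```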
Are you really putting them all into the same DB?
yes, is this a bad idea then?
"robina80" == robina80 @.***> writes:
John> Did you edit this message? Because duc will show the name of the file/directory giving you problems. I'd look at them in detail because this is an error message from an lstat() call. So if it's a broken sym-link it's not really the end of the world... but is something you want to look into.
yes, sorry; as I work in media we work on pre-release material that's not even out yet, so that's why, sorry
No problem, I can understand not wanting to share customer information unless absolutely necessary.
John> One thing I do on Netapp NFS filesystems is the '-e .snapshot' to exclude the snapshots from the indexing.
that's interesting; I'm going to add this to my script, I'll show you once done
Are you using Netapps for your backing storage? The '-e
find /volumes/pixit/ -mindepth 1 -maxdepth 1 -type d | grep -v 'pxl-pfs01|.snapshots' > /scripts/ pixit_dirs.txt
basically it will put all the dir names in the file, and then it will run the indexes on each line of the file, excluding the names in grep
John> Are you really putting them all into the same DB?
yes, is this a bad idea then?
I would recommend that you put each of the above directories into its own DB, and also create a temporary DB for each index run. Then when it's done (you check the exit status for '0' to mean it's ok) you can move the temporary DB into its final name, so the web page will find it.
You do this because while you're indexing, the data returned by the web page isn't going to be correct, and will be changing all the time. So you just leave an old copy of the DB in place until the latest scan is done.
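A minimal sketch of that check-the-status-then-move step; the helper name and paths are illustrative, and the indexer command is passed in as arguments:

```shell
#!/bin/sh
# Build into a temporary DB, then promote it with a rename only if the
# indexer exits 0, so the web page always sees a complete DB.
index_and_promote() {
    tmp=$1; final=$2; shift 2
    if "$@"; then
        mv "$tmp" "$final"
    else
        rm -f "$tmp"      # failed run: discard, keep the old DB in place
    fi
}
# e.g. index_and_promote /pie/dbs/tmp-DDN.db /pie/dbs/DDN.db \
#          duc index -d /pie/dbs/tmp-DDN.db -p /volumes/SAN/DDN
```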
I tended to do my scans once a week across the couple of hundred TBs of data I had, across multiple different Netapps and different sites. I have 10TB volumes with 30 million files, so scanning takes a long time.
See my other earlier emails for examples.
John
Are you using Netapps for your backing storage? The '-e
yes, on one storage server; I believe I have used the "-e" option before, but tbh I find "find" with the "grep -v" option better
root@duc01:~# cat /scripts/pixit.sh
find /volumes/pixit/ -mindepth 1 -maxdepth 1 -type d | grep -v 'pxl-pfs01|.snapshots' > /scripts/pixit_dirs.txt
mkdir /pie
while IFS= read -r line; do
    echo "$line"
    nice -20 ionice -c 2 -n 0 duc index -d /pie/.pixit.db -p /"$line"/
done < /scripts/pixit_dirs.txt
I've run this once and now I'm re-running it, and I can see it's working, as it's re-indexed "gfx", as you can see from the timestamp in the pic below
https://i.postimg.cc/PJ0BWjZM/duc.png
I could create new indexes for all sub dirs, but wouldn't I need cgi scripts for each sub dir, or can I make one duc.cgi and put all my indexes in there, like so:
cat /var/www/html/cgi-enabled/pixit.cgi
duc cgi -d /pie/.gfx.db --gradient --list --tooltip
duc cgi -d /pie/.dc.db --gradient --list --tooltip
duc cgi -d /pie/.robs_test.db --gradient --list --tooltip
...
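As far as I can tell, 'duc cgi' serves a single database per CGI script, so stacking several 'duc cgi' lines in one file won't work as hoped; each DB wants its own small wrapper (or the index.cgi script from the distribution for the multi-DB listing). A sketch that generates the wrappers, where the helper name and paths are illustrative:

```shell
#!/bin/sh
# Write one executable CGI wrapper per duc database.
make_cgi() {
    db=$1; out=$2
    printf '#!/bin/sh\nexec duc cgi -d %s --gradient --list --tooltip\n' "$db" > "$out"
    chmod +x "$out"
}
# e.g. make_cgi /pie/.gfx.db /var/www/html/cgi-enabled/gfx.cgi
```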
Looking good to me, and man you have a ton of data. Nice mix of large datasets in terms of size, but also smaller ones with lots of files. Hopefully 'duc' helps you target things better moving forward.
Cheers, John
thanks @l8gravely, you have been a great help in this matter; thank you so much for your time and input