robinhood-suite / robinhood4

This repository holds the source code for robinhood version 4, a suite of tools to store and query any filesystem's metadata.

Admin Guide for Robinhood V4 #3

Open: bom-bahadur opened this issue 3 weeks ago

bom-bahadur commented 3 weeks ago

Hello there,

It would be great to have an Admin Guide, like the one we had for V3:

https://github.com/cea-hpc/robinhood/wiki/robinhood_v3_admin_doc#user-content-Database_setup_and_tuning

Many thanks, Bom Singiali

valeriyoann commented 3 weeks ago

Hello,

We will do that soon.

For now, you can use MongoDB to store data. To install it on RHEL 8/9, follow these instructions: https://www.mongodb.com/docs/manual/tutorial/install-mongodb-on-red-hat/. Once it is installed, start the database with systemctl start mongod.
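A minimal sketch of that step, assuming the mongod systemd unit installed by the MongoDB RPMs, mongosh on the PATH, and the default localhost:27017 listener. wait_for_mongod is a hypothetical helper (not part of RobinHood or MongoDB) that retries a ping until the server answers, which can be handy before starting a long rbh-sync from a script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of RobinHood or MongoDB): retry a ping
# against the local mongod until it answers or the attempts run out.
# Assumes mongosh is on the PATH and mongod listens on localhost:27017.
wait_for_mongod() {
    local tries=${1:-30}   # number of attempts, one second apart
    local i
    for ((i = 0; i < tries; i++)); do
        # mongosh exits non-zero when it cannot connect.
        if mongosh --quiet --eval 'db.runCommand({ ping: 1 }).ok' >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
    done
    echo "mongod did not answer after ${tries} attempts" >&2
    return 1
}

# Usage (as root):
#   systemctl enable --now mongod    # start now and at every boot
#   wait_for_mongod 30 && echo "MongoDB is ready"
```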

Then RobinHood V4 will create the database automatically, for instance when you run rbh-sync: rbh-sync rbh:posix:/tmp rbh:mongo:test_db

This creates both the database and its collection, named entries. After rbh-sync has finished, run mongosh to inspect the database, and then:

use test_db
db.entries.find()

to show all entries in the database.

Alternatively, after the sync, run rbh-find rbh:mongo:test_db to list all entries in the database through rbh-find.
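The same checks can also be run non-interactively, for example from a script or a cron job. This is a sketch assuming mongod listens on the default localhost:27017; count_entries and show_entries are hypothetical helper names, while countDocuments(), find(), limit() and printjson() are standard mongosh calls.

```shell
#!/usr/bin/env bash
# Hypothetical helpers to inspect a RobinHood V4 database without an
# interactive mongosh session. Assumes mongod on localhost:27017 and the
# "entries" collection created by rbh-sync.

# Print the number of entries stored in database $1.
count_entries() {
    local db=$1
    mongosh "mongodb://localhost:27017/${db}" --quiet \
        --eval 'db.entries.countDocuments({})'
}

# Print the first $2 entries (default 5) of database $1 as JSON.
show_entries() {
    local db=$1 limit=${2:-5}
    mongosh "mongodb://localhost:27017/${db}" --quiet \
        --eval "db.entries.find().limit(${limit}).forEach(e => printjson(e))"
}

# Usage:
#   count_entries test_db
#   show_entries test_db 3
#   rbh-find rbh:mongo:test_db | head    # the same data, through RobinHood
```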

Kind regards, Yoann Valeri

bom-bahadur commented 3 weeks ago

Hello Yoann,

Thanks for your quick response.

We will set up MongoDB to store data.

We have Rocky Linux 9.4 with the Lustre client (lctl 2.14.0_ddn154), and the Lustre file system is mounted at /ictstr01.

Is the syntax below correct?

rbh:lustre:/ictstr01 rbh:mongo:/robinhood/test_db

(/robinhood is a local mount backed by SSDs.)

Many thanks, Bom Singiali

valeriyoann commented 3 weeks ago

The Lustre part of the syntax is correct.

For the Mongo part, however, you only need to specify the name of the database. Where that database is stored is decided by MongoDB itself, through the storage.dbPath setting in the /etc/mongod.conf file. By default, databases and collections are stored in /var/lib/mongo, so you can simply change that path to /robinhood/test_db.
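A sketch of that change, assuming the stock RHEL /etc/mongod.conf layout (an indented dbPath line under storage:) and the mongod service user created by the MongoDB RPMs; set_mongo_dbpath is a hypothetical helper. mongod should be stopped while the path changes, the new directory must be owned by the service user, and data already in /var/lib/mongo is not moved. On SELinux-enforcing hosts the new directory may also need a matching SELinux context, as described in the MongoDB installation guide linked above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the storage.dbPath line of a mongod config
# file in place, keeping a .bak copy. Assumes the stock layout, i.e. an
# indented "dbPath: /var/lib/mongo" line under "storage:".
set_mongo_dbpath() {
    local conf=$1 new_path=$2
    cp -- "$conf" "${conf}.bak" || return 1
    # Keep the original indentation and key, replace only the value.
    sed -i -E "s|^([[:space:]]*dbPath:).*|\1 ${new_path}|" "$conf"
}

# Usage (as root):
#   systemctl stop mongod
#   mkdir -p /robinhood/test_db
#   chown -R mongod:mongod /robinhood/test_db
#   set_mongo_dbpath /etc/mongod.conf /robinhood/test_db
#   systemctl start mongod
```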

Once you have made this change, the correct syntax for the whole command would be: rbh-sync rbh:lustre:/ictstr01 rbh:mongo:test_db

Kind regards, Yoann Valeri

bom-bahadur commented 2 weeks ago

Thanks Yoann,

MongoDB has been set up. However, rbh-sync fails with an error:

[root@lustre-stats ~]# rbh-sync rbh:lustre:/ictstr01 rbh:mongo_db_lustre
rbh-sync: Cannot detect given backend: Invalid argument

[root@lustre-stats ~]# systemctl status mongod.service
● mongod.service - MongoDB Database Server
     Loaded: loaded (/usr/lib/systemd/system/mongod.service; enabled; preset: disabled)
     Active: active (running) since Wed 2024-11-13 15:16:03 CET; 3min 51s ago
       Docs: https://docs.mongodb.org/manual
   Main PID: 167328 (mongod)
     Memory: 215.2M
        CPU: 2.451s
     CGroup: /system.slice/mongod.service
             └─167328 /usr/bin/mongod -f /etc/mongod.conf

Nov 13 15:16:03 lustre-stats.scidom.de systemd[1]: Started MongoDB Database Server.
Nov 13 15:16:03 lustre-stats.scidom.de mongod[167328]: {"t":{"$date":"2024-11-13T14:16:03.963Z"},"s":"I",  "c":"CONTROL",  "id":7484500, "ctx":"main","msg":"Environment variable MONGODB_CONFIG_OVERRIDE_NOFORK == 1, overriding \"processManagement.fork\" >

Please advise, thanks.

Best regards, Bom Singiali

valeriyoann commented 2 weeks ago

Hello,

Taking the code snippet literally, the second URI is missing a : between mongo and db_lustre. The rbh-sync line should instead be:

rbh-sync rbh:lustre:/ictstr01 rbh:mongo:db_lustre
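When rbh-sync is launched from a wrapper script, a quick check on the URIs can catch this kind of typo before anything runs. This is a rough sketch: the shape rbh:<backend>:<fsname>[#<branch>] is inferred from the examples in this thread and is an assumption, not RobinHood's own URI parser; is_rbh_uri is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Rough, hypothetical sanity check for URIs shaped like
#   rbh:<backend>:<fsname>[#<branch>]
# The pattern is inferred from the examples in this thread; it is not
# RobinHood's own parser and may treat edge cases differently.
is_rbh_uri() {
    local re='^rbh:[A-Za-z0-9_-]+:[^:].*$'
    [[ $1 =~ $re ]]
}

# Usage:
#   src=rbh:lustre:/ictstr01 dst=rbh:mongo:db_lustre
#   if is_rbh_uri "$src" && is_rbh_uri "$dst"; then
#       rbh-sync "$src" "$dst"
#   else
#       echo "malformed URI" >&2
#   fi
```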

Kind regards, Yoann

bom-bahadur commented 1 week ago

Thanks, the previous error has been resolved.

Here is the command I ran and the terminal output:

[root@lustre-stats ~]# rbh-sync rbh:lustre:/ictstr01 rbh:mongo:mongo_db_lustre

Failed to stat '/boost_ai/users/test/bom.singiali/.testfile2.swp': No such file or directory (2)
Synchronization of '/ictstr01/boost_ai/users/test/bom.singiali/.testfile2.swp' skipped

valeriyoann commented 1 week ago

This just means that rbh-sync failed to stat that particular file. Most likely the file was removed during the scan: rbh-sync started scanning the directory containing the file and saw it there, but by the time it actually processed the file (here, with a stat), the file had been removed from the file system. The error is not critical; as rbh-sync reports, the entry was simply skipped.

You should now have all the entries it scanned in your Mongo database mongo_db_lustre (see above for how to check them), assuming rbh-sync has finished, of course.
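To review skipped entries after a long sync, it can help to keep rbh-sync's output in a log file. A minimal sketch, assuming the message format shown above ("Synchronization of '<path>' skipped") stays stable; count_skipped and list_skipped are hypothetical helpers, and all other messages are ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a log captured with, for example:
#   rbh-sync rbh:lustre:/ictstr01 rbh:mongo:mongo_db_lustre > rbh-sync.log 2>&1
# Assumes skipped entries are reported as "Synchronization of '<path>' skipped",
# the wording shown in this thread.

# Number of skipped entries in log file $1 (prints 0 when there are none).
count_skipped() {
    grep -c -e "^Synchronization of '.*' skipped\$" -- "$1"
}

# Paths of the skipped entries in log file $1, one per line.
list_skipped() {
    sed -n "s/^Synchronization of '\(.*\)' skipped\$/\1/p" "$1"
}
```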

bom-bahadur commented 1 day ago

Thanks Yoann,

The sync is ongoing.

We have ~18 PiB (14 PiB used) in Lustre.

Metadata ingestion into MongoDB is only ~10 GB per day. Is this expected? Do you suggest any optimizations to speed up this process? Thanks.

valeriyoann commented 1 day ago

How many inodes does your system have? Raw capacity isn't a useful metric for RobinHood; the most relevant one is the number of inodes.

If you want to know how many inodes RobinHood handles, check the database with the command db.entries.count() in mongosh. It shows the number of inodes in the database, so if you run it once, note the number, wait an hour and run it again, you'll know roughly how many inodes RobinHood can handle per hour.
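That count, wait, count-again measurement can be scripted as below. This is a sketch assuming mongod on localhost:27017; it uses estimatedDocumentCount(), which reads collection metadata instead of scanning the collection, so it stays cheap on a large collection but can be slightly off after an unclean shutdown. The helper names are hypothetical, and using the IUsed figure from lfs df -i as the target total is only a rough approximation.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the "count, wait an hour, count again" check.
# Assumes mongod on localhost:27017. estimatedDocumentCount() reads the
# collection metadata instead of scanning it, so it stays cheap on a large
# collection (it can be slightly off after an unclean shutdown).

# Current number of entries in database $1.
sample_count() {
    mongosh "mongodb://localhost:27017/$1" --quiet \
        --eval 'db.entries.estimatedDocumentCount()'
}

# Entries per hour, from two samples taken $3 seconds apart.
entries_per_hour() {
    local before=$1 after=$2 seconds=$3
    echo $(( (after - before) * 3600 / seconds ))
}

# Hours (rounded up) to go from $1 entries to a total of $2 at $3 entries/hour.
eta_hours() {
    local done_count=$1 total=$2 rate=$3
    if (( rate <= 0 )); then
        echo "no progress measured" >&2
        return 1
    fi
    echo $(( (total - done_count + rate - 1) / rate ))
}

# Usage:
#   db=mongo_db_lustre
#   a=$(sample_count "$db"); sleep 3600; b=$(sample_count "$db")
#   rate=$(entries_per_hour "$a" "$b" 3600)
#   lfs df -i /ictstr01        # IUsed gives a rough target total
#   eta_hours "$b" <total_inodes> "$rate"
```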

Without those, I can't really tell you if what you get is expected or not.

However, if you want to speed up the process, you can either spawn multiple processes that each handle a subdirectory of your file system, or use the MPI File Utils backend.

For the former, use the branch feature of RobinHood V4, like this: rbh-sync rbh:lustre:<main_directory>#<sub_directory> rbh:mongo:<your_db>, and then start this command in the background multiple times, changing the sub_directory each time.
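A sketch of that fan-out, assuming rbh-sync is on the PATH and that each argument is a valid <sub_directory> for the branch syntax (whether it needs a leading / is not stated in this thread, so check the rbh-sync documentation). sync_branches is a hypothetical helper that writes one log per branch and waits for all of them; DRY_RUN=1 only prints the commands. Entries sitting directly in the top-level directory are presumably not covered by any branch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one rbh-sync per branch in the background,
# using the rbh:lustre:<main_directory>#<sub_directory> syntax, and wait
# for all of them. Each branch logs to its own file in the current
# directory. With DRY_RUN=1 the commands are only printed.
sync_branches() {
    local root=$1 db=$2
    shift 2
    local sub pid rc=0
    local -a pids=()
    for sub in "$@"; do
        local -a cmd=(rbh-sync "rbh:lustre:${root}#${sub}" "rbh:mongo:${db}")
        if [[ ${DRY_RUN:-0} == 1 ]]; then
            echo "${cmd[*]}"
            continue
        fi
        # "/" in a branch name becomes "_" in the log file name.
        "${cmd[@]}" > "rbh-sync.${sub//\//_}.log" 2>&1 &
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        wait "$pid" || rc=1   # remember if any branch failed
    done
    return "$rc"
}
```

For example, DRY_RUN=1 sync_branches /ictstr01 mongo_db_lustre boost_ai prints the command for the boost_ai branch; drop DRY_RUN to actually run it. How many branches to run at once is a trade-off between the node's cores and the write load MongoDB can absorb.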

For the latter, use the Lustre MPI backend of RBH V4: rbh-sync rbh:lustre-mpi:<directory> rbh:mongo:<your_db>. This command must be launched through mpirun; for that, please refer to the MPI File Utils documentation (https://mpifileutils.readthedocs.io/en/v0.11.1/index.html).
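A sketch of such a launch with Open MPI's mpirun, assuming rbh-sync was built with the lustre-mpi backend, is installed at the same path on every node listed in the hostfile, and that every node mounts the file system. The thread does not say how ranks on other nodes reach MongoDB, so make sure the database is reachable from them. mpi_sync is a hypothetical helper, and other MPI implementations may spell the hostfile option differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the lustre-mpi backend under Open MPI's mpirun.
# Assumes rbh-sync (built with MPI File Utils support) is installed at the
# same path on every host of the hostfile and that each host mounts the
# file system. With DRY_RUN=1 the command is only printed.
mpi_sync() {
    local np=$1 hostfile=$2 dir=$3 db=$4
    local -a cmd=(mpirun -np "$np" --hostfile "$hostfile"
                  rbh-sync "rbh:lustre-mpi:${dir}" "rbh:mongo:${db}")
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        echo "${cmd[*]}"
        return 0
    fi
    "${cmd[@]}"
}

# Usage:
#   DRY_RUN=1 mpi_sync 16 hosts.txt /ictstr01 mongo_db_lustre
#   mpi_sync 16 hosts.txt /ictstr01 mongo_db_lustre
```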

The first solution is a single-node, multi-process improvement, while the second is a multi-node, multi-process improvement that requires additional infrastructure to work well.

Currently, those are the only two things I can suggest. Improving the speed of the tools and the database is next on our to-do list, once rbh-report is done.

Tell me if you need additional help :)