waltligon / orangefs

Official repository for PVFS/OrangeFS

default server configuration failed #84

Closed laiki closed 3 years ago

laiki commented 3 years ago

Hi,

I am trying to use OrangeFS. I downloaded the code, built and installed it, and ran the configuration generator without changing any option. When I try to initialize the folder structure, I get a configuration error related to the Aliases :(

<Aliases>
    Alias localhost tcp://localhost:3334
</Aliases>

[S 03/08/2021 20:12:10] PVFS2 Server on node hostname version 2.9.7-orangefs-REV-65ab0d2 starting...
[E 20:12:10.360120] Configuration file error. No host ID specified for alias hostname.
[E 20:12:10.360191] Error: Please check your config files.
[E 20:12:10.360214] Error: Server aborting.
root@vmd62521:~#

The strace log is:

openat(AT_FDCWD, "/opt/orangefs/etc/orangefs-server.conf", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=1124, ...}) = 0
read(3, "<Defaults>\n\tUnexpectedRequests 5"..., 4096) = 1124
read(3, "", 4096) = 0
close(3) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2326, ...}) = 0
write(2, "[E 20:15:11.069403] Configuratio"..., 87[E 20:15:11.069403] Configuration file error. No host ID specified for alias hostname.
) = 87
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2326, ...}) = 0
write(2, "[E 20:15:11.069877] Error: Pleas"..., 59[E 20:15:11.069877] Error: Please check your config files.
) = 59
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2326, ...}) = 0
write(2, "[E 20:15:11.070242] Error: Serve"..., 44[E 20:15:11.070242] Error: Server aborting.
) = 44
exit_group(1) = ?
+++ exited with 1 +++

I also tried different alias settings, with the hostname only and with hostname and domain, but I always get the same error. The version I downloaded is http://download.orangefs.org/current/source/orangefs-2.9.8.tar.gz. The system I'm running on is Ubuntu 20.04.

As I haven't found any user group or other kind of free support, I hope you will be able to give me a hint here.

Kind regards,
Wasili

dreynol commented 3 years ago

Hi Wasili,

From the error message, it looks like the hostname of the machine you are working on is actually just "hostname." Is that correct?

If that is the case, OrangeFS is looking for a "hostname" alias in the config file, but if you chose all the default values then the only alias is "localhost." There are two ways you can fix this:

1) When initializing the storage directories and starting the pvfs2-server, you can use the -a option to specify which alias to use. In this case, you would want to tell it to use "localhost" instead of the system's hostname, e.g.

# /opt/orangefs/sbin/pvfs2-server -a localhost /opt/orangefs/etc/orangefs-server.conf -f

OR

2) Regenerate the config file but when you get to the "Enter hostnames" step, enter "hostname" (or whatever the hostname command returns) instead of choosing the default value of "localhost." Then you should be able to initialize the storage directories and start the pvfs2-server without needing to use the -a option. (NOTE: You can also edit the config file manually without needing to regenerate it as long as you make sure you change every instance of "localhost" to the system hostname.)
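
Before picking one of the two options, it can help to check whether the config file defines an alias matching the system hostname. The following is a minimal sketch and not part of OrangeFS: `has_alias` is a hypothetical helper name, and it assumes alias lines have the form `Alias <name> <address>`, as in the config excerpt quoted earlier in this thread.

```shell
#!/bin/sh
# has_alias CONF NAME
# Hypothetical helper (not an OrangeFS tool): succeeds if CONF contains
# a line whose first field is "Alias" and whose second field is NAME.
has_alias() {
    awk -v name="$2" '
        $1 == "Alias" && $2 == name { found = 1 }
        END { exit !found }
    ' "$1"
}
```

For example, if `has_alias /opt/orangefs/etc/orangefs-server.conf "$(hostname)"` returns non-zero, the server will probably need either `-a <alias>` or a regenerated config.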

Hope this helps!

David

laiki commented 3 years ago

Hello David,

Thanks for your reply. Here is some additional output from my attempts to run the server:

(base) root@NVidia-power:/opt/orangefs# cat etc/orangefs.conf | grep -i alias
Alias localhost tcp://localhost:3334
(base) root@NVidia-power:/opt/orangefs# hostname
NVidia-power
(base) root@NVidia-power:/opt/orangefs# ./sbin/pvfs2-server -f etc/orangefs.conf
[S 03/09/2021 21:16:45] PVFS2 Server on node NVidia-power version 2.9.7-orangefs-REV-65ab0d2 starting...
[E 21:16:45.100407] Configuration file error. No host ID specified for alias NVidia-power.
[E 21:16:45.100423] Error: Please check your config files.
[E 21:16:45.100430] Error: Server aborting.
(base) root@NVidia-power:/opt/orangefs# ./sbin/pvfs2-server etc/orangefs.conf
[S 03/09/2021 21:17:03] PVFS2 Server on node NVidia-power version 2.9.7-orangefs-REV-65ab0d2 starting...
[E 21:17:03.025971] Configuration file error. No host ID specified for alias NVidia-power.
[E 21:17:03.025985] Error: Please check your config files.
[E 21:17:03.025991] Error: Server aborting.
(base) root@NVidia-power:/opt/orangefs# ./sbin/pvfs2-server -a NVidia-power -f etc/orangefs.conf
[S 03/09/2021 21:17:33] PVFS2 Server on node NVidia-power version 2.9.7-orangefs-REV-65ab0d2 starting...
[E 21:17:33.781031] Configuration file error. No host ID specified for alias NVidia-power.
[E 21:17:33.781047] Error: Please check your config files.
[E 21:17:33.781056] Error: Server aborting.
(base) root@NVidia-power:/opt/orangefs#

BR Wasili

As you can see, the configuration is the default one, and every attempt to get the server running fails.

laiki commented 3 years ago

BTW is there any user channel where such topics can be discussed?

dreynol commented 3 years ago

Hi Wasili,

When using the -a option, you need to pass the alias that is defined in the OrangeFS config file, not the system's hostname. So in your case, to initialize the storage you need to change your command line to the following:

# ./sbin/pvfs2-server -a localhost -f etc/orangefs.conf

And to start the server:

# ./sbin/pvfs2-server -a localhost etc/orangefs.conf

And yes, there is an email list you can use for future questions like this: users@lists.orangefs.org

Don't hesitate to reach out if you need more assistance!

David

laiki commented 3 years ago

Hi Dave,

I changed the system and started from scratch. Now I get an error opening the DB.

root@vmd62521:/opt/orangefs# ./sbin/pvfs2-server -a localhost -f etc/orangefs.conf
[S 03/14/2021 21:31:25] PVFS2 Server on node localhost version 2.9.7-orangefs-REV-65ab0d2 starting...
WARNING WARNING WARNING The MetadataStorageSpace path /opt/orangefs/storage/meta appears to be on the root device. It is recommended that the meta data be stored on a dedicated partition. *If you have a dedicated partition setup, please be sure it is mounted.

WARNING WARNING WARNING The DataStorageSpace path /opt/orangefs/storage/data appears to be on the root device. It is recommended that the data be stored on a dedicated partition. *If you have a dedicated partition setup, please be sure it is mounted.

[E 03/14/2021 21:31:25] TROVE:DBPF:Berkeley DB //opt/orangefs/storage/meta/storage_attributes.db failed to open[E 03/14/2021 21:31:25] Failure opening attribute database
[E 03/14/2021 21:31:25] TROVE:DBPF:Berkeley DB //opt/orangefs/storage/meta/storage_attributes.db failed to open[E 03/14/2021 21:31:25] error: storage create failed; aborting!
root@vmd62521:/opt/orangefs#

Might I need to install some Berkeley DB related packages?

BR Wasili

dreynol commented 3 years ago

Could you send the command line you used to configure OrangeFS? You can easily see this by changing to the top-level orangefs directory (wherever you downloaded the source code) and running the following command:

$ head config.log

Among other things, this will print the invocation command line used, which should look something like this:

$ ./configure <option1> <option2> <etc>

Can you paste that command here?
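
If the header of config.log is long, the invocation line can also be pulled out on its own. This is a small sketch under the assumption that config.log follows the usual Autoconf layout, where the invocation is recorded on a line starting with two spaces and `$ `; `configure_invocation` is a hypothetical helper name, not an OrangeFS or Autoconf tool.

```shell
#!/bin/sh
# configure_invocation [FILE]
# Hypothetical helper: prints the first line of FILE (default: config.log)
# that starts with "  $ ", with that prefix stripped. Autoconf normally
# records the configure command line in this form near the top of the file.
configure_invocation() {
    sed -n 's/^  \$ //p' "${1:-config.log}" | head -n 1
}
```

Run from the top-level source directory, `configure_invocation` with no argument reads ./config.log.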

Thanks, David

laiki commented 3 years ago

Hello David,

I did not use any option besides --prefix

  $ ./configure --prefix=/opt/orangefs

## --------- ##
## Platform. ##
## --------- ##

hostname = vmd62521.contaboserver.net
uname -m = x86_64
uname -r = 5.4.0-65-generic
uname -s = Linux
uname -v = #73-Ubuntu SMP Mon Jan 18 17:25:17 UTC 2021

config.log

I dumped the strace output to the attached file orangsfs.strace.txt. It shows that the file //opt/orangefs/storage/meta/storage_attributes.db cannot be opened. This file does not exist:

root@vmd62521:~/orangefs-v.2.9.8# ll //opt/orangefs/storage/meta/storage_attributes.db
total 20
drwx--x--x 2 root root      4096 Mar  7 13:48 ./
drwxr-xr-x 5 root root      4096 Mar  7 13:48 ../
-rw------- 1 root root 536870912 Mar  7 13:48 data.mdb
-rw------- 1 root root      8192 Mar  7 13:48 lock.mdb
root@vmd62521:~/orangefs-v.2.9.8# 

BR Wasili

dreynol commented 3 years ago

Thanks for the information, Wasili.

If no other options are specified, OrangeFS tries to use Berkeley DB by default. So yes, in that case you would need to have Berkeley DB installed. However, we HIGHLY recommend using LMDB instead. To tell OrangeFS to use LMDB as the backend database, you would use the following options:

$ ./configure --prefix=/opt/orangefs --with-db-backend=lmdb

laiki commented 3 years ago

ok will try, thanks

laiki commented 3 years ago

Well, now I'm getting linker errors, even though I installed the liblmdb-dev package (running on Ubuntu 20.04):

root@vmd62521:~/orangefs-v.2.9.8# make install
  LD            src/server/pvfs2-server
/usr/bin/ld: lib/libpvfs2-server.a(dbpf-db-bdb-server.o): in function `dbpf_db_open':
/root/orangefs-v.2.9.8/src/io/trove/trove-dbpf/dbpf-db-bdb.c:120: undefined reference to `db_create'
collect2: error: ld returned 1 exit status
make: *** [Makefile:929: src/server/pvfs2-server] Error 1
root@vmd62521:~/orangefs-v.2.9.8# apt list --installed | grep lmdb

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

liblmdb-dev/focal,now 0.9.24-1 amd64 [installed]
liblmdb0/focal,now 0.9.24-1 amd64 [installed]
lmdb-doc/focal,focal,now 0.9.24-1 all [installed,automatic]
root@vmd62521:~/orangefs-v.2.9.8#

dreynol commented 3 years ago

It looks like it's still trying to use Berkeley DB. Just to be sure let's try to start fresh. Try these steps:

# make distclean
# ./prepare
# ./configure --prefix=/opt/orangefs --with-db-backend=lmdb
# make
# make install

laiki commented 3 years ago

Hello David,

Sorry for the late response. I cleaned and rebuilt OrangeFS with LMDB enabled, then generated the server configuration using only defaults. I started the server and it failed :( An strace showed that there were problems accessing the metadata:

stat("//opt/orangefs/storage/meta/storage_attributes.db", {st_mode=S_IFDIR|0711, st_size=4096, ...}) = 0
openat(AT_FDCWD, "//opt/orangefs/storage/meta/storage_attributes.db", O_RDWR) = -1 EISDIR (Is a directory)
[E 03/23/2021 16:50:58] TROVE:DBPF:Berkeley DB //opt/orangefs/storage/meta/storage_attributes.db failed to open[E 03/23/2021 16:50:58] Failure opening attribute database
[E 03/23/2021 16:50:58] TROVE:DBPF:Berkeley DB //opt/orangefs/storage/meta/storage_attributes.db failed to open[E 03/23/2021 16:50:58] error: storage create failed; aborting!
root@vmd62521:/opt/orangefs# ll storage/meta/
total 20
drwxr-xr-x 5 root root 4096 Mar  7 13:48 ./
drwxr-xr-x 4 root root 4096 Mar  7 13:48 ../
drwxr-xr-x 5 root root 4096 Mar  7 13:48 72dd4177/
drwx--x--x 2 root root 4096 Mar  7 13:48 collections.db/
drwx--x--x 2 root root 4096 Mar  7 13:48 storage_attributes.db/
root@vmd62521:/opt/orangefs# ll storage/meta/storage_attributes.db/
total 20
drwx--x--x 2 root root      4096 Mar  7 13:48 ./
drwxr-xr-x 5 root root      4096 Mar  7 13:48 ../
-rw------- 1 root root 536870912 Mar  7 13:48 data.mdb
-rw------- 1 root root      8192 Mar  7 13:48 lock.mdb
root@vmd62521:/opt/orangefs# 

As you can see above, storage/meta/storage_attributes.db is not a file but a directory. The strace shows that the server is trying to open this path as if it were a file.

I moved the folder /opt/orangefs out of the way and ran make install again. Strangely, it compiled everything again as if I had not built it before. Maybe I had made a typo without noticing. The fresh directory structure did not contain a subfolder called 'storage'. I generated the server configuration using only defaults and let the server create the directory structure, which again created a folder called storage_attributes.db:

root@vmd62521:/opt/orangefs# ./sbin/pvfs2-server -a localhost -f etc/orangefs-server.conf 
[S 03/23/2021 17:01:38] PVFS2 Server on node localhost version 2.9.7-orangefs-REV-65ab0d2 starting...
[D 03/23/2021 17:01:39] PVFS2 Server: storage space created.
[D 03/23/2021 17:01:39] Exiting.
root@vmd62521:/opt/orangefs# ls
bin  etc  include  lib  sbin  share  storage
root@vmd62521:/opt/orangefs# ls storage/
data  meta
root@vmd62521:/opt/orangefs# ls storage/meta/
3bfc8d37  collections.db  storage_attributes.db
root@vmd62521:/opt/orangefs# ls storage/meta/storage_attributes.db/
data.mdb  lock.mdb
root@vmd62521:/opt/orangefs# 

But this time, starting the server did not report an error !?! :) So I killed it and restarted it under strace to see what happens, and it shows a different execution path than before:

stat("//opt/orangefs/storage/meta/storage_attributes.db", {st_mode=S_IFDIR|0711, st_size=4096, ...}) = 0
getpid()                                = 466721
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbb53ddd000
mmap(NULL, 2101248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbb53bdc000
openat(AT_FDCWD, "//opt/orangefs/storage/meta/storage_attributes.db/lock.mdb", O_RDWR|O_CREAT|O_CLOEXEC, 0600) = 3

Maybe it was an issue caused by a typo when trying to get a clean build... Finally it is running and I can proceed with testing it :)

Thanks,
Wasili

dreynol commented 3 years ago

Hi Wasili,

Earlier I missed that it had actually created the storage directories when you first saw the Berkeley DB errors. Because the storage directories had already been created and initialized with BDB, the server kept trying to use BDB even after you reconfigured for LMDB. If the storage has already been created, there is one more cleanup step before rebuilding: stop the server(s), then remove the /opt/orangefs/storage directory.

That's why it worked after moving /opt/orangefs. Since /opt/orangefs/storage no longer existed, it rebuilt the storage, this time picking up the LMDB configuration. So you were on the right track! But for future reference, simply removing /opt/orangefs/storage will suffice. Then you can rebuild, install, initialize the storage, and start the server.
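
Before rebuilding, a quick look at the metadata directory can show which on-disk layout is already present. This is a rough sketch based only on the strace output in this thread, where the LMDB build opened storage_attributes.db/lock.mdb inside a directory and the Berkeley DB build tried to open storage_attributes.db as a single file. `storage_layout` is a hypothetical helper, and the labels it prints are illustrative, not OrangeFS terminology.

```shell
#!/bin/sh
# storage_layout META_DIR
# Hypothetical helper: classifies META_DIR/storage_attributes.db by shape.
#   lmdb-dir     a directory containing data.mdb (as in the listings above)
#   single-file  a regular file (what the BDB build tried to open)
#   unknown      exists, but matches neither shape
#   absent       does not exist, so storage has not been initialized here
storage_layout() {
    db="$1/storage_attributes.db"
    if [ -d "$db" ] && [ -f "$db/data.mdb" ]; then
        echo "lmdb-dir"
    elif [ -f "$db" ]; then
        echo "single-file"
    elif [ -e "$db" ]; then
        echo "unknown"
    else
        echo "absent"
    fi
}
```

With the paths from this thread, `storage_layout /opt/orangefs/storage/meta` would be the call to try. If the result does not match the backend the server was built with, removing /opt/orangefs/storage and re-initializing as described above is likely the simplest fix.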

I'm glad you were able to get things running! As always, don't hesitate to reach out if you need anything else!

Thanks, David

dreynol commented 3 years ago

Just to give you a heads up, we recently enabled the new Discussions feature for the repo so I'm going to migrate this thread to a discussion.