uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models directly from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can also be used from pure Python code.

OSError: Unable to get namenodes for default service #404


quiescentsam commented 5 years ago

Hi,

Traceback (most recent call last):
  File "", line 1, in
  File "/petastorm_venv3.6/lib/python3.6/site-packages/petastorm/reader.py", line 120, in make_reader
    resolver = FilesystemResolver(dataset_url, hdfs_driver=hdfs_driver)
  File "/petastorm_venv3.6/lib/python3.6/site-packages/petastorm/fs_utils.py", line 96, in __init__
    nameservice, namenodes = namenode_resolver.resolve_default_hdfs_service()
  File "/petastorm_venv3.6/lib/python3.6/site-packages/petastorm/hdfs/namenode.py", line 124, in resolve_default_hdfs_service
    .format(default_fs)))
OSError: Unable to get namenodes for default service "hdfs://master:8020" from Hadoop path /opt/cloudera/parcels/CDH/lib/hadoop in environment variable HADOOP_HOME! Please check your hadoop configuration!
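For context, the traceback comes from an ordinary make_reader call along these lines (a minimal reproduction sketch; the dataset URL below is only a placeholder, not my actual path):

# Minimal sketch of the kind of call that produces the traceback above.
# The dataset URL is a placeholder for illustration only.
from petastorm import make_reader

with make_reader('hdfs:///some/dataset') as reader:
    for row in reader:
        print(row)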

selitvin commented 5 years ago

Can you please confirm that your HADOOP_HOME environment variable points to a valid Hadoop installation directory, specifically that $HADOOP_HOME/etc/hadoop/hdfs-site.xml and $HADOOP_HOME/etc/hadoop/core-site.xml are valid? Does hdfs dfs -ls / work for you from the command line (with $HADOOP_HOME set to the same value as when you run the Python program that uses petastorm)?
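A quick way to verify this from the same Python environment (just a sanity-check sketch, not part of petastorm):

# Sanity check: confirm HADOOP_HOME is set and the expected config files exist.
import os

hadoop_home = os.environ.get('HADOOP_HOME')
print('HADOOP_HOME =', hadoop_home)
for name in ('hdfs-site.xml', 'core-site.xml'):
    path = os.path.join(hadoop_home or '', 'etc', 'hadoop', name)
    print(path, '->', 'found' if os.path.isfile(path) else 'MISSING')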

filipski commented 4 years ago

I'm facing the same issue now. My HADOOP_HOME is set correctly:

$ echo $HADOOP_HOME
/usr/local/hadoop

I have /usr/local/hadoop/etc/hadoop/hdfs-site.xml configured as well. I set HADOOP_CONF_DIR and SPARK_DIST_CLASSPATH in /usr/local/spark/conf/spark-env.sh as follows:

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop/
export SPARK_DIST_CLASSPATH=$(hadoop classpath)

And hdfs dfs -ls / does indeed work fine:

$ hdfs dfs -ls /
Found 2 items
drwxrwx---   - hduser hadoop              0 2020-01-23 09:04 /tmp
drwxr-xr-x   - hduser supergroup          0 2020-01-16 07:52 /user

I tried to store the 'hello world' dataset on HDFS, simply by changing the output_url in https://github.com/uber/petastorm/blob/e8b9f74c8db63f74c2f3b1658829089ee2d2ccdf/examples/hello_world/petastorm_dataset/generate_petastorm_dataset.py#L43 to:

def generate_petastorm_dataset(output_url='hdfs:///tmp/hello_world_dataset'):

As you can see from the hdfs dfs -ls / output above, /tmp exists on HDFS and grants the correct access rights to the hadoop group, which is the group I'm using to run generate_petastorm_dataset.py.
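For completeness, what I'm running boils down to roughly the following (a minimal self-contained sketch condensed from the hello_world example; HelloSchema here is a stand-in for the example's HelloWorldSchema and row generator):

# Minimal sketch of writing a petastorm dataset to HDFS, condensed from the
# hello_world example; HelloSchema is a stand-in for HelloWorldSchema.
import numpy as np
from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType
from petastorm.codecs import ScalarCodec
from petastorm.etl.dataset_metadata import materialize_dataset
from petastorm.unischema import Unischema, UnischemaField, dict_to_spark_row

HelloSchema = Unischema('HelloSchema', [
    UnischemaField('id', np.int32, (), ScalarCodec(IntegerType()), False),
])

def generate(output_url='hdfs:///tmp/hello_world_dataset'):
    spark = SparkSession.builder.master('local[2]').getOrCreate()
    sc = spark.sparkContext
    # materialize_dataset writes petastorm metadata next to the Parquet files
    with materialize_dataset(spark, output_url, HelloSchema, 256):
        rows = sc.parallelize(range(10)) \
            .map(lambda i: dict_to_spark_row(HelloSchema, {'id': i}))
        spark.createDataFrame(rows, HelloSchema.as_spark_schema()) \
            .write.mode('overwrite').parquet(output_url)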

What else am I missing?

filipski commented 4 years ago

OK, I took a closer look at: https://github.com/uber/petastorm/blob/e8b9f74c8db63f74c2f3b1658829089ee2d2ccdf/petastorm/hdfs/namenode.py#L110 and consequently: https://github.com/uber/petastorm/blob/e8b9f74c8db63f74c2f3b1658829089ee2d2ccdf/petastorm/hdfs/namenode.py#L84

and it looks like petastorm is coded to work with high-availability (HA) clusters only, as it requires a non-empty list of namenodes from the 'dfs.ha.namenodes.<nameservice>' Hadoop configuration property.

My cluster is a simple sandbox installation with a single namenode. Do I have to configure an HA cluster, or is there a way to use petastorm on a simple cluster with just a single namenode?
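To make the failure mode concrete, this is roughly the lookup that comes back empty on a non-HA cluster (an illustrative sketch, not petastorm's actual code; 'master' is just the nameservice taken from the error message above):

# Illustrative sketch of the lookup described above (not petastorm's own code):
# on a single-namenode cluster the dfs.ha.namenodes.<nameservice> property is
# absent from hdfs-site.xml, so the resolved namenode list comes back empty.
import os
import xml.etree.ElementTree as ET

def ha_namenodes(nameservice):
    hdfs_site = os.path.join(os.environ['HADOOP_HOME'], 'etc', 'hadoop', 'hdfs-site.xml')
    props = {p.findtext('name'): p.findtext('value')
             for p in ET.parse(hdfs_site).getroot().iter('property')}
    value = props.get('dfs.ha.namenodes.' + nameservice, '')
    return [n for n in value.split(',') if n.strip()]

print(ha_namenodes('master'))  # [] on a plain single-namenode sandbox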

selitvin commented 4 years ago

Well, that was not on purpose. In our clouds we only have HA clusters. I guess our options are:

filipski commented 4 years ago

I actually went ahead and reconfigured the cluster into a proper HA one, since we would need it that way eventually anyway, and I can confirm that with that configuration everything works well. One thing worth mentioning, and worth updating in your documentation: storing to HDFS requires libhdfs3 by default, and if it's missing you get pretty cryptic exceptions, because you catch the one that clearly says the library is missing and raise your own instead. So, consider mentioning that dependency in the installation section of the documentation, or make it an automatically installed dependency, if that makes sense.
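In case it helps others who cannot install libhdfs3: make_reader takes an hdfs_driver argument (it is visible in the traceback at the top of this issue), so switching to the JNI-based libhdfs driver that ships with Hadoop should be an alternative. A sketch, with a placeholder dataset URL:

# Sketch of reading via the JNI-based libhdfs driver instead of libhdfs3.
# The dataset URL is a placeholder; this assumes a working Hadoop/Java setup.
from petastorm import make_reader

with make_reader('hdfs:///tmp/hello_world_dataset', hdfs_driver='libhdfs') as reader:
    print(next(iter(reader)))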

selitvin commented 4 years ago

Thank you for the feedback. I will leave the ticket open to track the documentation update and to improve the error messages in this scenario.

msaisumanth commented 4 years ago

@selitvin I had the same problem. I am using a Docker image which is running an HDFS cluster. I set the dfs.ha.namenodes values, but it doesn't change anything. Any ideas?

selitvin commented 4 years ago

The code will try to load the configuration from $HADOOP_HOME/etc/hadoop/, $HADOOP_PREFIX/etc/hadoop/, and $HADOOP_INSTALL/etc/hadoop/ (in this order). Is it possible that the right hdfs-site.xml and core-site.xml files are not being found?
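A quick way to see which configuration directory would be picked up, following that order (a debugging sketch, not petastorm code):

# Check the candidate Hadoop configuration directories in the order listed above.
import os

for var in ('HADOOP_HOME', 'HADOOP_PREFIX', 'HADOOP_INSTALL'):
    root = os.environ.get(var)
    conf_dir = os.path.join(root, 'etc', 'hadoop') if root else None
    has_conf = bool(conf_dir) and os.path.isfile(os.path.join(conf_dir, 'hdfs-site.xml'))
    print(var, '->', conf_dir, '(hdfs-site.xml found)' if has_conf else '(hdfs-site.xml not found)')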