robmaz / distmap

Sequence alignment on Hadoop

Fix HADOOP_HOME & Co. #47

Closed: robmaz closed this 6 years ago

robmaz commented 6 years ago

Reminder for myself to go through this whole logic again: how distmap finds the hadoop binaries, and how the hadoop binaries in turn find their configuration. This is not helped by the fact that the hadoop "binaries" are shell scripts that call more shell scripts, which all reset parts of their environment, but not necessarily from the same variables. In particular, HADOOP_PREFIX now seems to be used more commonly than HADOOP_HOME, which I suspect is an older variable. They also seem to have HADOOP_COMMON_HOME now, although I am not clear on what that is supposed to be good for.

In any case, the basic idea is this. distmap finds the hadoop binaries/jars from HADOOP_PREFIX or HADOOP_HOME (maybe in that order of precedence). We also have command-line arguments for these, which take precedence over the environment when set. The hadoop binaries/jars in turn find (or should find; I am not exactly sure whether the jars need some Java properties set instead) the cluster configuration via HADOOP_CONF_DIR, for which we also have (or should have) a command-line argument. This is how one sends jobs to other clusters: copy that cluster's /etc/hadoop/conf (or the corresponding folder) and pass its location to distmap via one of the options, from where the hadoop binaries pick it up.
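A minimal sketch of the lookup order described above, written in Python for illustration only. It is not distmap's actual code, and the function and parameter names (`resolve_hadoop_home`, `hadoop_env`, `hadoop_binary`, `cli_value`, `conf_dir`) are hypothetical. It assumes the proposed precedence (command-line option, then HADOOP_PREFIX, then HADOOP_HOME) and that the hadoop launcher lives under `bin/hadoop` of the installation directory.

```python
import os
from typing import Dict, Mapping, Optional


def resolve_hadoop_home(cli_value: Optional[str], env: Mapping[str, str]) -> Optional[str]:
    """Pick the Hadoop installation directory.

    Assumed precedence: command-line option first, then HADOOP_PREFIX,
    then HADOOP_HOME. Empty strings are treated as unset.
    """
    if cli_value:
        return cli_value
    for var in ("HADOOP_PREFIX", "HADOOP_HOME"):
        value = env.get(var)
        if value:
            return value
    return None


def hadoop_env(conf_dir: Optional[str], env: Mapping[str, str]) -> Dict[str, str]:
    """Copy the environment for the hadoop child process.

    If a config directory was given on the command line, it overrides any
    HADOOP_CONF_DIR already present; otherwise the environment is unchanged.
    """
    child = dict(env)
    if conf_dir:
        child["HADOOP_CONF_DIR"] = conf_dir
    return child


def hadoop_binary(hadoop_home: str) -> str:
    """Path to the hadoop launcher script (assumed to be bin/hadoop)."""
    return os.path.join(hadoop_home, "bin", "hadoop")
```

To target another cluster, one would then launch the script with the copied config directory, e.g. `subprocess.run([hadoop_binary(home), "fs", "-ls", "/"], env=hadoop_env("/path/to/other/conf", os.environ))`. Passing the config only through the child's environment leaves distmap's own environment untouched.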

robmaz commented 6 years ago

Seems to work now.