uwescience / myria

Myria is a scalable Analytics-as-a-Service platform based on relational algebra.
myria.cs.washington.edu

More modular Myria deployment config for HPC clusters like TX-E1 #839

Open · dhutchis opened this issue 8 years ago

dhutchis commented 8 years ago

HPC clusters like TX-E1 present a small handful of challenges. Items 1 through 3 are solved; this issue is about item 4.

  1. Only one "gateway/login" node can access the Internet. Solution: route all external traffic through the login node.
  2. Parallel file systems like Lustre don't support file locking. Solution: use the local disks on the machines. Use scp to copy the Myria jar and deployment files to every worker node. Set GRADLE_USER_HOME to a directory on the local disk so that the Gradle build can use file locking.
  3. YARN may not be installed. Luckily, YARN can run in user mode: download the Hadoop binary distribution, configure yarn-site.xml with an entry for yarn.resourcemanager.hostname, add the worker nodes to the slaves file, and start YARN with the sbin/start-yarn.sh script. This also works for a multi-node setup.
  4. Allow per-worker configuration of Postgres: each worker may use a separate username and password and pass additional JDBC parameters such as SSL. Allow reading the password from a file.

To solve item 4, we should expand the deployment.cfg file to include these options and pass them down to the Worker and JdbcInfo classes.
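As a rough sketch of what expanding deployment.cfg could look like, the Java below parses a per-worker section of an INI-style file and groups the options by worker ID. The section name `[worker_jdbc]`, the `<workerId>.<option>` key layout, and options such as `username`, `password_file`, and `jdbc.ssl` are all hypothetical, not existing Myria settings; the `PerWorkerConfig` class is likewise illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical parser for per-worker JDBC settings in an INI-style deployment.cfg. */
final class PerWorkerConfig {
  private PerWorkerConfig() {}

  /**
   * Groups keys of the form "<workerId>.<option> = <value>" found in the
   * (hypothetical) [worker_jdbc] section by worker ID. Lines outside that
   * section are ignored. Throws NumberFormatException if a key prefix is not an integer.
   */
  static Map<Integer, Properties> parse(List<String> lines) {
    Map<Integer, Properties> byWorker = new TreeMap<>();
    boolean inSection = false;
    for (String raw : lines) {
      String line = raw.trim();
      if (line.isEmpty() || line.startsWith("#") || line.startsWith(";")) {
        continue; // blank line or comment
      }
      if (line.startsWith("[") && line.endsWith("]")) {
        inSection = line.equals("[worker_jdbc]"); // entering a new section
        continue;
      }
      int eq = line.indexOf('=');
      int dot = line.indexOf('.');
      if (!inSection || eq < 0 || dot < 0 || dot > eq) {
        continue; // not a "<workerId>.<option> = <value>" entry in our section
      }
      int workerId = Integer.parseInt(line.substring(0, dot).trim());
      String option = line.substring(dot + 1, eq).trim(); // e.g. "jdbc.ssl"
      String value = line.substring(eq + 1).trim();
      byWorker.computeIfAbsent(workerId, id -> new Properties()).setProperty(option, value);
    }
    return byWorker;
  }
}
```

The lines could come from `java.nio.file.Files.readAllLines(...)` on the deployment file; each worker's `Properties` would then be handed to that worker and on to JdbcInfo. Staying within the existing INI format avoids adding a new parser dependency.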

Here is an example of the logic we might add to JdbcInfo's constructor, except that the password-file path and the SSL settings should come from the configuration rather than being hardcoded:

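```java
// Read the Postgres password from the first line of a per-database file.
java.io.File f = new java.io.File("/home/gridsan/groups/databases/" + db + "/postgresql_user_password.txt");
try (java.io.BufferedReader r = new java.io.BufferedReader(new java.io.FileReader(f))) {
  String password = r.readLine();
  if (password != null) { // readLine() returns null for an empty file; setProperty rejects null
    this.properties.setProperty(JDBC_PROP_PASSWORD_KEY, password);
  }
} catch (java.io.IOException e) {
  e.printStackTrace();
}
// Enable SSL without validating the server certificate (PostgreSQL JDBC driver properties).
this.properties.setProperty("ssl", "true");
this.properties.setProperty("sslfactory", "org.postgresql.ssl.NonValidatingFactory");
```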
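A non-hardcoded variant might instead build the JDBC properties from one worker's settings. This is a sketch under assumptions: the helper `WorkerJdbcProperties` and the setting names `username`, `password_file`, and `jdbc.*` are hypothetical. The output keys `user`, `password`, `ssl`, and `sslfactory` are standard PostgreSQL JDBC connection properties; in Myria the password key would presumably be `JDBC_PROP_PASSWORD_KEY`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper: turns one worker's settings into JDBC connection properties. */
final class WorkerJdbcProperties {
  private WorkerJdbcProperties() {}

  static Properties build(Properties settings) throws IOException {
    Properties jdbc = new Properties();
    String user = settings.getProperty("username");
    if (user != null) {
      jdbc.setProperty("user", user); // standard JDBC driver key
    }
    String pwFile = settings.getProperty("password_file");
    if (pwFile != null) {
      // The password is the first line of the file; fail fast if it is missing.
      List<String> lines = Files.readAllLines(Paths.get(pwFile), StandardCharsets.UTF_8);
      if (lines.isEmpty()) {
        throw new IOException("password file is empty: " + pwFile);
      }
      jdbc.setProperty("password", lines.get(0));
    }
    // Pass any "jdbc.<key>" option through unchanged, e.g. jdbc.ssl=true becomes ssl=true.
    for (String key : settings.stringPropertyNames()) {
      if (key.startsWith("jdbc.")) {
        jdbc.setProperty(key.substring("jdbc.".length()), settings.getProperty(key));
      }
    }
    return jdbc;
  }
}
```

JdbcInfo's constructor could then merge the result with `this.properties.putAll(...)`. Unlike the hardcoded example, this version propagates an unreadable or empty password file as an exception instead of printing a stack trace and continuing without a password.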
senderista commented 8 years ago

For 2), it isn't necessary to copy the Myria jar or config file to anywhere but the coordinator. REEF takes care of deploying all necessary files and configuration to the workers. For 4), if we add more per-worker parameters, it might be time to switch to YAML. As for reading a password from a file, I think that's best handled with environment variables. We could discuss in a separate issue how to override settings in the Myria configuration file with environment variables.
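One possible shape for the environment-variable override: derive a variable name from the configuration section and key, and fall back to the configuration-file value when that variable is unset or empty. The `MYRIA_<SECTION>_<KEY>` naming convention and the `EnvOverride` class are hypothetical; the environment map is a parameter so the lookup can be tested without touching the real process environment.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical resolver: an environment variable, if set, overrides the config-file value. */
final class EnvOverride {
  private EnvOverride() {}

  /** Maps e.g. ("worker_jdbc", "1.password") to "MYRIA_WORKER_JDBC_1_PASSWORD". */
  static String envName(String section, String key) {
    String raw = "MYRIA_" + section + "_" + key;
    // Replace characters that are awkward in shell variable names, then upper-case.
    return raw.replaceAll("[^A-Za-z0-9]", "_").toUpperCase(Locale.ROOT);
  }

  /** Returns the environment value when present and non-empty, else the file value. */
  static String resolve(String section, String key, String fileValue, Map<String, String> env) {
    String fromEnv = env.get(envName(section, key));
    return (fromEnv != null && !fromEnv.isEmpty()) ? fromEnv : fileValue;
  }

  /** Convenience overload that reads the real process environment. */
  static String resolve(String section, String key, String fileValue) {
    return resolve(section, key, fileValue, System.getenv());
  }
}
```

Treating an empty variable as unset is a judgment call; it avoids accidentally blanking a password by exporting an empty value.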

dhutchis commented 8 years ago

Per-worker settings for 4) don't matter for the LocalRuntime mode, since it runs on only one node anyway. They should matter for the YarnClientConfiguration mode. Unfortunately, I could only get YARN working to the point where the Myria driver talks to YARN; no coordinator or worker seems to start.