Currently, the output model is written out using Spark's saveAsTextFile() function. This does not write a single file; it writes one file per partition. To get a single file, we set the number of partitions to 1:
spark.sparkContext.parallelize(List(totalTimeString), 1).saveAsTextFile(outputFileDirectory + "/total_time.log")
However, because this method is designed to write files on HDFS, the "file" it writes is actually a directory containing part files with the actual output (in this case the data ends up in part-00000). This is ugly, but I believe that writing the file directly with Scala's file writer does not work on HDFS.
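For reference, the call above typically produces a directory laid out like this rather than a single file:

total_time.log/
  _SUCCESS
  part-00000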
We should find a way to write out a single file, both on a typical local filesystem and on HDFS.
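One possible approach (a sketch only, not what the code does today) is to skip saveAsTextFile() and use Hadoop's FileSystem API directly. Path.getFileSystem() resolves the scheme of the path (file://, hdfs://, etc.), so the same code should cover both the local and HDFS cases. The names SingleFileWriter and writeSingleFile are placeholders, not existing code in this repo:

import java.nio.charset.StandardCharsets

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

object SingleFileWriter {
  // Hypothetical helper: writes `contents` as one file at `pathString`,
  // on whatever filesystem the path resolves to (local, HDFS, ...).
  def writeSingleFile(pathString: String, contents: String, conf: Configuration): Unit = {
    val path = new Path(pathString)
    val fs = path.getFileSystem(conf) // picks the FileSystem matching the path's scheme
    val out = fs.create(path, true)   // true = overwrite an existing file
    try out.write(contents.getBytes(StandardCharsets.UTF_8))
    finally out.close()
  }
}

With something like this, the timing log above could be written as writeSingleFile(outputFileDirectory + "/total_time.log", totalTimeString, spark.sparkContext.hadoopConfiguration), producing a single file instead of a directory of part files. The trade-off is that the data is written from the driver rather than the executors, which is fine for small outputs like a log line or a serialized model.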