TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Describe the bug
The logic for instantiating the filesystem object is unnecessarily complex. selectExistingPaths relies on the side effect of Path creation to detect that no paths were passed to begin with but after actually creating the filesytem object on the empty path.
Expected behavior
Convert array of strings to array of Hadooo Path objects right away. This will already bail on the empty paths. Obtaining FS is just val fs = firstFile.getFileSystem(sc.hadoopConfiguration). Then filter as is.
Describe the bug The logic for instantiating the filesystem object is unnecessarily complex.
selectExistingPaths
relies on the side effect of Path creation to detect that no paths were passed to begin with but after actually creating the filesytem object on the empty path.Expected behavior Convert array of strings to array of Hadooo Path objects right away. This will already bail on the empty paths. Obtaining FS is just
val fs = firstFile.getFileSystem(sc.hadoopConfiguration)
. Then filter as is.