salesforce / TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
https://transmogrif.ai
BSD 3-Clause "New" or "Revised" License
2.24k stars 392 forks

Make AvroInOut#selectExistingPaths more robust and readable #483

Closed gerashegalov closed 4 years ago

gerashegalov commented 4 years ago

Describe the bug: The logic for instantiating the filesystem object is unnecessarily complex. selectExistingPaths relies on a side effect of Path creation to detect that no paths were passed in the first place, and it does so only after actually creating the filesystem object on the empty path.

Expected behavior: Convert the array of strings to an array of Hadoop Path objects right away. This will already bail on empty paths. Obtaining the FS is then just val fs = firstFile.getFileSystem(sc.hadoopConfiguration). Then filter as is.
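A minimal sketch of what the proposed flow could look like, not the actual AvroInOut code. The object name, the Seq[String] input, and the explicit Configuration parameter are assumptions made so the snippet stands alone (in the library the configuration would presumably come from sc.hadoopConfiguration), and the explicit check for an empty list is also an addition. It relies on new Path("") throwing IllegalArgumentException, which is how Hadoop's Path constructor treats empty strings.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical standalone helper illustrating the proposal; names and
// signature are assumptions, not the real AvroInOut API.
object SelectExistingPathsSketch {

  def selectExistingPaths(paths: Seq[String], conf: Configuration): Seq[Path] = {
    // Convert to Hadoop Path objects first: new Path("") throws
    // IllegalArgumentException, so empty path strings fail here,
    // before any filesystem object is created.
    val hadoopPaths = paths.map(p => new Path(p))

    // Assumption: an empty input list is also treated as an error.
    val firstFile = hadoopPaths.headOption.getOrElse(
      throw new IllegalArgumentException("No paths were specified"))

    // Resolve the filesystem from the first path, as suggested in the issue.
    // This assumes all paths live on the same filesystem.
    val fs = firstFile.getFileSystem(conf)

    // Keep only the paths that actually exist.
    hadoopPaths.filter(p => fs.exists(p))
  }
}
```

With a default Configuration this resolves to the local filesystem, so the sketch can be tried against temporary directories without a cluster.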

sakhuja commented 4 years ago

@gerashegalov I can take a look at this. Thanks.

gerashegalov commented 4 years ago

Resolved by #486.