husky-team / husky

A more expressive and, most importantly, more efficient system for distributed data analytics.
http://www.husky-project.com/

File listing from folder path for InputFileFormat using NFSFileSplitter #309

Open aminmkhan opened 5 years ago

aminmkhan commented 5 years ago

HDFSInputFormat supports reading all files in the specified directory (#302). Does FileInputFormat with NFSFileSplitter also support loading data from a folder?

This would be useful for the TF-IDF example, so that all input files in a folder are loaded. The behavior would be similar to that of the TF-IDF example in Spark.

kygx-legend commented 5 years ago

This is not supported in the current version, as we haven't had time to implement it. We are also not sure whether it should load only the files directly inside the folder (one level) or scan the whole folder recursively. In fact, #302 is only a rough idea.
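
To make the two options concrete, here is a minimal sketch of one-level versus recursive file listing on a local or NFS-mounted path. It uses only the C++17 standard library (`std::filesystem`) and is not Husky's API: `list_input_files` and `count_lines` are hypothetical names, and how such a list would be fed into `NFSFileSplitter` is left open. It also assumes that every worker sees the same mounted folder.

```cpp
#include <algorithm>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper (not part of Husky): collect the regular files under `root`.
//   recursive == false: only files directly inside `root` (one level)
//   recursive == true : files in `root` and in all nested sub-folders
std::vector<std::string> list_input_files(const fs::path& root, bool recursive) {
    std::vector<std::string> files;
    if (fs::is_regular_file(root)) {
        // A plain file path still yields exactly that file.
        files.push_back(root.string());
        return files;
    }
    if (!fs::is_directory(root)) {
        return files;  // missing path or unsupported type: nothing to load
    }
    if (recursive) {
        for (const auto& entry : fs::recursive_directory_iterator(root)) {
            if (entry.is_regular_file()) files.push_back(entry.path().string());
        }
    } else {
        for (const auto& entry : fs::directory_iterator(root)) {
            if (entry.is_regular_file()) files.push_back(entry.path().string());
        }
    }
    // Directory iteration order is unspecified, so sort the result to give
    // every caller the same ordering of files.
    std::sort(files.begin(), files.end());
    return files;
}

// Hypothetical consumer: read every listed file line by line and count the lines.
std::size_t count_lines(const std::vector<std::string>& files) {
    std::size_t n = 0;
    std::string line;
    for (const auto& f : files) {
        std::ifstream in(f);
        while (std::getline(in, line)) ++n;
    }
    return n;
}
```

A boolean flag like this would let callers keep the simpler one-level behavior by default and opt in to recursion, which is one way to avoid having to pick a single policy up front.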