Recursive/Incremental file listing in HDFSInputFormat

In the current version, HDFSInputFormat reads only the top-level directory of the given path. For example, if the path is /data, it lists /data and reads the entries directly under it (which must be files), such as /data/a and /data/b.

To be more flexible, it could support reading an organized path recursively, where all files sit in the leaf directories. For example, if the data is stored under a time-based layout like /data/year/month/dates/FILES, it is preferable to scan everything under '/data' rather than having to give the concrete path '/data/year/month/dates'. Of course, we need to set a maximum recursion depth to avoid reading a tremendous amount of data.
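A minimal sketch of a depth-limited recursive listing follows. It is not husky's code: `Entry`, `ListFn`, `list_files_recursive`, `make_in_memory_lister` and `max_depth` are hypothetical names, and the listing callback stands in for whatever directory-listing call the HDFS client provides (for example `hdfsListDirectory` in libhdfs). With `max_depth == 0` it returns only the files directly under the root, which roughly matches the current behavior (subdirectories are skipped instead of being treated as errors); for the /data/year/month/dates/FILES layout, `max_depth == 3` reaches the files.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical directory entry; a real implementation would fill this
// from the HDFS client's listing result.
struct Entry {
    std::string path;
    bool is_dir;
};

// Hypothetical listing callback standing in for an HDFS directory listing.
using ListFn = std::function<std::vector<Entry>(const std::string&)>;

// Collects every file under `root`, descending at most `max_depth`
// directory levels below it (breadth-first).
std::vector<std::string> list_files_recursive(const ListFn& list,
                                              const std::string& root,
                                              int max_depth) {
    std::vector<std::string> files;
    std::deque<std::pair<std::string, int>> pending;  // (directory, depth)
    pending.emplace_back(root, 0);
    while (!pending.empty()) {
        std::string dir = pending.front().first;
        int depth = pending.front().second;
        pending.pop_front();
        for (const Entry& e : list(dir)) {
            if (!e.is_dir) {
                files.push_back(e.path);
            } else if (depth < max_depth) {
                pending.emplace_back(e.path, depth + 1);
            }
            // Directories beyond max_depth are skipped, bounding the scan.
        }
    }
    return files;
}

// Hypothetical in-memory stand-in for HDFS (directory path -> entries),
// for trying the traversal without a cluster.
ListFn make_in_memory_lister(std::map<std::string, std::vector<Entry>> tree) {
    return [tree = std::move(tree)](const std::string& dir) {
        auto it = tree.find(dir);
        return it == tree.end() ? std::vector<Entry>{} : it->second;
    };
}
```

A breadth-first queue keeps the depth bookkeeping explicit and avoids deep call stacks; the depth limit is checked before a directory is queued, so nothing below the limit is ever listed. The sketch needs C++14 for the lambda init-capture.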

husky-team / husky #302