Open Gabriella439 opened 8 years ago
Yeah, this feature was added somewhat recently (and hastily). I don't think it makes a ton of sense to merge the read and write versions of this into one class that accepts a date range. Writing should not take a date range; it should take a specific date and a format string.
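As a rough illustration of that suggestion, here is a minimal sketch of a write-path helper that takes one explicit timestamp and a format string instead of a date range. The names `WritePathSketch` and `writePath` are hypothetical, and the pattern uses `java.time` syntax rather than whatever format Scalding itself expects; none of this is the library's actual API.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object WritePathSketch {
  // Hypothetical helper (not part of Scalding): resolve a write path
  // from a single timestamp and a java.time pattern string.
  def writePath(at: LocalDateTime, pattern: String): String =
    DateTimeFormatter.ofPattern(pattern).format(at)
}
```

With this shape the caller decides exactly which directory is written, for example `WritePathSketch.writePath(LocalDateTime.of(2015, 6, 1, 0, 0), "'/logs/'yyyy/MM/dd/HH")`, so there is no ambiguity about whether the start or the end of a range is used.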
`TimeSeqPathedSource` provides the `defaultDurationFor` method so that you can override the period of hourly directories (e.g. one directory every 6 hours instead of every hour, as the method's comment indicates).

However, there's actually an inconsistency between the behavior of `allPaths` (which is used to generate the set of paths to validate) and the behavior of `hdfsWritePath` (which is used to generate the path to write out to).

If you specify, say, a 6-hour window, then `allPaths` will look for a directory at the start of each 6-hour window because it uses the `start` of the `DateRange` to compute the path to validate. However, `writePathFor` will write out to a path at the end of the 6-hour window because it uses the `end` of the `DateRange` to compute the path to write out to.

This means that if you write out data using `TimeSeqPathedSource`, you must ensure that `defaultDurationFor` matches the `dateRange` supplied to the `TimeSeqPathedSource`; otherwise you will not be able to read the data back in using the same source unless you disable validation.
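The mismatch can be reproduced in a small standalone model. The sketch below does not use Scalding; `Window`, `pathsToValidate`, and `pathToWrite` are hypothetical stand-ins built on `java.time`, and the `/logs/yyyy/MM/dd/HH` layout is an assumed example. It only mirrors the behavior described above: validation stamps each step with its start, while writing stamps the range with its end.

```scala
import java.time.{Duration, LocalDateTime}
import java.time.format.DateTimeFormatter

object PathMismatchSketch {
  // Hypothetical stand-in for a date range: an inclusive [start, end] interval.
  final case class Window(start: LocalDateTime, end: LocalDateTime)

  // Assumed hourly directory layout, e.g. /logs/2015/06/01/05
  private val hourly = DateTimeFormatter.ofPattern("yyyy/MM/dd/HH")
  private def toPath(t: LocalDateTime): String = "/logs/" + hourly.format(t)

  // Read/validation side: one path per step, stamped with the START of each step.
  def pathsToValidate(w: Window, step: Duration): List[String] =
    Iterator.iterate(w.start)(t => t.plus(step))
      .takeWhile(t => !t.isAfter(w.end))
      .map(t => toPath(t))
      .toList

  // Write side: a single path, stamped with the END of the range.
  def pathToWrite(w: Window): String = toPath(w.end)

  def main(args: Array[String]): Unit = {
    val start = LocalDateTime.of(2015, 6, 1, 0, 0)
    // One 6-hour window whose inclusive end falls in the window's last hour.
    val window = Window(start, start.plusHours(6).minusSeconds(1))

    val validated = pathsToValidate(window, Duration.ofHours(6))
    val written = pathToWrite(window)

    println(s"validated: $validated") // List(/logs/2015/06/01/00)
    println(s"written:   $written")   // /logs/2015/06/01/05
    println(s"readable back: ${validated.contains(written)}") // false
  }
}
```

In this model the written directory (`.../05`) is never among the validated directories (`.../00`), which is why reading the data back fails unless validation is disabled or the step and range are chosen to line up.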