Closed jindrahelcl closed 6 years ago
Why is the `(<preprocessor>, "source")` in this order and not `("source", <preprocessor>)`?
I think that it would be clearer to have `('input_file/input_series', 'processor/reader')`.
Because both reader and processor are `Callable`s and their types can overlap. (Readers are `Callable[[List[str]], Iterator[Any]]`, and preprocessors are `Callable[[List[Any]], List[Any]]`, so there is overlap, for example for `Callable[[List[str]], List[str]]`.)
When you want to recognize which is which according to its type (without having a parent class for `Reader` and `Preprocessor`), you need to switch the ordering, otherwise they are both `Tuple[str, Callable]`.
... and you don't want to start inheriting these things, because it is not that simple once invariance/covariance and whatnot come into play.
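The ordering argument can be illustrated with a minimal sketch. The aliases and the `classify_spec` helper below are hypothetical and are not part of the Neural Monkey codebase; they only assume the signatures quoted in this thread. Because the string sits in a different position in each tuple, the two kinds of specification can be told apart without a common parent class, even when the same callable could serve as either.

```python
from typing import Any, Callable, Iterator, List, Tuple

# Hypothetical aliases mirroring the signatures quoted in the discussion.
Reader = Callable[[List[str]], Iterator[Any]]
Preprocessor = Callable[[List[Any]], List[Any]]

ReaderSpec = Tuple[str, Reader]              # ("path", reader)
PreprocessorSpec = Tuple[Preprocessor, str]  # (preprocessor, "source")


def classify_spec(spec: Tuple[Any, Any]) -> str:
    """Tell the two kinds of specification apart by element position."""
    first, second = spec
    if isinstance(first, str) and callable(second):
        return "reader"
    if callable(first) and isinstance(second, str):
        return "preprocessor"
    raise TypeError("Unrecognized specification: {!r}".format(spec))


def lowercase(lines: List[str]) -> List[str]:
    """Has the overlapping signature Callable[[List[str]], List[str]]."""
    return [line.lower() for line in lines]


# The same callable is classified purely by where the string appears.
reader_kind = classify_spec(("data.txt", lowercase))
preprocessor_kind = classify_spec((lowercase, "source"))
```

If both specifications were `Tuple[str, Callable]`, a check like this would have nothing to go on other than the callable's annotations, which is exactly where the overlap bites.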
I fixed the errors and removed the redundant `lazy` parameter. Now, every time `buffer_size` is specified, the dataset will behave lazily (previously it crashed when `lazy` was not true, and conversely when `lazy` was true and no buffer size was specified).
Note that if `buffer_size` is `None`, the data are actually stored in memory and are not re-read from the files. This is the expected behavior.
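As a rough illustration of the two modes, here is a small sketch. The helper `iterate_series` and its arguments are made up for this example and do not reflect the actual dataset implementation; it only assumes that `buffer_size=None` means "load everything" and that any other value bounds how much is pre-fetched at once.

```python
from typing import Callable, Iterator, List, Optional


def iterate_series(make_iterator: Callable[[], Iterator[str]],
                   buffer_size: Optional[int] = None) -> Iterator[List[str]]:
    """Yield chunks of data; hypothetical helper, not the Neural Monkey API."""
    if buffer_size is None:
        # Non-lazy: the whole series is materialized in memory as one chunk.
        yield list(make_iterator())
        return

    # Lazy: hold at most `buffer_size` items in memory at a time.
    buffer = []  # type: List[str]
    for item in make_iterator():
        buffer.append(item)
        if len(buffer) == buffer_size:
            yield buffer
            buffer = []
    if buffer:
        # Emit the final, possibly shorter, chunk.
        yield buffer


lazy_chunks = list(iterate_series(lambda: iter(["a", "b", "c"]), buffer_size=2))
eager_chunks = list(iterate_series(lambda: iter(["a", "b", "c"])))
```

With a single switch there is no way to end up in the inconsistent states mentioned above (a buffer size without laziness, or laziness without a buffer size).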
This PR introduces a new way to construct a dataset:
Details:
- `buffer_size` determines how much of the data is being pre-fetched. In non-lazy dataset, this is always the size of the data.
- `buffer_size` should be bigger than `batch_size` to avoid a warning.
- ... `neuralmonkey.writers.auto.AutoWriter`, which preserves the functionality from the old codebase, which automatically selects the suitable writer given the type of the data being outputted.
- `load_dataset_from_files` function is re-written to work with new dataset to ensure backward compatibility, but a deprecation notice is logged whenever it is used.
- ... `learning_utils`, not in the TF manager's `execute` method.
- `dataset.get_series()` method returning a fresh iterator every time is done. Internally, the dataset stores factory functions which are called to create a new data iterator for each series.
- ... `sources` argument. Originally, the `<preprocessor>` would follow the `"source"` in the tuple, but this would clash with how the file readers are specified (`Tuple[str, Callable]`) so the series-level preprocessor had to become a `Tuple[Callable, str]`.
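The factory-function idea behind `dataset.get_series()` can be sketched as follows. `SketchDataset` is a hypothetical stand-in written for this example, not the real dataset class; it only assumes that each series is backed by a zero-argument callable that builds a new iterator.

```python
from typing import Any, Callable, Dict, Iterator


class SketchDataset:
    """Hypothetical stand-in for the dataset described above."""

    def __init__(self,
                 factories: Dict[str, Callable[[], Iterator[Any]]]) -> None:
        # One zero-argument factory per series name.
        self._factories = factories

    def get_series(self, name: str) -> Iterator[Any]:
        # Calling the factory creates a new, unconsumed iterator.
        return self._factories[name]()


dataset = SketchDataset({"source": lambda: iter(["a b", "c d"])})

# Each call starts from the beginning, so the series can be read repeatedly.
first_pass = list(dataset.get_series("source"))
second_pass = list(dataset.get_series("source"))
```

Storing factories rather than iterators is what makes repeated passes possible: an iterator is exhausted after one pass, while a factory can be called again for every epoch.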