rwth-i6 / returnn

The RWTH extensible training framework for universal recurrent neural networks
http://returnn.readthedocs.io/

Datasets should use TensorDict #1302

Open albertz opened 1 year ago

albertz commented 1 year ago

Now that we have the generic Tensor and TensorDict, we can remove the old, ambiguous, and limited num_outputs and num_inputs from the dataset and replace them with extern_data, which uses TensorDict and Tensor to describe the data streams.
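To illustrate the direction, here is a minimal sketch. It does not use RETURNN's actual Tensor or TensorDict classes: `StreamTemplate` and the plain `extern_data` dict are hypothetical stand-ins, and the `(dim, ndim)` layout shown for `num_outputs` is an assumption about the legacy convention, as are the concrete dimensions.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class StreamTemplate:
    """Hypothetical stand-in for a tensor template (not RETURNN's Tensor)."""

    name: str
    shape: Tuple[Optional[int], ...]  # per-sequence shape, None = variable length
    dtype: str
    sparse_dim: Optional[int] = None  # number of classes if the stream holds indices


# Assumed legacy style: one (dim, ndim) pair per key.
# Dtype and sparsity are only implied, which is where the ambiguity comes from.
num_outputs = {"data": (40, 2), "classes": (5000, 1)}

# Explicit style: each stream states its own shape, dtype and sparsity.
extern_data: Dict[str, StreamTemplate] = {
    "data": StreamTemplate("data", shape=(None, 40), dtype="float32"),
    "classes": StreamTemplate("classes", shape=(None,), dtype="int32", sparse_dim=5000),
}

for key, tmpl in extern_data.items():
    kind = "sparse" if tmpl.sparse_dim is not None else "dense"
    print(key, tmpl.shape, tmpl.dtype, kind)
```

The point of the explicit form is that nothing has to be inferred from a pair of integers; each data stream carries its full description.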

albertz commented 3 months ago

@NeoLegends A first step would be to remove all code usages of num_outputs/num_inputs and replace them with is_data_sparse, get_data_shape, get_data_dim, etc.

Also, while at it, we should fix the datasets that need to load data in order to answer those functions. Loading data should not be necessary for them.

Same for get_data_keys.
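A rough sketch of what "no data loading for metadata" could look like. `ToyDataset`, its constructor arguments, and the `load_count` counter are hypothetical and only serve the illustration; the method names are the ones mentioned above, but their signatures and the shape convention (feature shape without the time axis) are assumptions here.

```python
from typing import Dict, List, Tuple


class ToyDataset:
    """Hypothetical dataset, not a RETURNN class."""

    def __init__(self, path: str, meta: Dict[str, dict]):
        self.path = path
        self._meta = meta  # static description, known from the config alone
        self._data = None  # filled lazily by _load()
        self.load_count = 0  # counts how often the expensive load ran

    def _load(self) -> Dict[str, list]:
        # Stand-in for expensive I/O; only sequence access should end up here.
        if self._data is None:
            self.load_count += 1
            self._data = {key: [] for key in self._meta}
        return self._data

    def get_data_keys(self) -> List[str]:
        return list(self._meta)

    def is_data_sparse(self, key: str) -> bool:
        return bool(self._meta[key]["sparse"])

    def get_data_dim(self, key: str) -> int:
        return int(self._meta[key]["dim"])

    def get_data_shape(self, key: str) -> Tuple[int, ...]:
        # Assumed convention: empty for sparse streams, (dim,) for dense ones.
        return () if self.is_data_sparse(key) else (self.get_data_dim(key),)


ds = ToyDataset(
    "/hypothetical/path",
    {"data": {"dim": 40, "sparse": False}, "classes": {"dim": 5000, "sparse": True}},
)
print(ds.get_data_keys(), ds.get_data_shape("data"), ds.is_data_sparse("classes"))
print("loads so far:", ds.load_count)
```

All metadata queries are answered from the static description, so `_load()` is never triggered by them.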

A next step might be to introduce a new function, e.g. get_data_tensor_template(self, key: str) -> Tensor or so.
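One possible shape for such a function, as a sketch only. `TensorTemplate` is a hypothetical stand-in for the real Tensor class, `DatasetSketch` is not a RETURNN class, and `get_data_dtype` plus the default implementation built on the existing getters are assumptions, not a settled design.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TensorTemplate:
    """Hypothetical stand-in for a Tensor template (no actual data attached)."""

    name: str
    shape: Tuple[Optional[int], ...]  # None marks the variable-length time axis
    dtype: str
    sparse_dim: Optional[int]


class DatasetSketch:
    """Hypothetical dataset exposing per-key getters plus the proposed function."""

    _meta = {
        "data": {"dim": 40, "sparse": False, "dtype": "float32"},
        "classes": {"dim": 5000, "sparse": True, "dtype": "int32"},
    }

    def is_data_sparse(self, key: str) -> bool:
        return self._meta[key]["sparse"]

    def get_data_dim(self, key: str) -> int:
        return self._meta[key]["dim"]

    def get_data_dtype(self, key: str) -> str:
        return self._meta[key]["dtype"]

    def get_data_tensor_template(self, key: str) -> TensorTemplate:
        # Default implementation: combine the per-key getters into one object.
        sparse = self.is_data_sparse(key)
        dim = self.get_data_dim(key)
        shape = (None,) if sparse else (None, dim)
        return TensorTemplate(
            name=key,
            shape=shape,
            dtype=self.get_data_dtype(key),
            sparse_dim=dim if sparse else None,
        )


tmpl = DatasetSketch().get_data_tensor_template("classes")
print(tmpl)
```

A default like this would let existing datasets keep working unchanged, while datasets with richer metadata could override the function directly.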