uclnlp / jack

Jack the Reader
MIT License

Making Jack more efficient? feed_dict in TFReader:_train_loop #393

Open pstubley opened 5 years ago

pstubley commented 5 years ago

This is more of a question than a bug report. We have been looking at ways to make Jack more efficient, and we have noticed odd throughput behaviour with a single GPU and larger batch sizes. I noticed that the main train loop in TFReader sends its batches to TF using feed_dict. From what I understand (I am far from a TF guru), feed_dict is considered the slowest way to get data to the GPU; the tf.data API (datasets) is considered the fastest, but queues are also an option. We have been using an older version of Jack as our basis, and it contains a TFFastReader class in tensorflow.py that appears to use queues, but that class seems to have disappeared from the latest Jack (and I could not quickly get it to work).
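For context on why feed_dict tends to be slower: with feed_dict, every training step waits for the host to prepare and copy the next batch, whereas queue runners and tf.data pipelines prefetch batches in the background so data preparation overlaps with compute. This is a minimal pure-Python sketch of that principle (not Jack's or TF's actual implementation; the batch/step costs are simulated with sleeps):

```python
import queue
import threading
import time

def make_batches(n_batches, prep_time=0.01):
    # Simulate host-side batch preparation, which a feed_dict loop
    # performs synchronously before every training step.
    for i in range(n_batches):
        time.sleep(prep_time)
        yield [i] * 4  # stand-in for a batch of examples

def train_feed_dict_style(n_batches):
    # feed_dict-style loop: prepare a batch, then run the step.
    # Preparation and the (simulated) GPU step never overlap.
    steps = 0
    for _batch in make_batches(n_batches):
        time.sleep(0.01)  # simulated sess.run(train_op, feed_dict=...)
        steps += 1
    return steps

def train_prefetch_style(n_batches, capacity=8):
    # Queue/tf.data-style loop: a producer thread fills a bounded
    # queue while the consumer runs training steps, so batch
    # preparation overlaps with compute.
    q = queue.Queue(maxsize=capacity)
    sentinel = object()

    def producer():
        for batch in make_batches(n_batches):
            q.put(batch)
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    steps = 0
    while True:
        batch = q.get()
        if batch is sentinel:
            break
        time.sleep(0.01)  # simulated training step
        steps += 1
    return steps

if __name__ == "__main__":
    t0 = time.perf_counter()
    train_feed_dict_style(50)
    t_feed = time.perf_counter() - t0

    t0 = time.perf_counter()
    train_prefetch_style(50)
    t_pre = time.perf_counter() - t0

    print(f"feed_dict-style: {t_feed:.2f}s, prefetch-style: {t_pre:.2f}s")
```

With these simulated costs the prefetching loop finishes in roughly half the time, because the ~0.01s of batch preparation hides behind the ~0.01s training step instead of adding to it. TF's queue runners (which TFFastReader presumably used) and tf.data's prefetch do the same thing, with the extra benefit of staging data onto the device.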

Are there any plans to improve throughput for larger batches or to adopt the newer tf.data API? Was there any reason (besides lack of time/resources) to remove TFFastReader?

Thanks!