p2irc / deepplantphenomics

Deep learning for plant phenotyping.
GNU General Public License v2.0

Queue to Dataset API Conversion #39

Closed by donovanlavoie 4 years ago

donovanlavoie commented 4 years ago

This converts DPP's TensorFlow input pipeline from the Queue API to the Dataset API, both to take advantage of the Dataset API's performance improvements and to eliminate the many deprecation warnings produced by the Queue API.

The conversion largely amounts to rewriting parse_dataset and parse_images and their various helper functions to generate Datasets instead of queues, mapping transformations onto their elements rather than applying them directly. assemble_graph in each of the problem type classes was also converted, as was the inference forward pass (forward_pass_with_file_inputs).
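To illustrate the shape of that change (this is not DPP's actual code), a queue-style pipeline applies ops to one dequeued element at a time, while a Dataset-style pipeline declares per-element transformations up front with map. A minimal pure-Python analogue of the map-based style:

```python
# Pure-Python analogue of the Dataset-style pipeline used after the
# conversion: transformations are declared with map() and applied lazily
# to each element. Illustration only; the real code builds
# tf.data.Dataset objects and maps TensorFlow ops onto them.

class LazyDataset:
    def __init__(self, items):
        self._items = list(items)
        self._transforms = []

    def map(self, fn):
        # Like tf.data.Dataset.map, return a dataset whose elements
        # are fn(element); the original dataset is left untouched.
        ds = LazyDataset(self._items)
        ds._transforms = self._transforms + [fn]
        return ds

    def __iter__(self):
        for item in self._items:
            for fn in self._transforms:
                item = fn(item)
            yield item

# Hypothetical per-image transformations standing in for decode/resize.
pipeline = (LazyDataset([1, 2, 3])
            .map(lambda x: x * 10)   # e.g. decode
            .map(lambda x: x + 1))   # e.g. resize
print(list(pipeline))  # [11, 21, 31]
```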

Along the way, a class-level constant flag, supports_standardization, was added to the problem type classes to let specific problems like the Countception model opt out of standardizing their input images, similar to how they can opt out of augmentations.
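The opt-out flag pattern can be sketched as below. The base class and method names here are hypothetical; only supports_standardization is the flag actually added in this PR:

```python
# Sketch of the class-level opt-out flag pattern described above.
# BaseProblem, preprocess, and standardize are hypothetical names;
# supports_standardization is the real flag from the PR.

class BaseProblem:
    supports_standardization = True  # default: standardize input images

    def preprocess(self, image):
        if self.supports_standardization:
            return self.standardize(image)
        return image

    def standardize(self, image):
        # Stand-in for per-image standardization (here: subtract the mean).
        mean = sum(image) / len(image)
        return [x - mean for x in image]

class CountceptionLikeProblem(BaseProblem):
    # Opt out of standardization, as the Countception model does.
    supports_standardization = False

print(BaseProblem().preprocess([1.0, 2.0, 3.0]))              # [-1.0, 0.0, 1.0]
print(CountceptionLikeProblem().preprocess([1.0, 2.0, 3.0]))  # [1.0, 2.0, 3.0]
```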

A bug was also found in the inference forward pass for semantic segmentation: running on patches and stitching the results back together would crash during the stitching step. That portion was rewritten and should now work again.
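The stitching step can be sketched in simplified pure-Python form, assuming equal-sized, non-overlapping patches in row-major order (the real code operates on arrays of per-patch segmentation outputs):

```python
# Simplified sketch of stitching non-overlapping patches back into a
# full image, as the semantic-segmentation inference path must do.
# Patches are 2D lists in row-major order; the real code uses tensors.

def stitch_patches(patches, patches_per_row, patch_h, patch_w):
    rows = len(patches) // patches_per_row
    full = [[None] * (patches_per_row * patch_w) for _ in range(rows * patch_h)]
    for idx, patch in enumerate(patches):
        r0 = (idx // patches_per_row) * patch_h  # top-left row of this patch
        c0 = (idx % patches_per_row) * patch_w   # top-left column
        for i in range(patch_h):
            for j in range(patch_w):
                full[r0 + i][c0 + j] = patch[i][j]
    return full

# Four 1x1 "patches" stitched into a 2x2 image.
print(stitch_patches([[[1]], [[2]], [[3]], [[4]]], 2, 1, 1))  # [[1, 2], [3, 4]]
```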

The test suite still passes, all of the problem types still run with 0, 1, and 2+ GPUs, and inference has been confirmed to work (again, in some cases) for every problem.

jubbens commented 4 years ago

Are we making sure that we are always cropping to center on the testing images when the cropping augmentation is used?

donovanlavoie commented 4 years ago

You mean in this block of code (in _make_input_dataset)?

if self._augmentation_crop:  # Apply random crops to images
    if train_set:
        self._image_height = int(self._image_height * self._crop_amount)
        self._image_width = int(self._image_width * self._crop_amount)
        input_dataset = input_dataset.map(
            _with_labels(lambda x: tf.random_crop(x, [self._image_height, self._image_width, 3])),
            num_parallel_calls=self._num_threads)
    else:
        input_dataset = input_dataset.map(self._parse_crop_or_pad, num_parallel_calls=self._num_threads)

I know that _parse_crop_or_pad in the testing/validation branch calls tf.image.resize_image_with_crop_or_pad, whose documentation says that it centre crops.
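For reference, the centre crop that resize_image_with_crop_or_pad performs amounts to offset arithmetic like the following (a pure-Python sketch of the crop case only; the padding case is omitted):

```python
# Sketch of the centre-crop arithmetic behind
# tf.image.resize_image_with_crop_or_pad, for the case where the image
# is larger than the target size. Illustration only.

def center_crop_box(img_h, img_w, target_h, target_w):
    # The excess in each dimension is split evenly, giving a centre crop.
    off_h = (img_h - target_h) // 2
    off_w = (img_w - target_w) // 2
    return off_h, off_w, target_h, target_w

print(center_crop_box(100, 100, 80, 80))  # (10, 10, 80, 80)
```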

As for cropping by the correct amount: this block currently assumes that the training dataset is made first, so that the stored width and height are changed only once. That mutation could also affect how the earlier resizing happens, though, so I'll probably make some changes to remove that ordering assumption and only apply the crop to the stored values after every dataset has been made.
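One way to remove that ordering assumption (an illustrative sketch, not necessarily the eventual fix) is to compute the cropped dimensions from the original values every time, rather than mutating the stored height and width:

```python
# Illustrative sketch of removing the ordering assumption described
# above: derive the cropped size from the original dimensions each time,
# instead of permanently overwriting self._image_height and
# self._image_width when the training dataset happens to be built first.
# The helper itself is hypothetical.

def cropped_dims(orig_h, orig_w, crop_amount):
    return int(orig_h * crop_amount), int(orig_w * crop_amount)

# The same call gives the same answer no matter how many datasets
# have already been built, so dataset construction order no longer matters.
print(cropped_dims(256, 256, 0.75))  # (192, 192)
print(cropped_dims(256, 256, 0.75))  # (192, 192)
```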

jubbens commented 4 years ago

@donovanlavoie right you are, I didn't see that block :)