[Mask R-CNN] - Clarify whether partitioning image dataset into landscape and portrait images is allowed

According to the rules on training data order, where data pipelines randomly order data:

> arbitrary sharding, batching, and packing are allowed provided that (1) the data is still overall randomly ordered and not ordered to improve convergence and (2) each datum still appears exactly once.

We would like clarification on whether this rule allows sharding the Mask R-CNN image dataset into landscape and portrait images. We believe something similar is already allowed for the RNN-T speech recognition model, where audio sequences are bucketed. Partitioning images by orientation seems like a natural extension of audio-sequence bucketing to image datasets.
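To make the question concrete, here is a minimal sketch of the kind of partitioning we mean. It is an illustration under stated assumptions, not the reference implementation: the function name `orientation_bucketed_batches`, the `(image_id, width, height)` record layout, and the choice to treat square images as landscape are all hypothetical. Each bucket is shuffled, cut into batches, and the batches from both buckets are then shuffled together, so every image appears exactly once per epoch and the ordering depends only on the seed.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical record layout: (image_id, width, height).
ImageMeta = Tuple[int, int, int]


def orientation_bucketed_batches(
    images: Sequence[ImageMeta], batch_size: int, seed: int
) -> List[List[int]]:
    """Return one epoch of batches, each containing a single orientation."""
    rng = random.Random(seed)

    # Partition by orientation only; square images are counted as
    # landscape here, which is an arbitrary choice for this sketch.
    landscape = [img_id for img_id, w, h in images if w >= h]
    portrait = [img_id for img_id, w, h in images if w < h]

    batches: List[List[int]] = []
    for bucket in (landscape, portrait):
        rng.shuffle(bucket)  # random order within each bucket
        for start in range(0, len(bucket), batch_size):
            batches.append(bucket[start:start + batch_size])

    # Interleave landscape and portrait batches at random so neither
    # orientation is systematically seen first.
    rng.shuffle(batches)
    return batches
```

Whether this ordering counts as "overall randomly ordered" under condition (1) is exactly what we are asking; condition (2) holds by construction, since each image ID lands in exactly one bucket and one batch.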

Additionally, the rules on pre-training state:

> High-level statistical information about the dataset, such as distribution of sizes, may be used.

We therefore believe that partitioning an image dataset into portrait and landscape images is consistent with the current rules and practices.

mlcommons/training_policies #459