Changing the order of preprocessing steps
from: restrict train_size by sampling -> frontal or lateral restriction -> enhancement
to: frontal or lateral restriction -> enhancement -> restrict train_size by sampling
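Here is a minimal sketch of the reordered pipeline. The function name prepare_df, the view column, and the float-as-fraction handling of train_size are illustrative assumptions, not the actual codeset API:

```python
import pandas as pd

def prepare_df(df: pd.DataFrame, use_frontal: bool,
               enhance_cols, n_times: int, train_size) -> pd.DataFrame:
    # 1) frontal or lateral restriction
    if use_frontal:
        df = df[df["view"] == "frontal"]

    # 2) enhancement: upsample each target column independently
    extras = []
    for col in enhance_cols:
        extras += [df[df[col] == 1]] * (n_times - 1)
    df = pd.concat([df] + extras, ignore_index=True)

    # 3) restrict train_size by sampling, now applied to the final pool,
    #    so the returned length matches the request exactly
    if isinstance(train_size, float):  # assumed: a float means a fraction
        train_size = round(len(df) * train_size)
    return df.sample(n=train_size).reset_index(drop=True)
```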
Why
So far, the training data size restriction has been applied at an early stage of data preprocessing.
However, the current process can return fewer samples than the requested train_size, whether given as an integer count or a float fraction (thanks for noticing @seoulsky-field).
For example, if you set train_size to 100 and use_frontal to True, the codeset first samples 100 rows and then selects the frontal images among them.
Thus, it returns <= 100 images.
To avoid this, I checked which dataset options affect the number of samples and
found that use_frontal and enhancement (upsampling) can decrease or increase it.
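The problem is easy to reproduce with plain pandas (the view column and the 50/50 frontal split are made up for the illustration):

```python
import pandas as pd

# toy metadata: 200 rows, half frontal and half lateral
meta = pd.DataFrame({"view": ["frontal", "lateral"] * 100})

# current order: sample first, filter afterwards
sampled = meta.sample(n=100, random_state=0)
print(len(sampled[sampled["view"] == "frontal"]))  # ~50, not 100

# proposed order: filter first, sample afterwards
frontal = meta[meta["view"] == "frontal"]
print(len(frontal.sample(n=100, random_state=0)))  # exactly 100
```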
While analyzing the effects of these options,
I found that enhancement is quite complicated and may return a result far from what the user expects.
Currently, enhancement accepts multiple target columns and n_times (the upsampling factor).
Since the enhancement handles each target column independently (i.e., it does not consider the co-effect between columns),
it duplicates labels more than the given n_times due to the inherent trait of multi-label problems: a row that is positive for several target columns gets duplicated once per column.
Here is a simple example of the enhancement sequence in our codeset:
original (3A, 4B) -> enhancing 'A' 2-times (6A, 6B) -> enhancing 'B' 2-times (8A, 10B) <- more than 2-times for both 'A' and 'B'

```
original    after 'A' 2x    after 'B' 2x
A B         A B             A B
1 0         1 0             1 0
1 1         1 1             1 1
1 1         1 1             1 1
0 1         0 1             0 1
0 1         0 1             0 1
            1 0             1 0
            1 1             1 1
            1 1             1 1
                            1 1
                            1 1
                            0 1
                            0 1
```
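The counts in the table can be reproduced with a few lines of pandas. The enhance helper below is a simplified stand-in for the codeset's upsampling, written only to match the duplication pattern shown above:

```python
import pandas as pd

# original: 3 positives for 'A', 4 positives for 'B'
df = pd.DataFrame({"A": [1, 1, 1, 0, 0],
                   "B": [0, 1, 1, 1, 1]})

def enhance(df: pd.DataFrame, cols, n_times: int) -> pd.DataFrame:
    # Each target column is handled independently: rows positive for a
    # column are appended (n_times - 1) extra times. A row positive for
    # several columns is therefore appended once per column (the co-effect).
    extras = []
    for col in cols:
        extras += [df[df[col] == 1]] * (n_times - 1)
    return pd.concat([df] + extras, ignore_index=True)

out = enhance(df, ["A", "B"], n_times=2)
print(out["A"].sum(), out["B"].sum())  # 8 10, not the expected 6 and 8
```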
It is difficult to determine which way of enhancing (upsampling) is right, but we should definitely be aware of this behavior.
How
[x] Code change
[x] Test the length of the returned dataset (the length of self.df)
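The length test amounts to checking that, with the new order, sampling is exact. A self-contained sketch on toy metadata (column names are placeholders, not the actual codeset schema):

```python
import pandas as pd

def test_returned_length_matches_train_size():
    # toy metadata: alternating views plus one target label 'A'
    df = pd.DataFrame({"view": ["frontal", "lateral"] * 50,
                       "A": [1, 1, 0, 0] * 25})
    df = df[df["view"] == "frontal"]                           # 50 rows
    df = pd.concat([df, df[df["A"] == 1]], ignore_index=True)  # 'A' 2-times -> 75 rows
    df = df.sample(n=40, random_state=0)                       # train_size = 40
    assert len(df) == 40
```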