rosikand / rsbox

📦 A toolbox of utility functions I commonly use when programming in Python.
MIT License

Add `image_dataset_from_folder` function #2

Closed rosikand closed 2 years ago

rosikand commented 2 years ago

Make a function that takes in a path to a folder structure such as

root_dir/
...dog/
......1.png
......2.png
...cat/
......1.png
......2.png

and returns a classification dataset of the form [(x, y), ..., (x, y)], where each x is an image and y is its class label.
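
To make the requested folder-to-label mapping concrete, here is a minimal sketch using only the standard library. The name `list_labeled_images` and the `ext` parameter are hypothetical, and x is left as a file path rather than a decoded image, so this shows only the labeling step, not the eventual rsbox function.

```python
from pathlib import Path


def list_labeled_images(root_dir, ext="png"):
    """Return (class_names, samples) for a root_dir/<class>/<file>.<ext> layout.

    samples is a list of (image_path, label_index) pairs.
    Hypothetical helper for illustration; not part of rsbox.
    """
    root = Path(root_dir)
    # Sort the class folders so each class gets the same label on every run.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for label, name in enumerate(class_names):
        # Sort the files too, so the sample order is deterministic.
        for img_path in sorted((root / name).glob(f"*.{ext}")):
            samples.append((img_path, label))
    return class_names, samples
```

With the example tree above, the class folders sort as cat, dog, so the cat images would get label 0 and the dog images label 1.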

rosikand commented 2 years ago

Base implementation (as used here):

import os
from glob import glob

import numpy as np
from rsbox import ml_utils


def img_dataset_from_dir(dir_path):
    """
    Given a directory containing one folder
    per image class, this function builds a
    labeled numpy dataset.
    Input (dir_path) structure:
    dir_path/class_1, ..., class_n/1.png
    Note: a trailing slash on 'dir_path'
    is optional.
    Labels are integers assigned to the class
    folders in sorted order, starting at 0.
    Output: [(x,y), ..., (x,y)]
    """

    # One glob match per class folder; sorting keeps the labels reproducible.
    class_list = sorted(glob(os.path.join(dir_path, "*", "")))

    master_list = []
    for idx, class_ in enumerate(class_list):
        # Load every .png in this class folder.
        curr_class = ml_utils.image_dir_to_data_norm(class_, "png")
        # Move the last axis of each array to the front.
        new_arrays = [np.moveaxis(elem, -1, 0) for elem in curr_class]

        # Pair each array with this class's integer label.
        labeled_list = ml_utils.gen_label_pair(new_arrays, idx)
        master_list.append(labeled_list)

    # Combine the per-class lists into a single dataset.
    return ml_utils.gen_distro(master_list)
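
As a rough illustration of what the `np.moveaxis(elem, -1, 0)` step in the loop above does, here is a small sketch with dummy data. It assumes `image_dir_to_data_norm` returns arrays in height x width x channels layout, which this thread does not confirm; the zero-filled arrays and the hard-coded label 0 are placeholders, and the plain tuple pairing is only a guess at the shape of the labeled output.

```python
import numpy as np

# Placeholder stand-ins for two decoded 4x6 RGB images (height, width, channels).
images_hwc = [np.zeros((4, 6, 3), dtype=np.float32) for _ in range(2)]

# Same call as in the loop above: move the last axis (channels) to the front.
images_chw = [np.moveaxis(img, -1, 0) for img in images_hwc]

# Pair every array with an integer class label (here class 0) as (x, y) tuples.
labeled = [(img, 0) for img in images_chw]

print(images_chw[0].shape)  # (3, 4, 6)
```

Channels-first is the layout most PyTorch image models expect, which may be why the axis is moved here.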

rosikand commented 2 years ago

Added