ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models
http://ludwig.ai
Apache License 2.0

Allow image bytes type during preprocessing #3971

Closed vijayi1 closed 8 months ago

vijayi1 commented 8 months ago

In image_feature.py, the image bytes instance type is handled by all of the _readimage routines, but not during preprocessing. I added the same bytes-instance logic to the _finalize_preprocessing function, and tested the mnist example with image paths as well as with image bytes objects.
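
For readers unfamiliar with the pattern, the kind of bytes-instance check described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the change made in image_feature.py: `read_image_source` is a hypothetical helper name, and it returns raw bytes rather than a decoded image so that it depends only on the standard library.

```python
import os
import tempfile
from typing import Union


def read_image_source(entry: Union[str, bytes]) -> bytes:
    """Return raw image bytes for an entry that is either a file path or bytes.

    Hypothetical helper for illustration only; it is not Ludwig's code.
    """
    if isinstance(entry, bytes):
        # Already raw image data: use it directly instead of treating it as a path.
        return entry
    if isinstance(entry, str):
        # A file path: read the file contents from disk.
        with open(entry, "rb") as f:
            return f.read()
    raise TypeError(f"expected a str path or bytes, got {type(entry).__name__}")


if __name__ == "__main__":
    # Placeholder payload standing in for real image data.
    payload = b"\x89PNG\r\n\x1a\n placeholder image data"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.png")
        with open(path, "wb") as f:
            f.write(payload)
        # Both input forms yield the same bytes.
        assert read_image_source(path) == payload
        assert read_image_source(payload) == payload
```

The point of the dispatch is that a dataset column may hold either form, so the same entry point has to accept both without assuming every value is a path.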

github-actions[bot] commented 8 months ago

Unit Test Results

6 files ±0, 6 suites ±0, duration 13m 49s (-3m 14s)
12 tests ±0: 7 passed (-2), 5 skipped (+2), 0 failed (±0)
60 runs ±0: 30 passed (-12), 30 skipped (+12), 0 failed (±0)

Results for commit c98457c4. ± Comparison against base commit c09d5dc7.

This pull request skips 2 tests.
```
tests.regression_tests.benchmark.test_model_performance ‑ test_performance[ames_housing.gbm.yaml]
tests.regression_tests.benchmark.test_model_performance ‑ test_performance[mercedes_benz_greener.gbm.yaml]
```

vijayi1 commented 8 months ago

I ran the following in examples/mnist/ and trained with the resulting df:

    import pandas as pd
    # assumed import: the original snippet did not show where mnist comes from
    from ludwig.datasets import mnist

    df = mnist.load()

    # read each image file into a bytes object
    image_bytes = []
    for img_path in df['image_path']:
        with open(img_path, mode="rb") as f:
            image_bytes.append(f.read())

    # reuse df's index so the concat below stays aligned row by row
    df_bytes = pd.DataFrame({'image_path': image_bytes}, index=df.index)

    # replace image file paths with image bytes
    df = df.drop(['image_path'], axis=1)
    df = pd.concat([df, df_bytes], axis=1)

    #print(df.head())
    #df.to_parquet("temp.parquet")

arnavgarg1 commented 8 months ago

Sounds good!