pythonlessons / mltu

Machine Learning Training Utilities (for TensorFlow and PyTorch)
MIT License

I can't train my private dataset #19

Closed seadhy closed 11 months ago

seadhy commented 12 months ago

I'm trying to use the captcha-to-text tutorial, but I can't train on my own dataset the way you did. With the dataset you provided it worked without any problems, but when I replaced your images with my own, I ran into problems. A few examples from my dataset of 10129 images:

000a2544266070484f9e651067d41b1e-jhlh 000db107392d7af6a5a5239286724ea1-hlfg 0a9474ddd6ca48343277b8dd9ba4aaea-rulr

I changed the train.py file like this: label = os.path.splitext(file)[0] -> label = os.path.splitext(file)[0].split('-')[1].

My image names are not captcha_answer.png like yours but md5hash-captcha_answer.png, so I made this change to extract the captcha_answer part in the same way.
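
A minimal sketch of that label extraction, assuming every file follows the md5hash-captcha_answer.ext pattern shown above. The helper name `label_from_filename` is hypothetical and not part of mltu; it splits on the last hyphen and raises an error on names that do not match, so a stray file is noticed early instead of producing a wrong label.

```python
import os


def label_from_filename(filename: str) -> str:
    """Return the captcha answer from a name like '<md5hash>-<answer>.png'."""
    # Drop any directory part and the extension
    stem = os.path.splitext(os.path.basename(filename))[0]
    # Split on the last hyphen only; an MD5 hex digest never contains one
    prefix, sep, label = stem.rpartition("-")
    if not sep or not label:
        raise ValueError(f"unexpected filename format: {filename!r}")
    return label
```

For example, `label_from_filename("000a2544266070484f9e651067d41b1e-jhlh.png")` returns `"jhlh"`.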

In the config.py file, since all my images are 350x100, I changed self.height = 100 and self.width = 350. Then I got the following error. Can you help me solve this?

[image: screenshot of the error]

pythonlessons commented 12 months ago

You would need to modify the model according to your input shape. I believe that if you set self.height = 50 and self.width = 200, as they were initially, everything will be OK. Check this out

seadhy commented 12 months ago

I changed it, but I keep getting the same error: [image: screenshot of the error]

Can you try to use this project again with my images?

pythonlessons commented 11 months ago

I think you are passing None to the model. Check this first (whether you really receive images from the data provider).

seadhy commented 11 months ago

I'm sure about this. Please watch the video: https://streamable.com/3f16yn

pythonlessons commented 11 months ago

If you can upload your dataset and the code you showed here somewhere, I'll try it myself.

seadhy commented 11 months ago

You can download 100 images of my dataset and the train.py file I changed from this link. https://www.mediafire.com/file/4b8l1qdmeya8bp1/dataset_and_train_file.rar/file

pythonlessons commented 11 months ago

OK mate, I tested it, and it seems that cv2.imread(file_path) returns None. That means your images are in a format that OpenCV can't handle. So either fix your images or find a different way to read them. I'd prefer fixing them: try reading them with Pillow and saving them with cv2; this should solve it.
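
A quick way to confirm this diagnosis on a whole folder is to list every file for which the reader returns None. The sketch below is only an illustration: `find_unreadable` is a hypothetical helper, not part of mltu or OpenCV. It relies on the fact that cv2.imread returns None instead of raising when it cannot decode a file; the `reader` parameter is an assumption added so the check can also be run with another loader.

```python
from typing import Callable, Iterable, List, Optional


def find_unreadable(paths: Iterable[str],
                    reader: Optional[Callable] = None) -> List[str]:
    """Return the paths for which the reader yields None."""
    if reader is None:
        # Imported lazily so the helper itself does not require OpenCV
        import cv2
        reader = cv2.imread
    # cv2.imread signals failure by returning None, not by raising
    return [path for path in paths if reader(path) is None]
```

Calling it with the full list of dataset paths before training shows whether the data provider would hand None to the model.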

seadhy commented 11 months ago

Thanks! I solved the problem using this code.

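import os
import cv2
import numpy as np
from PIL import Image

src_dir = 'Datasets/captcha_images_v2'
dst_dir = 'Datasets/my_dataset'
# cv2.imwrite returns False without raising if the target folder is missing
os.makedirs(dst_dir, exist_ok=True)

for img_name in os.listdir(src_dir):
    # Read with Pillow, which opens these files even though cv2.imread returned None
    img_pil = Image.open(os.path.join(src_dir, img_name)).convert('RGB')
    img_np = np.array(img_pil)
    # Pillow yields RGB; OpenCV expects BGR channel order
    img_cv2 = cv2.cvtColor(img_np, cv2.COLOR_RGB2BGR)

    # Re-save with OpenCV so that cv2.imread can load the file later
    cv2.imwrite(os.path.join(dst_dir, img_name), img_cv2)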

But before closing the issue, I want to ask you a few questions. How many epochs do you think I should use for this project? I noticed that it works when I set self.height=100, self.width=350 again. Do you think I should train this way or with 50, 200? (All my images are in 100x350 format.) Other than that, if you have any suggestions for this project, I'd love to hear them. Thanks again!

pythonlessons commented 11 months ago

Train for an effectively unlimited number of epochs, for example 1000, and EarlyStopping will stop the training at some point. Then train another model with a different input size until it stops, and choose whichever model gives you better accuracy in terms of CER. 1000 images is a pretty small dataset; you may increase accuracy by adding more images. Overall, the images are pretty simple, so it shouldn't take long to train both of these models.
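
For comparing the two models, CER (character error rate) can be computed as the total edit distance between predictions and ground-truth labels divided by the total number of ground-truth characters. The sketch below uses that generic definition in plain Python; mltu ships its own CER metric, which may aggregate differently (for example per sample), so treat the function names here as hypothetical.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(predictions: Sequence[str], targets: Sequence[str]) -> float:
    """Total edits divided by total target characters (lower is better)."""
    total_edits = sum(edit_distance(p, t) for p, t in zip(predictions, targets))
    total_chars = sum(len(t) for t in targets)
    return total_edits / total_chars if total_chars else 0.0
```

With this definition, a model that outputs an empty string for every image scores a CER of 1.0, and the model with the lower CER on the same validation set is the better one.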

seadhy commented 11 months ago

Should I leave the self.height and self.width values at their default values of 50, 200 or change them to 100, 350, the same as my images?

It saves onnx and csv files only after all epochs are finished. But in the video it was saving every time. What is the reason for this?

pythonlessons commented 11 months ago

> Should I leave the self.height and self.width values at their default values of 50, 200 or change them to 100, 350, the same as my images?

Usually, a smaller input size means faster inference, while a larger input size means slower inference but better accuracy. That's why you should train both models and see whether the bigger input size actually affects the accuracy.

> It saves onnx and csv files only after all epochs are finished. But in the video it was saving every time. What is the reason for this?

Can you explain this in more detail? What do you mean? It saves the best .h5 model every epoch, and after training finishes it loads those weights and converts the model to onnx.

seadhy commented 11 months ago

I seem to have solved that problem, but now I have a different one. I trained the model on Google Colab, starting with 350x100 input, 300 epochs and a batch_size of 256. EarlyStopping stopped it at epoch 50. When I tried the model, even on images it was trained on, I got empty predictions. Comparing the log file from your training with my own, my loss values almost never decreased, whereas yours kept decreasing and even dropped below 1. I have shared my results below; can you check them and give suggestions?

https://www.mediafire.com/file/o7lj3s3wbz9aolv/202307121335.rar/file
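
The link between a flat loss and an early stop can be illustrated with a simplified patience rule. This is a plain-Python sketch of the general idea, not the Keras EarlyStopping implementation; the function name, the `min_delta` parameter and the patience values are assumptions for illustration, and the patience used in the tutorial's train.py is not stated in this thread.

```python
from typing import Optional, Sequence


def early_stop_epoch(losses: Sequence[float], patience: int,
                     min_delta: float = 0.0) -> Optional[int]:
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best - min_delta:
            # Loss improved: remember it and reset the counter
            best = loss
            wait = 0
        else:
            # No improvement this epoch
            wait += 1
            if wait >= patience:
                return epoch
    return None
```

Under this rule a loss that never improves stops training as soon as the patience runs out, so an early stop combined with a nearly constant loss is consistent with a model that has not started learning rather than one that has converged.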

seadhy commented 11 months ago

@pythonlessons Please check. By the way, the 2nd tutorial gives an error when I try to use it with the version you updated.

pythonlessons commented 11 months ago

Thanks, I'll make a fix release first.

pythonlessons commented 11 months ago

Solved.