johnolafenwa / DeepStack

The World's Leading Cross Platform AI Engine for Edge Devices
Apache License 2.0

Training locally #86

Open kpally4 opened 3 years ago

kpally4 commented 3 years ago

Hello, I'm interested in creating a custom model locally. I tried to do it online, but I get kicked out after 12 hours. I followed every step in the guide and only need the correct final command to see if it works.
- CUDA + cuDNN + PyTorch installed and verified in Python with import torch followed by torch.cuda.is_available()
- Cloned the DeepStack Trainer with git clone https://github.com/johnolafenwa/deepstack-trainer

Everything looks fine and ready, but I lack the knowledge, and this command doesn't work: python3 train.py --dataset-path "/path-to/my-dataset" Maybe I need to change the path. I tried, but I haven't found the correct one.

Also, how can I use different values for --model, --batch-size, and epochs? An example of a complete command would help me a lot. My folder "my-dataset", with "train" and "test" inside (labelled with LabelImg in YOLO format), is ready to be used on my desktop :) Thanks

johnolafenwa commented 3 years ago

Hello @kpally4 , the --dataset-path should point to the directory where your train and test folders are located. For the --model and other parameters, see this guide https://colab.research.google.com/drive/1gbTr_4xpDk3cpnbAVbMVxtyp-3XuUPix?usp=sharing
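
As a quick sanity check before launching the trainer, the layout described above (a directory that directly contains train and test) can be verified with a few lines of Python. This is only a sketch: missing_subfolders is a hypothetical helper, not part of deepstack-trainer, and it checks nothing beyond the presence of the two folders.

```python
from pathlib import Path
import tempfile

def missing_subfolders(dataset_path, required=("train", "test")):
    """Return the required subfolder names that are not directories under dataset_path."""
    root = Path(dataset_path)
    return [name for name in required if not (root / name).is_dir()]

# Demonstration on a throwaway directory rather than a real dataset.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "train").mkdir()
    before = missing_subfolders(tmp)  # only "train" exists at this point
    (Path(tmp) / "test").mkdir()
    after = missing_subfolders(tmp)   # both folders exist now

print(before, after)
```

An empty list means the path is a plausible value for --dataset-path; a non-empty list names the folders that were not found there.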

kpally4 commented 3 years ago

Thanks for the reply @johnolafenwa. To be honest, I'm really confused. I've never done this before, and I don't know why nothing happens when I run the command. As I said, CUDA, cuDNN, and PyTorch appear to be working: https://i.imgur.com/AykRDnQ.png

I'm not sure about the last requirement in the guide, pip install -r requirements.txt. On the PyTorch site I selected, in order, Stable (1.7.1), Windows, Pip, Python, 10.1, then copied and pasted the resulting line: https://i.imgur.com/2CilGut.png

Then I cloned the trainer and tried to launch it with

python3 train.py --dataset-path "D:\My-Dataset" --model "yolo5s" --batch-size 32

where D:\My-Dataset is the folder that contains test and train. Nothing happened. I tried in several Command Prompt and PowerShell windows. Maybe I'm missing something. Help when you can, please :)

kpally4 commented 3 years ago

Any help from anyone who has already trained on their "train" and "test" folders locally? I'm stuck at the same point as described above :(

MissMusic commented 3 years ago

The path inside your WSL (Linux) environment is not the same as the Windows path.

This is how I run it:

Open a command prompt.
bash <- Start bash to jump into your Linux environment.
cd /mnt/c/temp/deepstack/deepstack-trainer <- Change folder to my Windows C:\temp\deepstack\deepstack-trainer
python3 train.py --dataset-path "/mnt/c/temp/deepstack/data" <- My test and train folders are placed in C:\temp\deepstack\data

I hope this helps.
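
The mapping used above (C:\temp\deepstack\data becomes /mnt/c/temp/deepstack/data) can be sketched in Python. This assumes the default WSL mount point /mnt/<drive letter>, which can be changed in the WSL configuration; windows_to_wsl is a hypothetical helper written for illustration, and inside WSL the wslpath utility performs the same translation.

```python
from pathlib import PureWindowsPath

def windows_to_wsl(path):
    """Translate a drive-letter Windows path to its default WSL mount path."""
    p = PureWindowsPath(path)
    drive = p.drive  # e.g. "C:" for a drive-letter path
    if len(drive) != 2 or drive[1] != ":":
        raise ValueError(f"expected a drive-letter path, got {path!r}")
    # p.parts[0] is the anchor (such as "C:\\"); the rest are the folder names.
    return "/".join(["/mnt", drive[0].lower(), *p.parts[1:]])

print(windows_to_wsl(r"C:\temp\deepstack\data"))  # /mnt/c/temp/deepstack/data
```

The same function turns a path such as D:\My-Dataset into /mnt/d/My-Dataset, which is the form the trainer needs when it is started from bash.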

kpally4 commented 3 years ago

Thanks for the reply. I finally understand why a Linux environment was required. Anyway, I got it and changed the paths to match my own setup:

cd /mnt/c/windows/deepstack/deepstack-trainer

followed by

python3 train.py --dataset-path "/mnt/c/windows/deepstack/data"

The result is:

Traceback (most recent call last):
  File "train.py", line 11, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'

And I don't know what that means.

MissMusic commented 3 years ago

You are missing dependencies; I was too.

I fixed it by running the setup code included in the Google Colab in my bash environment:

!git clone https://github.com/johnolafenwa/deepstack-trainer
%cd deepstack-trainer
!pip install -r requirements.txt
!pip install torch==1.7.0+cu110 torchvision==0.8.1+cu110 torchaudio===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html
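
To see which dependencies are still missing without hitting one ModuleNotFoundError at a time, a short check like the following may help. It is a sketch only: the module list is an assumption assembled from the packages named in this thread (cv2 is the import name for OpenCV), not the actual contents of requirements.txt, and find_missing is a hypothetical helper.

```python
import importlib.util

# Assumed import names, collected from packages mentioned in this thread.
# The authoritative list is the trainer's requirements.txt.
MODULES = ["numpy", "torch", "torchvision", "tensorboard",
           "tqdm", "cv2", "matplotlib", "scipy"]

def find_missing(modules):
    """Return the module names that this Python interpreter cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

missing = find_missing(MODULES)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All listed modules were found.")
```

Run it with the same interpreter that runs train.py (python3 inside bash), since the Windows and WSL Python installations keep separate package sets.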

kpally4 commented 3 years ago

I tried the same method, but I only succeeded by installing each package one at a time, like this:

python3 -m pip install numpy
python3 -m pip install torch
python3 -m pip install tensorboard
python3 -m pip install tqdm
pip install image
sudo apt install python3-opencv -y
pip3 install torchvision
pip3 install matplotlib
pip3 install scipy

At this point, this is the result of python3 train.py --dataset-path "/mnt/c/windows/deepstack/data": (screenshot: mFQc23f)

kpally4 commented 3 years ago

I changed 2 lines in "yolo.py" from

b[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
b[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum())  # cls

to this:

b.data[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
b.data[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum())  # cls

At this point it looks like it starts to work, BUT it crashes with: Segmentation fault (core dumped)

and stops. It also uses the CPU instead of the GPU. I don't know how to proceed.

(screenshot: cydj5tG)
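
Because the Python installation inside WSL is separate from the one on Windows, the torch.cuda.is_available() check from the first post may be worth repeating from bash. The sketch below does that; describe_device is a hypothetical helper, and it does not diagnose the segmentation fault. GPU access from WSL generally needs WSL 2 and a suitable NVIDIA driver, which this thread does not confirm.

```python
def describe_device():
    """Report which device PyTorch would use, without failing if torch is absent."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        # Name of the first visible CUDA device.
        return "cuda: " + torch.cuda.get_device_name(0)
    return "cpu"

print(describe_device())
```

If this prints cpu inside bash while the Windows-side check reported CUDA as available, the two environments are using different PyTorch builds or drivers.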