puigalex / deteccion-objetos-video

Object detection on video using PyTorch
56 stars 49 forks

Problem running train.py #35

Open AlbertoPerezGant opened 2 years ago

AlbertoPerezGant commented 2 years ago

I get an error like the one shown below:

---- [Epoch 0/100, Batch 3/6] ----
+------------+--------------+--------------+--------------+
| Metrics    | YOLO Layer 0 | YOLO Layer 1 | YOLO Layer 2 |
+------------+--------------+--------------+--------------+
| grid_size  | 13           | 26           | 52           |
| loss       | 57.082081    | 62.914391    | 71.391045    |
| x          | 0.032295     | 0.078391     | 0.114752     |
| y          | 0.064399     | 0.087460     | 0.083224     |
| w          | 2.110786     | 0.664359     | 1.350570     |
| h          | 0.609881     | 0.739422     | 1.139474     |
| conf       | 53.770958    | 60.624214    | 68.018593    |
| cls        | 0.493764     | 0.720541     | 0.684431     |
| cls_acc    | 100.00%      | 33.33%       | 66.67%       |
| recall50   | 0.000000     | 0.000000     | 0.166667     |
| recall75   | 0.000000     | 0.000000     | 0.166667     |
| precision  | 0.000000     | 0.000000     | 0.000160     |
| conf_obj   | 0.460988     | 0.464489     | 0.505544     |
| conf_noobj | 0.398628     | 0.445496     | 0.487872     |
+------------+--------------+--------------+--------------+
Total loss 191.38751220703125
---- ETA 0:00:04.818292
Traceback (most recent call last):
  File "train.py", line 99, in <module>
    for batch_i, (_, imgs, targets) in enumerate(dataloader):
  File "/home/alpegan/anaconda3/envs/yolotrain/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/alpegan/anaconda3/envs/yolotrain/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 838, in _next_data
    return self._process_data(data)
  File "/home/alpegan/anaconda3/envs/yolotrain/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/home/alpegan/anaconda3/envs/yolotrain/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 4.
Original Traceback (most recent call last):
  File "/home/alpegan/anaconda3/envs/yolotrain/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/alpegan/anaconda3/envs/yolotrain/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/alpegan/anaconda3/envs/yolotrain/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/alpegan/deteccion-objetos-video/utils/datasets.py", line 130, in __getitem__
    img, targets = horisontal_flip(img, targets)
  File "/home/alpegan/deteccion-objetos-video/utils/augmentations.py", line 8, in horisontal_flip
    targets[:, 2] = 1 - targets[:, 2]
TypeError: 'NoneType' object is not subscriptable

This appears after reaching that batch, so training does start correctly; in fact it generates up to 3 checkpoints before failing.

Vladinomio commented 2 years ago

RuntimeError: DataLoader worker (pid 7907) is killed by signal: Killed.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 99, in <module>
    for batch_i, (_, imgs, targets) in enumerate(dataloader):
  File "/home/vlad/anaconda3/envs/yolotrain/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/home/vlad/anaconda3/envs/yolotrain/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1182, in _next_data
    idx, data = self._get_data()
  File "/home/vlad/anaconda3/envs/yolotrain/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1148, in _get_data
    success, data = self._try_get_data()
  File "/home/vlad/anaconda3/envs/yolotrain/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 999, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 7907) exited unexpectedly

I get this error when running "python train.py --model_def config/yolov3-custom.cfg --data_config config/custom.data --pretrained_weights weights/darknet53.conv.74 --batch_size 1"

AlbertoPerezGant commented 2 years ago

It looks like an error at the same point in train.py, but honestly I have no idea how to fix it.

JaGuzmanT commented 2 years ago

Warning! HDF5 library version mismatched error
The HDF5 header files used to compile this application do not match the version used by the HDF5 library to which this application is linked. Data corruption or segmentation faults may occur if the application continues. This can happen when an application was compiled by one version of HDF5 but linked with a different version of static or shared HDF5 library. You should recompile the application or check your shared library related settings such as 'LD_LIBRARY_PATH'.
You can, at your own risk, disable this warning by setting the environment variable 'HDF5_DISABLE_VERSION_CHECK' to a value of '1'. Setting it to 2 or higher will suppress the warning messages totally.
Headers are 1.10.4, library is 1.10.5

SUMMARY OF THE HDF5 CONFIGURATION

General Information:

               HDF5 Version: 1.10.5
              Configured on: 2019-03-04
              Configured by: Visual Studio 14 2015 Win64
                Host system: Windows-10.0.17763
          Uname information: Windows
                   Byte sex: little-endian
         Installation point: C:/Program Files/HDF5

Compiling Options:

                 Build Mode:
          Debugging Symbols:
                    Asserts:
                  Profiling:
         Optimization Level:

Linking Options:

                  Libraries:

Statically Linked Executables: OFF
LDFLAGS: /machine:x64
H5_LDFLAGS:
AM_LDFLAGS:
Extra libraries:
Archiver:
Ranlib:

Languages:

                          C: yes
                 C Compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe 19.0.24218.1
                   CPPFLAGS:
                H5_CPPFLAGS:
                AM_CPPFLAGS:
                     CFLAGS:  /DWIN32 /D_WINDOWS /W3
                  H5_CFLAGS:
                  AM_CFLAGS:
           Shared C Library: YES
           Static C Library: YES

                    Fortran: OFF
           Fortran Compiler:
              Fortran Flags:
           H5 Fortran Flags:
           AM Fortran Flags:
     Shared Fortran Library: YES
     Static Fortran Library: YES

                        C++: ON
               C++ Compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe 19.0.24218.1
                  C++ Flags: /DWIN32 /D_WINDOWS /W3 /GR /EHsc
               H5 C++ Flags:
               AM C++ Flags:
         Shared C++ Library: YES
         Static C++ Library: YES

                        JAVA: OFF
               JAVA Compiler:

Features:

               Parallel HDF5: OFF

Parallel Filtered Dataset Writes:
Large Parallel I/O:
High-level library: ON
Threadsafety: OFF
Default API mapping: v110
With deprecated public symbols: ON
I/O filters (external): DEFLATE DECODE ENCODE
MPE:
Direct VFD:
dmalloc:
Packages w/ extra debug output:
API Tracing: OFF
Using memory checker: OFF
Memory allocation sanity checks: OFF
Function Stack Tracing: OFF
Strict File Format Checks: OFF
Optimization Instrumentation:

Bye...

JaGuzmanT commented 2 years ago

I got that when running train.py.

gonzalitojh commented 2 years ago

I was getting the same error, and it was caused by using images in JPEG format instead of JPG. I also made sure that there was no image without its corresponding file in labels.
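
In case it helps anyone else, the two checks above can be automated with something like the script below. This is only a rough sketch, not code from the repository. The function name `check_dataset` is hypothetical, and it assumes that each image has a label file with the same base name and a `.txt` extension, and that `.jpg` and `.png` are the extensions the loader handles. Adjust both assumptions if your setup differs.

```python
from pathlib import Path

# Assumption: extensions the label lookup is expected to handle.
EXPECTED_EXTS = {".jpg", ".png"}
# Assumption: ".jpeg" files are images too, but may not be matched to a label.
IMAGE_EXTS = EXPECTED_EXTS | {".jpeg"}


def check_dataset(images_dir, labels_dir):
    """Return (images with an unexpected extension, images without a label file)."""
    images_dir, labels_dir = Path(images_dir), Path(labels_dir)
    bad_ext, no_label = [], []
    for img in sorted(images_dir.iterdir()):
        ext = img.suffix.lower()
        if ext not in IMAGE_EXTS:
            continue  # ignore anything that is not an image
        if ext not in EXPECTED_EXTS:
            bad_ext.append(img.name)
        # Assumed convention: labels/<image base name>.txt
        if not (labels_dir / (img.stem + ".txt")).is_file():
            no_label.append(img.name)
    return bad_ext, no_label
```

You would call it as `check_dataset("path/to/images", "path/to/labels")` with your own folders, and rename or re-label whatever it reports before launching train.py again. Both lists should come back empty for a clean dataset.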