pmussp opened this issue 6 days ago
Hi Peter,
Thanks for attaching the screenshot and the log file. I assume it has been stuck on preprocessing for more than 5-10 minutes. Can you reopen the project, start the training, let it run for an hour, then use "Bundle working directory" from the File menu and send me the tarball (or the zip file)? Looking at the log files, the training is running, but the monitor is probably not getting updated.
Mayank
From: pmussp @.> Sent: Wednesday, June 26, 2024 2:09 AM To: kristinbranson/APT @.> Cc: Subscribed @.***> Subject: [kristinbranson/APT] Training stuck at preprocessing (Issue #414)
Hi,
I'm trying to train APT using the DLC network, but the training window gets stuck on "Preprocessing" without throwing any errors.
Screenshot.png (view on web: https://github.com/kristinbranson/APT/assets/28634244/2b56e71b-2f04-4b14-9026-3953c0c2e9af)
A Docker job is created, but the GPU does not seem to be in use (running nvidia-smi shows that MATLAB is using only 200 MB). However, the Docker backend passes APT's GPU access test.
These are the specs I'm using:
- Ubuntu 20.04.6 LTS
- MATLAB 2023b
- GPU: NVIDIA Corporation TU104 [GeForce RTX 2070 SUPER]
- Driver Version: 550.54.15
- CUDA 11.8
- APT main branch
- Docker backend (latest)
I've attached the log file and the .lbl file. Any help debugging this issue would be greatly appreciated!
Best, Peter
20240625T155325view0_20240625T155325_tdptrx_new.log (https://github.com/user-attachments/files/15978376/20240625T155325view0_20240625T155325_tdptrx_new.log)
lbl_file.zip (https://github.com/user-attachments/files/15978425/lbl_file.zip)
OK, so it is not training at all. Can you send the movie used in the project (/mnt/data/peter_data/2024_05_24/experiment_01/experiment_01_20240524_084400_behavior_camera_video_undist.avi) and the git commit of APT you are using, so that I can recreate the issue?
Mayank
From: pmussp @.> Sent: Thursday, June 27, 2024 7:49 PM To: kristinbranson/APT @.> Cc: Kabra, Mayank @.>; Comment @.> Subject: Re: [kristinbranson/APT] Training stuck at preprocessing (Issue #414)
Hi Mayank,
Here is a zip file of the working directory after running for ~1 hour.