rgilchri closed this issue 1 year ago.
Hi @rgilchri! Sorry for the slow response.
I can see a couple of things to try. The first is for me to try to reproduce the issue on the develop branch. If you open your project in APT and select File > Import/Export > Advanced > Export Training Data, you can export a MAT-file containing your project contents. I expect this file will be small enough to zip and share here, and then I can take a look.
Another option is to try out the latest version of APT in the multianimal branch. This option is a heavier lift, because on Windows you will need to set up a new Docker backend (still in beta, see below). The upside is that this code contains many updates, including a new default network (GRONe), and your issue may already be fixed with these updates.
To try this out:
i) change your git branch to multianimal, e.g. via git checkout multianimal;
ii) set up a Windows Docker backend as described in (beta)-Windows-&-Docker-on-WSL2-Setup-Instructions.
Let us know what you think! Happy to do a quick debug run on develop; then if you plan to use APT for at least a little while, the second option may be worth trying out in any case.
@rgilchri, if the files are too large you can share them via Google Drive or other cloud storage service.
Thank you for the responses @allenleetc and @mkabra! It looks like I'm not able to attach the MAT-file (GitHub says file type not supported) but I've put that file as well as the files from my original comment into a Google Drive folder here: https://drive.google.com/drive/folders/1wz1zmPnOrBWSykFeIG0iTJhoYWzxCyrs?usp=sharing. I would really appreciate the quick debug option on the develop branch if possible before switching to multianimal, so please let me know if you need access to any other files in order to take a look!
Hi @rgilchri, using the files you sent, I was able to train a tracker on Linux, so I think the issue lies with the Conda environment or the file system on Windows.
Is C:\Users\arren\OneDrive\ a local folder or one synced to the cloud? If it is on the cloud, can you point the cache directory to a local folder and train again? To do this, first copy Manifest.sample.txt to Manifest.txt in the APT directory if Manifest.txt doesn't already exist. Then change the line "dltemproot,/path/to/dl/cachedir" to point to a local directory (e.g. "dltemproot,C:\Users\arren\APT_temp"), and create that directory if it doesn't exist. Once you have updated Manifest.txt, restart Matlab and train again.
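A minimal MATLAB sketch of that check, assuming the example folder name from this thread. It builds the dltemproot entry for a candidate local folder and warns if the path looks OneDrive-synced. The OneDrive test is a simple name-matching heuristic of our own, not something APT itself checks, and localCache/underOneDrive/manifestLine are just illustrative variable names.

```matlab
% Candidate local cache folder (example from this thread; use your own path).
localCache = 'C:\Users\arren\APT_temp';

% Heuristic (an assumption, not an APT check): warn if the path mentions
% OneDrive, which suggests a cloud-synced folder.
underOneDrive = ~isempty(regexpi(localCache, 'onedrive', 'once'));
if underOneDrive
    warning('This folder looks OneDrive-synced; choose a purely local folder.');
end

% Line to put in Manifest.txt in place of "dltemproot,/path/to/dl/cachedir"
manifestLine = ['dltemproot,' localCache];
disp(manifestLine)

% Create the folder if it does not exist yet (run this on the Windows machine):
% if ~exist(localCache, 'dir'), mkdir(localCache); end
```

With the original cache location from the log (C:\Users\arren\OneDrive\...), underOneDrive would be true and the warning would fire.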
Hey guys
I noticed something else that might be worth trying, @rgilchri. Your project name is 'Rachel tries again', which contains spaces; as @mkabra suggested, this can cause filesystem issues on Windows. You can change the name to something without spaces:
```matlab
lObj = StartAPT;
% Load your project in the GUI, then rename it (no spaces):
lObj.projname = 'RachelTriesAgain';
% Save your project and try Training!
```
I experimented on Windows and was able to reproduce the problem and then fix it this way. Hope this gets you going; let us know!
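If you'd rather derive a space-free name than type one, here is a minimal MATLAB sketch that turns the existing name into CamelCase. It is plain string handling under the assumption that a CamelCase name is what you want; it is not an APT function, and lObj/projname are used exactly as in the snippet above (oldName/newName are illustrative names).

```matlab
% Example project name from this thread; replace with your own.
oldName = 'Rachel tries again';

% Split on whitespace and capitalize each word to build a CamelCase name
words   = strsplit(strtrim(oldName));
words   = cellfun(@(w) [upper(w(1)) w(2:end)], words, 'UniformOutput', false);
newName = strjoin(words, '');

assert(~any(isspace(newName)));   % the new name contains no whitespace
disp(newName)                     % prints RachelTriesAgain

% With the project loaded in the GUI (after lObj = StartAPT;):
% lObj.projname = newName;
% ...then save the project and train again.
```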
Thank you so much; this immediately fixed the issue! I am so appreciative of your support!
Best, Rachel
(Sent by email in reply to Allen Lee's comment of Thu, Oct 13, 2022 at 4:53 PM.)
Sweet! Let us know how it goes! Allen
I am using a computer with the following configuration to try to use APT to track the movement of dogs kenneled at an animal shelter:
- NVIDIA GeForce RTX 2070
- CUDA 11.6.2
- Python 3.9.13
- TensorFlow 1.15
- Matlab R2022a

When I open the "Performance" tab in Task Manager and navigate to "GPU", it says
I’m using the “develop” branch of APT with a Local GPU backend, and I tested the backend configuration with no issues (it activated APT and found free GPUs). After labeling frames in my test video, adjusting the tracking parameters to require 4.6 GB of GPU memory, and selecting MDN as the tracking algorithm, I clicked “Train.” The Training Monitor appeared, but no data points were plotted, and after a few minutes a popup appeared saying “Training stopped after NaN/60000 iterations. Save trained model to file?” I clicked “Save” and went to the menu under the blank Training Monitor plot to see if I could get more information. Here’s what pressing “Go” on each of those options produced:
- List all conda jobs: “no jobs running, no jobs queued”
- Show training job's status: “ID 5, started 31-Aug-2022 19:26:03: finished”
- Show error messages: “no error messages”
- Show log files: Job 1 :### C:\Users\arren\OneDrive\Documents.apt\tpc3156bc9_6d15_472e_9c69_c605383f4bea\Rachel tries again\20220831T192550view0_20220831T192559_new.log file does not exist
I then clicked “Stop Training” and went to Matlab to see what was produced there, and have entered the Matlab log below:
Although the Matlab log mentions an “err” file and a “log” file, these files do not exist in the generated folder “tpc3156bc9_6d15_472e_9c69_c605383f4bea” (I tried attaching the contents of this folder but it exceeds the 25MB upload limit). What jumps out at me in the Matlab log are the two lines that say “Warning: Failed to update model iteration for model with net type mdn/deeplabcut,” but I’m not sure what that means.
I would appreciate any guidance on this as I’m quite excited to see how APT could be used in an animal shelter setting! Thank you for reading this far, looking forward to troubleshooting together.