theAIGuysCode / yolov4-deepsort

Object tracking implemented with YOLOv4, DeepSort, and TensorFlow.
GNU General Public License v3.0

Run on the Jetson Nano #5

Open · eggbread opened 4 years ago

eggbread commented 4 years ago

Hello! First of all, this is such nice code for detection and tracking. I can run it on my PC, but I have a question: I want to run this code on the Jetson Nano. I could run save_model.py, but object_tracker.py exits with Aborted (core dumped). Is this not for the ARM architecture? Please give me some advice. Thank you!

robisen1 commented 4 years ago

I have run it on a Jetson TX2 and it works fine, although very slowly. The Nano has a lot less horsepower, so I bet you can run it, but it will be very, very slow. You should consider using TFLite or tiny YOLO. Also, I think I might be able to help you. Can you provide the output of your error message and/or dmesg? Maybe using a pastebin would be good so that there is not a huge amount of text.

derm1ch1 commented 3 years ago

> I have run it on a Jetson TX2 and it works fine, although very slowly. The Nano has a lot less horsepower, so I bet you can run it, but it will be very, very slow. You should consider using TFLite or tiny YOLO. Also, I think I might be able to help you. Can you provide the output of your error message and/or dmesg? Maybe using a pastebin would be good so that there is not a huge amount of text.

Hey robisen1, I'm trying to run this on a Jetson Nano as well but have run into a couple of issues. Would you be so kind as to help me out in case you know something or have an idea? :-)

As a first try, I just want to run object_tracker.py with my webcam attached. When I run the command in my terminal it takes a very, very long time (which is fine for me; it is what it is), but it finally crashes with the following error: "too many resources requested for launch". Sometimes it crashes with "OOM when allocating tensor with shape ..." instead.

In summary: I'm always running out of memory, one way or the other. I set up a 10 GB swapfile on the Nano, which improved things but didn't solve the problem. In fact, the Nano isn't even using the swapfile much: it just allocates 100% of the physical memory and then the program crashes, while I'm left with 9.8 GB free on the swapfile.

I really need to get this running.

Please tell me you have an idea! It would make my week!! :-) Greetings from Germany!

robisen1 commented 3 years ago

OOM means the GPU is being overwhelmed. The easiest way to deal with this is usually to reduce the batch size. Another way is to reduce the size of the dataset: if you have 3 MB files, use a tool to reduce the quality of the images while preserving their dimensions, or use a tool to change the scale (width and height) that also changes the annotations to match. That being said, are you trying to train on the Nano? If so, you should not; it does not have the resources to really train anything. Even a Xavier is not enough to train good models. It's best to use a desktop or server. My desktop has an RTX 2080 Ti 11 GB card; I train my weights using that and then move them to my Nano or Jetson.
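To illustrate the rescaling advice above, here is a small sketch of scaling pixel-coordinate bounding boxes alongside an image resize. It is a hypothetical helper for illustration, not code from this repo; note that YOLO's own .txt label format stores normalized coordinates, which would not need this step.

```python
def scale_boxes(boxes, src_size, dst_size):
    """Scale (x, y, w, h) pixel boxes from src_size=(W, H) to dst_size=(W', H').

    Hypothetical helper for illustration only: YOLO .txt labels are already
    normalized to [0, 1] and survive a resize unchanged.
    """
    sw = dst_size[0] / src_size[0]
    sh = dst_size[1] / src_size[1]
    return [(x * sw, y * sh, w * sw, h * sh) for (x, y, w, h) in boxes]

# Example: halving a 1280x720 image halves every box the same way.
print(scale_boxes([(100, 50, 200, 100)], (1280, 720), (640, 360)))
# -> [(50.0, 25.0, 100.0, 50.0)]
```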


derm1ch1 commented 3 years ago

@robisen1 Thanks so much for the super quick answer!! No, I'm not trying to train on the Nano. I have an Nvidia 1080 Ti in my desktop PC, where I successfully trained a custom dataset and ran it very smoothly without problems.

If I'm correct, batch size etc. would only have an effect on training, right? Not on the file size of the trained model or something?

So my setup is as follows:

  1. Trained a model and ran object_tracker.py on the PC
  2. Set up the Nano to run object_tracker.py with the already trained model
  3. -- never got to step 3, OOM stopped me from having a good life --

It's only about running the DeepSort script on the Nano, but it doesn't launch because of the memory issue.

Best regards :-)

robisen1 commented 3 years ago

Are you using tiny YOLO? I assume you're also using a web camera, or are you using a MIPI camera? I need to know more to troubleshoot, but one thing that could be happening is that the image sizes are too big. Could you drop your resolution to 640 or 320?

Have you thought of running the script using python -m pdb? That will give more clues, I think. Oh, and it's probably important to know what version of JetPack you are using. It's very possible your model and TensorFlow take up too much memory. Nvidia of course recommends moving to TensorRT, which would help, but I don't think it's what you're looking for. If that is indeed the issue, there are some other ways to deal with it.
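The pdb suggestion above can be exercised like this; the demo script and the tracker flags at the end are illustrative placeholders, not the repo's exact invocation. With `-c continue`, pdb runs until an error (or normal exit) and then drops to a prompt where variables can be inspected.

```shell
# Demo of the "python -m pdb" pattern: create a tiny script, then run it
# under the debugger. "-c continue" runs until an error or normal exit.
printf 'print("hello from pdb demo")\n' > /tmp/pdb_demo.py
printf 'quit\n' | python3 -m pdb -c continue /tmp/pdb_demo.py

# For the tracker it would look something like (flags illustrative):
#   python3 -m pdb -c continue object_tracker.py --weights ./checkpoints/yolov4-416
```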


derm1ch1 commented 3 years ago

@robisen1 You are so right, I should have listed this earlier:

Hardware

- I'm using a USB camera, but the final goal is streaming a web camera (accessed via URL)
- I would be able to use a MIPI camera, though, if that would help

Software

- I used the --size 320 flag to drop the resolution -> it didn't work unfortunately, but is this the correct way, since my model is still trained on 416?
- version of JetPack is 4.4
- I'm using a yolov4-416 model, provided by @theAIGuysCode in this repo
  - I will try to convert and use a tiny-yolo model as a next step
  - but performance-wise I don't need speed (fps, ...): I want to grab a single frame every 2 min with high accuracy
  - therefore I decided to use a "full" yolo model instead of the tiny version -> am I thinking about this the correct way?

Memory

- The script already uses "tf.config.experimental.set_memory_growth", which was something I considered helpful in the past (tensorflow.org)
- I noticed that about 1.3 GiB of the main memory is already in use when I restart the Nano, so I thought about running the script on a lighter operating system (which only consumes about 0.4 GiB) -> could that have an impact, in your opinion? (based on this article)
  - I additionally monitored the script on my PC, where it runs fine: it never used more than 3.x GiB of memory -> so I was thinking the memory issue is just about some megabytes, since the Nano has 3.9 GiB in total plus swap

Important note

Is there any other information I can provide? And do you have other suggestions for working with this memory issue? Thanks in advance and have a great day!

robisen1 commented 3 years ago

Answers inline

Hardware

> - I'm using a USB camera, but the final goal is streaming a web camera (accessed via URL)
> - I would be able to use a MIPI camera, though, if that would help

I asked about the cameras in case it was an OpenCV or GStreamer issue. I have found MIPI hard to deal with.

Software

> - I used the --size 320 flag to drop the resolution -> it didn't work unfortunately, but is this the correct way, since my model is still trained on 416?

With YOLO, all image data is resized to the size in your config, so 416x416. Where the image does not fill the target size, it is padded with zeros. You can use literally any size your system can handle. I have seen people run the normal model when they use Darknet, which uses far fewer resources than TensorFlow. If you want to use the full model, then make sure to use the setting mentioned here: https://forums.developer.nvidia.com/t/oom-yolo-v4-predict-not-train/141906. It may solve your issue.
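To illustrate the resize-and-pad ("letterbox") behavior described above, here is a minimal, dependency-light sketch. It is my own illustration, not the repo's actual preprocessing code; the function name `letterbox` and the nearest-neighbor resize are assumptions made to keep it self-contained.

```python
import numpy as np

def letterbox(image, target=416):
    """Resize so the longer side fits `target`, pad the rest with zeros.

    Illustrative sketch of YOLO-style preprocessing; uses a nearest-neighbor
    resize via numpy index arrays to avoid extra dependencies.
    """
    h, w = image.shape[:2]
    scale = target / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbor resize: map each output pixel back to a source pixel.
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = image[rows][:, cols]
    # Zero-filled square canvas, image centered.
    canvas = np.zeros((target, target) + image.shape[2:], dtype=image.dtype)
    top, left = (target - nh) // 2, (target - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas

img = np.ones((720, 1280, 3), dtype=np.uint8) * 255
out = letterbox(img, 416)
print(out.shape)  # (416, 416, 3)
```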

BTW, what does free -m (or free -t -m, or mem, or meminfo, whatever you like) show?

> - version of JetPack is 4.4
> - I'm using a yolov4-416 model, provided by @theAIGuysCode in this repo
>   - I will try to convert and use a tiny-yolo model as a next step
>   - but performance-wise I don't need speed (fps, ...): I want to grab a single frame every 2 min with high accuracy
>   - therefore I decided to use a "full" yolo model instead of the tiny version -> am I thinking about this the correct way?

As for performance, it is still better to use tiny, since it is made for resource-constrained environments. That being said, we can shoot for YOLOv4, but you should test with tiny first. If tiny works, it is most likely a memory issue.

The site has the simple two-step for tiny YOLO. I think you're referencing this?

```bash
# save yolov4-tiny model
python save_model.py --weights ./data/yolov4-tiny.weights --output ./checkpoints/yolov4-tiny-416 --model yolov4 --tiny

# Run yolov4-tiny object tracker
python object_tracker.py --weights ./checkpoints/yolov4-tiny-416 --model yolov4 --video ./data/video/test.mp4 --output ./outputs/tiny.avi --tiny
```

Memory

> - The script already uses "tf.config.experimental.set_memory_growth", which was something I considered helpful in the past (tensorflow.org)
> - I noticed that about 1.3 GiB of the main memory is already in use when I restart the Nano, so I thought about running the script on a lighter operating system (which only consumes about 0.4 GiB) -> could that have an impact, in your opinion? (based on this article)
>   - I additionally monitored the script on my PC, where it runs fine: it never used more than 3.x GiB of memory -> so I was thinking the memory issue is just about some megabytes, since the Nano has 3.9 GiB in total plus swap

On your PC, are you talking about GPU memory? Or CPU? Or CPU and GPU?

> Is there any other information I can provide? And do you have other suggestions for working with this memory issue?
>
> Thanks in advance and have a great day!



robisen1 commented 3 years ago

Ohh jeez, my message got mangled. Hopefully it still makes sense. Are you still having issues?

derm1ch1 commented 3 years ago

@robisen1 Apologies for the late answer!! I had some serious trouble in other areas of my project. Thanks for being so patient with my issue!! I could absolutely read all your answers!

> With YOLO, all image data is resized to the size in your config, so 416x416. Where the image does not fill the target size, it is padded with zeros. You can use literally any size your system can handle. I have seen people run the normal model when they use Darknet, which uses far fewer resources than TensorFlow. If you want to use the full model, then make sure to use the setting mentioned here: https://forums.developer.nvidia.com/t/oom-yolo-v4-predict-not-train/141906. It may solve your issue.

Tried the config (hadn't set it before) and it seems to work a bit better: at least it prevents the script from crashing! It gets to a point where it prints a couple of warnings on the screen (warning: GStreamer: pipeline have not been created), but then it stops, with memory usage at 3.6 GiB (92.8%) and one CPU core at a constant 100%. It seems to be doing something, but nothing happens (I let the script run for a whole night). I think the script hung at some point, because Ctrl+C didn't break it; I needed to close the terminal to stop the process.

> As for performance, it is still better to use tiny, since it is made for resource-constrained environments. That being said, we can shoot for YOLOv4, but you should test with tiny first. If tiny works, it is most likely a memory issue.

You're right; I tried it with tiny YOLO, but it had no effect: the script is still not working. To rule out issues in my own code, I used object_tracker.py with the pretrained tiny-yolo-416 model from this repo. But no win on this front, unfortunately.

> On your PC, are you talking about GPU memory? Or CPU? Or CPU and GPU?

I need to rephrase, since I've learned some things about GPU architecture and memory: the system monitor shows a max memory usage of 3.6 GiB, while nvidia-smi shows a usage of 5.9 GiB (the GPU has about 6.0 GiB of total memory: an Nvidia GTX 850 Ti). But I've learned that TensorFlow traditionally reserves all available memory, so this shouldn't be the actually used amount of memory ... I hope/guess/pray? :D

> BTW, what does free -m (or free -t -m, or mem, or meminfo, whatever you like) show?

Tried this, but it had no effect? Maybe I'm doing it wrong; I googled it, but "python -m object_tracker.py ..." doesn't print any additional information. I guess I'm using the wrong syntax?

My new solution attempt: as of now I have no idea how to solve this issue, so I looked for alternative hardware and ordered a Jetson Xavier NX.

Do you have any recommendations with regard to other software or hardware solutions? I saw in your original post that you work with a Jetson TX2; is there a specific advantage to using the TX2 instead of the Xavier? I can't really estimate what hardware would be best for my use case. The Xavier NX comes with 8 GiB of memory, so I assumed it is somewhat better than the Jetson Nano. What do you think/recommend? I need a fast path to a working solution, so I'm really thankful for your advice and ideas!

Have a great day!

robisen1 commented 3 years ago

> Tried the config (hadn't set it before) and it seems to work a bit better: at least it prevents the script from crashing! It gets to a point where it prints a couple of warnings on the screen (warning: GStreamer: pipeline have not been created), but then it stops, with memory usage at 3.6 GiB (92.8%) and one CPU core at a constant 100%. It seems to be doing something, but nothing happens (I let the script run for a whole night). I think the script hung at some point, because Ctrl+C didn't break it; I needed to close the terminal to stop the process.

Can you try running it with python -m pdb? When it starts up, use "cont" so it will run until there is some sort of error. If it stops, like you mention, you should then be able to inspect variables and the like. Check out https://realpython.com/python-debugging-pdb/. This may not be helpful, but it is useful for ruling out various issues. Logs will be useful at this point, and dmesg. Could you share them?

Also please try allow_growth, which essentially assigns very little memory to TensorFlow and then scales up as needed. Example (TF 1.x API):

```python
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)
```

Another thing you might try is tf.config.experimental.set_memory_growth. Info at https://www.tensorflow.org/api_docs/python/tf/config/experimental/set_memory_growth:

```python
tf.config.experimental.set_memory_growth(
    device,
    enable
)
```

> You're right; I tried it with tiny YOLO, but it had no effect: the script is still not working. To rule out issues in my own code, I used object_tracker.py with the pretrained tiny-yolo-416 model from this repo. But no win on this front, unfortunately.

This is very strange. Try the above options to constrain memory and see what happens, but I think maybe you have more issues than just memory. The code you are using is just a straight clone of the repo, right? Also, getting logs is very important. There is a small chance GStreamer is causing issues, but it's a very small chance.

> I need to rephrase, since I've learned some things about GPU architecture and memory: the system monitor shows a max memory usage of 3.6 GiB, while nvidia-smi shows a usage of 5.9 GiB (the GPU has about 6.0 GiB of total memory: an Nvidia GTX 850 Ti)

Ok that’s memory on your computer.

> But I learned that TensorFlow traditionally reserves all available memory, so this shouldn't be the actually used amount of memory ... I hope/guess/pray? 😃

See above comments on controlling this behavior. You should try one of those. I recommend the first one; first 😊

> BTW, what does free -m (or free -t -m, or mem, or meminfo, whatever you like) show?

> Tried this, but it had no effect? Maybe I'm doing it wrong; I googled it, but "python -m object_tracker.py ..." doesn't print any additional information. I guess I'm using the wrong syntax?

To make things easy, just use top or htop. If you have not used htop, check it out; it is really cool.

> My new solution attempt: as of now I have no idea how to solve this issue, so I looked for alternative hardware and ordered a Jetson Xavier NX (https://www.nvidia.com/de-de/autonomous-machines/embedded-systems/jetson-xavier-nx/). Do you have any recommendations with regard to other software or hardware solutions?

The Jetson TX2 should be able to run all of this. I have run it on the TX2, as well as another tracker that is a real memory pig and uses up a lot of file space. That being said, the Xavier is 10x faster than the TX2!! If you use tiny, it should get to 20+ frames per second depending on what you do; on the TX2 I get about 9 FPS. The 10x speed increase is not linear, so I cannot really say how much faster, but it should be a nice speed increase. That being said, I am not sure your issue is hardware. Maybe try the above steps and see?

Also, if you're really stuck, I could probably do an install and write down every step. In fact, I am sort of going to do that soon anyway. Please try the suggestions above first, though.

> I saw in your original post that you work with a Jetson TX2; is there a specific advantage to using the TX2 instead of the Xavier? I can't really estimate what hardware would be best for my use case. The Xavier NX comes with 8 GiB of memory, so I assumed it is somewhat better than the Jetson Nano. What do you think/recommend? I need a fast path to a working solution, so I'm really thankful for your advice and ideas!
>
> Have a great day!


derm1ch1 commented 3 years ago

Dear @robisen1,

Answer 1

> Can you try running it with python -m pdb? When it starts up, use "cont" so it will run until there is some sort of error. If it stops, like you mention, you should then be able to inspect variables and the like. Check out https://realpython.com/python-debugging-pdb/. This may not be helpful, but it is useful for ruling out various issues. Logs will be useful at this point, and dmesg. Could you share them?

Alright, got it! The main error seems to be:

```
2020-09-20 23:22:46.363942: F tensorflow/core/kernels/resize_bilinear_op_gpu.cu.cc:493] Non-OK-status: GpuLaunchKernel(kernel, config.block_count, config.thread_per_block, 0, d.stream(), config.virtual_thread_count, images.data(), height_scale, width_scale, batch, in_height, in_width, channels, out_height, out_width, output.data()) status: Internal: too many resources requested for launch
Fatal Python error: Aborted
```

Nevertheless, I posted the whole error message here: https://pastebin.pl/view/3ba2052d

Edit: I just tried htop and love it! I will comment if it gives me further information/findings :)

Answer 2

> Also please try allow_growth, which essentially assigns very little memory to TensorFlow and then scales up as needed. Example:

Thanks for the provided code example! Unfortunately I had already tried it without success, also in combination with config.gpu_options.per_process_gpu_memory_fraction = 0.333. Both lead to better behavior in the sense that the script fails at a later point, but they don't solve it as far as I'm concerned. :/

But just to be clear: is there a difference between config.gpu_options.allow_growth = True and tf.config.experimental.set_memory_growth()? I thought the first is for TF 1.x and the second is for TF 2.x; is it possible I did something wrong here? I do experience a clear difference with and without this config, though.

Answer 3

> This is very strange. Try the above options to constrain memory and see what happens, but I think maybe you have more issues than just memory. The code you are using is just a straight clone of the repo, right? Also, getting logs is very important. There is a small chance GStreamer is causing issues, but it's a very small chance.

Yes, it's a complete clone, but it may be worth cloning the repo again and trying a fresh attempt? Maybe something broke while I was trying to fix the memory issue. I don't think so, but for problem delimitation I will do so and give feedback.

Answer 4

> The Jetson TX2 should be able to run all of this. I have run it on the TX2, as well as another tracker that is a real memory pig and uses up a lot of file space. That being said, the Xavier is 10x faster than the TX2!! If you use tiny, it should get to 20+ frames per second depending on what you do; on the TX2 I get about 9 FPS. The 10x speed increase is not linear, so I cannot really say how much faster, but it should be a nice speed increase. That being said, I am not sure your issue is hardware. Maybe try the above steps and see?

I can't stress enough how optimistic this makes me! :D With this in mind, I would nevertheless love to get this running on a Jetson Nano, as long as you keep providing me with these great ideas and thoughts! I'm very sure that there is a memory issue (just considering the error messages I got over the last week), but maybe that's not the dealbreaker or the decisive point? Looking forward to the Xavier NX now, but I would still try everything for the Jetson Nano!

Answer 5

> Also, if you're really stuck, I could probably do an install and write down every step. In fact, I am sort of going to do that soon anyway. Please try the suggestions above first, though.

You're such a blessing! I would really love to see this running on a Jetson Nano and am very very thankful for all your comments!

robisen1 commented 3 years ago

I read the whole message. Eager execution is probably eating up all the memory, which sucks. A lot of people have issues with it and memory leakage; for others it works very well. See https://stackoverflow.com/questions/51267133/memory-continually-increasing-when-iterating-in-tensorflow-eager-execution. I cannot remember whether you were able to watch your memory change over time; that's important, if possible. One thing you might try is the profiler: https://www.tensorflow.org/tensorboard/tensorboard_profiling_keras. Check this article: https://medium.com/the-artificial-impostor/tensorflow-profiler-with-custom-training-loop-d5d4d97d2c89.

Anyway, I think you need to do memory profiling, maybe just running detection_demo.py while profiling.
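As a lightweight complement to the TensorFlow profiler links above, Python's standard-library tracemalloc can track host-side (CPU) allocations. It will not see GPU memory, but it can show whether the Python process itself is growing over time. This is my own suggested sketch, not something from the thread; the "frames" buffers stand in for real tracking work.

```python
import tracemalloc

tracemalloc.start()

# Stand-in for one iteration of a tracking loop: allocate ~5 MB of buffers.
frames = [bytearray(1_000_000) for _ in range(5)]

current, peak = tracemalloc.get_traced_memory()
print(f"current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")
tracemalloc.stop()
```

Calling get_traced_memory() at the top of each loop iteration and watching whether "current" keeps climbing is a cheap way to spot a host-side leak.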

My understanding is that this is for TF 2.2+; it at least works on TF 2.3: https://www.tensorflow.org/api_docs/python/tf/config/experimental/set_memory_growth

tf.config.experimental.set_memory_growth()

The Xavier should be OK for a while even if there are memory issues; that could help you profile. I'm going to try to run this code on my Nano this weekend if I have time.


robisen1 commented 3 years ago

BTW, I did try to do an install on my Nano. It is not working right; I need to reflash.

MuhammadAsadJaved commented 3 years ago

Hi guys @robisen1 @derm1ch1, have you managed to run this on a Xavier NX or Jetson TX2? If yes, what about the speed?

Is there any update on accelerating it with TensorRT?

derm1ch1 commented 3 years ago

@robisen1 I just saw that my last answer wasn't posted. So sorry, I didn't notice it!!

I tried in vain to make it work on the Nano. In the meantime I got the Xavier NX. The FPS is about 4, which is far below my expectations.

But since my project runs at "1 frame per minute", it doesn't matter in my specific case. In fact, the program runs continuously on the Xavier NX. All free memory is allocated right at the beginning (despite memory growth and a memory limit), but the program has been running without errors for days now.

@MuhammadAsadJaved I couldn't get it to run on the Nano, but it runs reliably on the Xavier NX at about 4 fps! Please consider that this might have to do with the implemented DeepSort algorithm, since Keras-YOLO even ran on my Jetson Nano at more than 9 fps. :)

GeekAlexis commented 3 years ago

Hi guys, if you are looking for a highly optimized Deep SORT and YOLOv4 with TensorRT acceleration, here is my implementation: https://github.com/GeekAlexis/FastMOT. Let me know if you can get it to run and please star the repo! The FPS on my Xavier NX is 20+ on average.

MuhammadAsadJaved commented 3 years ago

@robisen1 You are right. You are lucky that your project does not require fast speed. Speed is the biggest problem for me now; I tried several methods, but it is still very slow.

@GeekAlexis Thanks, great. Let me have a try.

ronger-git commented 3 years ago

Have you tried running in a CUDA 10.0 environment? My GPU only allows me to install CUDA 10.0, but it always reports errors. The most recent error occurs when running save_model.py: TypeError: Expected list for 'ksize' argument to 'max_pool' Op, not 13. As a beginner, I'm looking forward to your reply!

MuhammadAsadJaved commented 3 years ago

@ronger-git Hi there, it's not a CUDA problem; it's a TensorFlow problem. Can you show your TensorFlow version? You can check by running in the terminal:

```bash
python -c 'import tensorflow as tf; print(tf.__version__)'   # for Python 2
python3 -c 'import tensorflow as tf; print(tf.__version__)'  # for Python 3
```
ronger-git commented 3 years ago

@MuhammadAsadJaved My TensorFlow is 1.13.1. But when I use TensorFlow 2.0.0, there is a new error! This may still be a TensorFlow version problem. Which TensorFlow version have you tried?

ronger-git commented 3 years ago

@MuhammadAsadJaved Hi! I have resolved the version issue, thanks for your reply.

Elina-ye commented 3 years ago

https://forums.developer.nvidia.com/t/tiny-yolov4-tensorrt-too-many-resources-requested-for-launch-on-4gb-nano/163962/9

> Adding the following lines to the top of the detect script (before TensorFlow imports) resolves the issue:
>
> import os
> os.environ['CUDA_VISIBLE_DEVICES'] = '1'

It works! PS: tensorflow_gpu=2.3.0, Python 3.6.9, Jetson Nano, yolov4-obj-tiny (trained by myself)
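For context on the workaround above: CUDA_VISIBLE_DEVICES only takes effect if it is set before the first CUDA-using import reads it. A minimal sketch of the pattern (my own illustration; the TensorFlow import is left commented so the snippet runs anywhere):

```python
import os

# Must happen before TensorFlow (or any other CUDA-using library) is
# imported, because device visibility is read once at CUDA initialization.
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

# import tensorflow as tf  # would now see only the device(s) listed above

print(os.environ['CUDA_VISIBLE_DEVICES'])  # prints: 1
```

Note that a Nano has a single GPU (device 0), so setting '1' leaves no visible CUDA device at all; that likely explains why it avoids the GPU resource error, since TensorFlow then falls back to the CPU.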

assulthoni commented 3 years ago

> https://forums.developer.nvidia.com/t/tiny-yolov4-tensorrt-too-many-resources-requested-for-launch-on-4gb-nano/163962/9
>
> Adding the following lines to the top of the detect script (before TensorFlow imports) resolves the issue:
>
> import os
> os.environ['CUDA_VISIBLE_DEVICES'] = '1'
>
> It works! PS: tensorflow_gpu=2.3.0, Python 3.6.9, Jetson Nano, yolov4-obj-tiny (trained by myself)

I think this code is used to hide device 0 from CUDA.