yuxng / DA-RNN

Semantic Mapping with Data Associated Recurrent Neural Networks
MIT License

python:free() invalid pointer when set is_kfusion=True #9

Open JackHenry1992 opened 7 years ago

JackHenry1992 commented 7 years ago

I have successfully run your code with is_kfusion=false. Now I want to run your kinect_fusion.cpp with this flag set to true, but I got an error: [image] Have you encountered the same error? Could you give some suggestions? To avoid a Pangolin error, I have commented out all Pangolin code in kinect_fusion.cpp.

Supplement: I also got python: free(): invalid next size (fast) when running test_kinect_fusion.sh on my native notebook. By adding std::cout output, I found that the code crashes in initMarchingCubesTables(), called from create_tensors(). [image]

Can you provide other ways to test the KinectFusion code (such as the Video$1.pango dataset used in kinect_fusion/run.sh)?

Another attempt: I modified the image input interface in main() of kinect_fusion.cpp to use cv2.imread instead of VideoInput, as follows: [image] Then I ran main() directly from the command line, and the error shows a CUDA error in initMatchingCubes(). [image]

It seems that this is the same error as when running test_kinect_fusion.py. So are all these errors caused by CUDA? The CUDA version I installed is cuda-8.0.

kevinkit commented 7 years ago

Have you tried it with Python 3 or later?

JackHenry1992 commented 7 years ago

Hi @kevinkit, thank you very much, I will try it later. Have you run DA-RNN successfully with kinect_fusion? Another error I have encountered is that LD_PRELOAD cannot find libtcmalloc.so.4. Could you also give some suggestions?
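A minimal sketch, not taken from the thread, of how one might check where (or whether) libtcmalloc is installed before pointing LD_PRELOAD at it. The function name find_tcmalloc and the two search directories are assumptions chosen for illustration; the library may live elsewhere on a given system.

```python
import ctypes.util
import glob


def find_tcmalloc():
    """Return a sorted list of candidate libtcmalloc names/paths (may be empty)."""
    found = set()
    # Ask the system's library lookup; returns a file name or None.
    name = ctypes.util.find_library("tcmalloc")
    if name:
        found.add(name)
    # Assumed common install locations; adjust for your distribution.
    for pattern in ("/usr/lib/libtcmalloc.so*", "/usr/lib/*/libtcmalloc.so*"):
        found.update(glob.glob(pattern))
    return sorted(found)


print(find_tcmalloc())
```

An empty list suggests the library is not installed in the searched locations, which would explain a "cannot be preloaded" message for /usr/lib/libtcmalloc.so.4.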

patrickESM commented 7 years ago

I had this issue when running under Ubuntu 14.04. Are you running it on Ubuntu 16.04?

kevinkit commented 7 years ago

@JackHenry1992 As @D0nBilb0 said, this was our case, and no, we are still stuck on #7, even on a native machine.

JackHenry1992 commented 7 years ago

@D0nBilb0, I am running in an Ubuntu 16.04 Docker container and encountered this error. I also tried the official TensorFlow Docker image (Ubuntu 16.04, Python 2) and got the same free() invalid pointer error. Then I ran test_kinect_fusion.sh on my native notebook (Ubuntu 14.04, Python 2). The native notebook builds kinect_fusion fine, but I get the same kind of error, python free(): invalid next size, when running the test_kinect_fusion.sh script. I have not yet tested on a native Ubuntu 16.04 notebook.

@kevinkit, after configuring Python 3 and trying to build setup.py, I got errors showing that this code is written in Python 2 style. Another thing: I can build #7 successfully on my native computer, but I can't run DA-RNN training because of an out-of-memory (OOM) error, so if you have enough GPU memory on your native machine, I think you can run it.

@yuxng, we would be very grateful for your advice. Is the free() invalid pointer error caused by tcmalloc? Can you provide other ways to test the KinectFusion code (such as the Video$1.pango dataset used in kinect_fusion/run.sh)?

kevinkit commented 6 years ago

@JackHenry1992, regarding your free() problem, have you checked the TensorFlow version (see #2)?

Can you maybe give all the steps needed to get it running on Ubuntu 14.04? We tried that, but we came across many things that needed to be changed. I opened another issue for that: #10

JackHenry1992 commented 6 years ago

@kevinkit, I just ran test_kinect_fusion.py (without running TensorFlow) and also got the error. Running the executable directly (build/kinectFusion) also fails; it seems to be a CUDA runtime error. [image]

kevinkit commented 6 years ago

What is the compute capability of your GPU? I read that in some cases textures may not work on lower compute capabilities.

JackHenry1992 commented 6 years ago

These are my notebook GPU parameters: [image]

kevinkit commented 6 years ago

You can look up the details, e.g. compute capability, at https://developer.nvidia.com/cuda-gpus. Your GPU (GeForce GTX 960M) has a compute capability of 5.0. A good overview of what this GPU supports is given here: https://en.wikipedia.org/wiki/CUDA. There are some texture-related limitations at this compute capability (cache working set per multiprocessor for texture memory, ...) that may not apply at a higher compute capability (@yuxng used a Titan 1080, which has compute capability 6). However, this may not be the source of the error.
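Instead of looking the value up in a table, the compute capability can also be read from the installed driver. The following is a rough sketch using ctypes and the CUDA driver API (cuInit, cuDeviceGet, cuDeviceGetAttribute). It assumes a Linux system where the driver library is named libcuda.so.1; the function name compute_capability is made up for this example, and it returns None when no driver or device is available.

```python
import ctypes

# Attribute IDs from the CUDA driver API enum CUdevice_attribute.
CC_MAJOR = 75  # CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR
CC_MINOR = 76  # CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR


def compute_capability(device_index=0):
    """Return (major, minor) for a CUDA device, or None if it cannot be queried."""
    try:
        cuda = ctypes.CDLL("libcuda.so.1")  # assumed Linux driver library name
    except OSError:
        return None  # no NVIDIA driver library found
    if cuda.cuInit(0) != 0:
        return None  # driver present but initialization failed
    device = ctypes.c_int()
    if cuda.cuDeviceGet(ctypes.byref(device), device_index) != 0:
        return None  # no device with this index
    major, minor = ctypes.c_int(), ctypes.c_int()
    if cuda.cuDeviceGetAttribute(ctypes.byref(major), CC_MAJOR, device) != 0:
        return None
    if cuda.cuDeviceGetAttribute(ctypes.byref(minor), CC_MINOR, device) != 0:
        return None
    return major.value, minor.value


print(compute_capability())
```

On the GTX 960M discussed above, this would be expected to report (5, 0), matching the table lookup.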

kevinkit commented 6 years ago

@JackHenry1992, referring to your problem with LD_PRELOAD: we get the same message, but only as a warning. However, when we tried to start the scripts, there were other dependencies that needed to be installed too (opencv, scipy, Pillow, yaml):

pip install scipy
pip install opencv-python
pip install Pillow
pip install pyyaml
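As a quick check before launching the scripts, something like the sketch below can report which of these packages are still missing. The mapping from pip names to import names (cv2, PIL, yaml) reflects how those packages are normally imported; the function name missing_packages is only illustrative. It uses plain __import__ so it works on both Python 2 and Python 3.

```python
# pip package name -> module name used in an "import" statement
REQUIRED = {
    "scipy": "scipy",
    "opencv-python": "cv2",
    "Pillow": "PIL",
    "pyyaml": "yaml",
}


def missing_packages(required=REQUIRED):
    """Return the sorted pip names whose module cannot be imported."""
    missing = []
    for pip_name, module in required.items():
        try:
            __import__(module)
        except ImportError:
            missing.append(pip_name)
    return sorted(missing)


missing = missing_packages()
if missing:
    print("Missing, try: pip install " + " ".join(missing))
else:
    print("All listed packages are importable.")
```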

yuxng commented 6 years ago

Using tcmalloc speeds up TensorFlow training. Without it, I saw TensorFlow slow down after some iterations. However, I have also seen that using tcmalloc in testing crashes Pangolin. So you can disable tcmalloc when you run KinectFusion in testing.

kevinkit commented 6 years ago

Thank you for your reply. We ran into similar errors when trying to run the test script. Can you tell us how to disable tcmalloc when running your test scripts?

We ran into the same error, with kinect_fusion enabled

yuxng commented 6 years ago

Do NOT issue the command "export LD_PRELOAD=/usr/lib/libtcmalloc.so.4" when you run the script.
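If it is unclear whether an earlier shell command already exported LD_PRELOAD, one option is to launch the test with tcmalloc explicitly stripped from the environment. This is a minimal sketch under that assumption; the helper names env_without_tcmalloc and run_without_tcmalloc are hypothetical, and it does not help if the launched script itself exports LD_PRELOAD again.

```python
import os
import subprocess


def env_without_tcmalloc(env=None):
    """Return a copy of env with any tcmalloc entry removed from LD_PRELOAD."""
    env = dict(os.environ if env is None else env)
    # LD_PRELOAD entries may be separated by spaces or colons.
    entries = env.get("LD_PRELOAD", "").replace(":", " ").split()
    kept = [e for e in entries if "tcmalloc" not in os.path.basename(e)]
    if kept:
        env["LD_PRELOAD"] = " ".join(kept)
    else:
        env.pop("LD_PRELOAD", None)
    return env


def run_without_tcmalloc(cmd):
    """Run cmd (a list of arguments) without tcmalloc preloaded; return its exit code."""
    return subprocess.call(cmd, env=env_without_tcmalloc())
```

For example, run_without_tcmalloc(["./experiments/scripts/rgbd_scene_multi_rgbd_test.sh", "0"]) would start the test script mentioned in this thread with a cleaned environment.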

kevinkit commented 6 years ago

So basically, if my LD_PRELOAD is empty, I should be good to go?

If I simply run

./experiments/scripts/rgbd_scene_multi_rgbd_test.sh 0

which does NOT issue that command, in a new terminal (so LD_PRELOAD was not set by anything before), I still get the free() invalid pointer error.

JackHenry1992 commented 6 years ago

I have tried DA-RNN on a GeForce 1050 (Ubuntu 16.04), whose compute capability is above 6, and still cannot run KinectFusion...

@yuxng, a TITAN X gives the same error (running test_kinect_fusion.py). Can you give a detailed method for using the kinect_fusion code, or share the VideoInput dataset?

Wei2624 commented 6 years ago

Hi @JackHenry1992, I am wondering if you have solved the free() issue you mentioned. I am trying to reproduce the framework and encountered the same problem. Any comments are appreciated.

Dinghow commented 5 years ago

Hi @JackHenry1992, I am wondering if you have solved the free() issue you mentioned. I am trying to reproduce the framework and encountered the same problem. Any comments are appreciated.

I encountered the same problem too. My GPU is an RTX 2080 Ti with compute capability 7.5.