nlamprian / ICP

Implementation of the photogeometric ICP algorithm in OpenCL
MIT License

CL_INVALID_PROGRAM_EXECUTABLE #3

Open drhalftone opened 8 years ago

drhalftone commented 8 years ago

Finally got everything to compile, but when I run the tests, I receive the following output:

        DrHalftone:ICP-build dllau$ ./bin/icp_tests_icp --profiling
        [==========] Running 11 tests from 1 test case.
        [----------] Global test environment set-up.
        [----------] 11 tests from ICP
        [ RUN ] ICP.getLMs
        clEnqueueNDRangeKernel (CL_INVALID_PROGRAM_EXECUTABLE)

I'm running on a MacBook Pro running OS X 10.11.1. Any help would be greatly appreciated.

nlamprian commented 8 years ago

Hello drhalftone,

Thank you for all your feedback. Soon, I'll start working again on the related projects, and I'll apply the fixes.

At the moment, I'm not able to run any OpenCL code, so I can't do much for you. But let's give it a try:

Please compile CLUtils and run its tests. Is everything OK there?

drhalftone commented 8 years ago

        DrHalftone:build dllau$ ./bin/clutils_vecAdd
        Out of Range error: unordered_map::at: key not found (/Users/dllau/SourceTree/CLUtils/src/CLUtils.cpp:420)
        DrHalftone:build dllau$ ./bin/clutils_tests
        [==========] Running 5 tests from 4 test cases.
        [----------] Global test environment set-up.
        [----------] 2 tests from CLEnv
        [ RUN ] CLEnv.BasicFunctionality
        Out of Range error: unordered_map::at: key not found (/Users/dllau/SourceTree/CLUtils/src/CLUtils.cpp:420)

The above shows what I get when I run the vecAdd example as well as the test executable.

drhalftone commented 8 years ago

To follow up on my emails, I just wanted to say that I really want to help debug this code for two reasons. The first is that I’ve been doing a lot of OpenGL shader language programming, and now I’m eager to learn OpenCL programming. The second is that I have been working for the past year on software for interfacing with RGB+D cameras. The code I have sends the video to my shader language code, which outputs XYZW+RGBA as a floating point buffer. Those buffers are then processed using libpointmatcher, which is a CPU implementation of ICP. I’ve been able to get it to run in real time, but it takes up all my CPU cores to do so. It also doesn’t compile on Windows. So I’m looking for a GPU implementation of ICP that I can incorporate into my RGB+D video processing application.

That all being said, I’m a Qt programmer. So I’m importing your CMake stuff into Qt Creator. I can compile the CLUtils project, and I can edit the source code when a compiler error comes up. I can run the code, but I can’t run the debugger. So when I see an error, I can’t stop the code at the offending line and figure out what’s wrong. Perhaps you could instruct me on how to get my debugger to work with your code inside Qt Creator, and then I can contribute some meaningful fixes.

Dr. Daniel L. Lau, Professor and Certified Professional Engineer Department of Electrical and Computer Engineering University of Kentucky Lexington, KY 40506-0046

nlamprian commented 8 years ago

It's a problem with the kernel files. Try calling the readSource function and verify that you are able to access the files and the code is loaded properly. Beyond that point, I don't expect any errors since it's straightforward OpenCL workflow. If the problem persists, walk through the CLEnv constructor in CLUtils.cpp and check the intermediate variables.
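
One way to check the first suggestion without going through CLUtils is to read a kernel file with plain C++ streams and confirm that something comes back. This is only a sketch: `loadKernelSource` is a hypothetical helper, not CLUtils' `readSource` (whose signature is not shown in this thread), and the path passed to it is whatever the tests use on your machine.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of CLUtils): reads a whole kernel file
// into a string and throws if it cannot be opened or is empty.
std::string loadKernelSource (const std::string &path)
{
    std::ifstream file (path, std::ios::in | std::ios::binary);
    if (!file.is_open ())
        throw std::runtime_error ("Cannot open kernel file: " + path);

    // Copy the entire stream buffer into memory
    std::ostringstream contents;
    contents << file.rdbuf ();

    std::string source = contents.str ();
    if (source.empty ())
        throw std::runtime_error ("Kernel file is empty: " + path);

    return source;
}
```

If this throws for a relative path but works for an absolute one, the problem is most likely the working directory the executable is launched from rather than the OpenCL code.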

As for a debugger, I used CodeXL, and only for profiling. I think it's available only on Windows/Linux.

OpenCL is the way to go for these kinds of things. Currently, very little work is done on the CPU (e.g. the Kinect library is by far the biggest load), but the communication with the CPU at every ICP iteration limits the performance. There are things I plan to do to make the process more accurate and faster, but the biggest change will come when I support OpenCL 2.0.

drhalftone commented 8 years ago

Out of curiosity, I noticed that you posted that the ICP implementation completes one iteration per 1.1 msec for the given point cloud parameters. I’m not sure I understand the meaning of the cloud size parameters. I will read the paper in greater detail to figure that out. But in the meantime, can you post how long it took to merge just the two frames of RGB+D video that you show on GitHub? In other words, if I just plug my PrimeSense camera into your code and then slowly swing the camera from side to side, about how many frames per second do you think I can maintain with this code?

drhalftone commented 8 years ago

I’ve switched to Xcode, so now I can run a debugger. I’d love to know how to set up Qt Creator to debug CMake projects, so maybe another Qt expert will chime in.

drhalftone commented 8 years ago

Okay, I changed the name of the files to include their absolute paths. So they are being loaded. The issue now is that the CLUtil constructor is extracting the kernel name as “initRand\0” instead of “initRand”. So this line of code:

        unsigned int kIdx = kernelIdx.at (pgIdx).at (std::string (kernel_name));

is throwing an exception when the two strings don’t match.
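
The mismatch can be reproduced outside of CLUtils. `std::string` compares length and every character, so a 9-character name that ends in '\0' is a different key from the 8-character "initRand", and `unordered_map::at` throws `std::out_of_range` for a missing key. The sketch below assumes the stored key is the one carrying the extra NUL, as described above; `lookupThrows` and `nameWithTrailingNul` are hypothetical names used only for illustration.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// Returns true if looking up `key` with at() throws std::out_of_range.
bool lookupThrows (const std::unordered_map<std::string, unsigned int> &index,
                   const std::string &key)
{
    try
    {
        (void) index.at (key);
        return false;
    }
    catch (const std::out_of_range &)
    {
        return true;
    }
}

// Builds a kernel name that carries a trailing NUL as part of its contents,
// as can happen when a string is constructed from a buffer with an explicit
// length that includes the terminator.
std::string nameWithTrailingNul ()
{
    const char raw[] = "initRand";           // 8 characters + terminating NUL
    return std::string (raw, sizeof (raw));  // length 9: the NUL is kept
}
```

Storing `nameWithTrailingNul ()` as the key and then calling `lookupThrows (index, "initRand")` returns true, which matches the "key not found" error in the earlier output.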

nlamprian commented 8 years ago

The repo's description was written early in the development and I haven't updated it. Some further tests have shown that an ICP iteration takes about 1.3 msec. This number alone can be misleading, as I'll explain below. The mean time for the ICP to complete is 60 msec. I was able to run OCLSLAM (which performs ICP and builds a map) at 10 Hz.

In practice, the numbers vary greatly. The time it takes to perform the ICP algorithm depends on the surfaces in the scene, the number of points chosen for each image, how the points were chosen, how accurate the nearest neighbor search is, and what the initial transformation is. Also, from the perspective of the running application, there is a lot of dead time. If the ICP needs 20 iterations to complete, the time it really takes is greater than 20*1.3 msec. The same goes for the time between consecutive ICP alignments. Let's lump all these delays together and attribute them to system load. So the 1.3 msec per ICP iteration says very little about the actual system performance.
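
To put the quoted figures side by side, here is a small sketch of the arithmetic. The 20-iteration count is only the hypothetical used above, not a measurement, and the 60 msec mean does not necessarily correspond to 20 iterations; the function names are made up for illustration.

```cpp
// Kernel-only time for a given number of ICP iterations, in milliseconds.
constexpr double kernelTimeMs (int iterations, double perIterationMs)
{
    return iterations * perIterationMs;
}

// Time available per frame, in milliseconds, at a given update rate in Hz.
constexpr double framePeriodMs (double rateHz)
{
    return 1000.0 / rateHz;
}
```

With these, `kernelTimeMs (20, 1.3)` is about 26 msec, while the reported mean for a full ICP run is 60 msec and `framePeriodMs (10.0)` gives a 100 msec budget per frame for OCLSLAM. The gap between those numbers is the dead time and system load described above.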

The set cardinalities (|F|=|M|) in the description refer to the number of points used in the ICP calculation, and the representative points (|R|) are related to the RBC data structure.

drhalftone commented 8 years ago

Okay, it took some work because I’m not familiar with the Standard Template Library, since Qt provides its own versions of these containers. Basically, I had to delete the \0 from the character strings you were getting from the kernel files. I found an example online, so forgive me if it’s not an efficient means of removing the character. Here is what my loop looked like in the constructor of the CLUtil class:

        // Retrieve the kernels from program 0
        kernels.emplace_back ();
        kernelIdx.emplace_back ();
        for (unsigned int idx = 0; idx < kernel_names.size (); ++idx)
        {
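            // Strip embedded '\0' characters so the map key matches the plain
            // kernel name (std::remove requires <algorithm>)
            kernel_names[idx].erase (
                std::remove (kernel_names[idx].begin (), kernel_names[idx].end (), '\0'),
                kernel_names[idx].end ());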
            kernels[0].emplace_back (programs[0], kernel_names[idx].c_str ());
            kernelIdx[0][kernel_names[idx]] = idx;
        }
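
The same clean-up can be pulled out into a small standalone function, which makes it easy to test in isolation. This is a sketch of the erase-remove idiom used in the loop above; `stripNuls` is a hypothetical name, not something that exists in CLUtils.

```cpp
#include <algorithm>
#include <string>

// Removes every '\0' character from `name` in place (erase-remove idiom)
// and returns a reference to the same string.
std::string& stripNuls (std::string &name)
{
    name.erase (std::remove (name.begin (), name.end (), '\0'), name.end ());
    return name;
}
```

The erase-remove idiom runs in a single linear pass over the string, so efficiency is not a concern for kernel names of this size.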

drhalftone commented 8 years ago

Here is my output:

        [==========] Running 5 tests from 4 test cases.
        [----------] Global test environment set-up.
        [----------] 2 tests from CLEnv
        [ RUN ] CLEnv.BasicFunctionality
        unknown file: Failure
        C++ exception with description "clEnqueueNDRangeKernel" thrown in the test body.
        [ FAILED ] CLEnv.BasicFunctionality (115 ms)
        [ RUN ] CLEnv.AddMoreCLObjects
        [ OK ] CLEnv.AddMoreCLObjects (67 ms)
        [----------] 2 tests from CLEnv (182 ms total)

        [----------] 1 test from ProfilingInfo
        [ RUN ] ProfilingInfo.BasicFunctionality
        [ OK ] ProfilingInfo.BasicFunctionality (0 ms)
        [----------] 1 test from ProfilingInfo (0 ms total)

        [----------] 1 test from CPUTimer
        [ RUN ] CPUTimer.BasicFunctionality
        /Users/dllau/SourceTree/CLUtils/tests/tests.cpp:170: Failure
        Expected: (timer.duration () - 100000) <= (1000), actual: 2794.41 vs 1000
        [ FAILED ] CPUTimer.BasicFunctionality (103 ms)
        [----------] 1 test from CPUTimer (103 ms total)

        [----------] 1 test from GPUTimer
        [ RUN ] GPUTimer.BasicFunctionality
        [ OK ] GPUTimer.BasicFunctionality (11 ms)
        [----------] 1 test from GPUTimer (12 ms total)

        [----------] Global test environment tear-down
        [==========] 5 tests from 4 test cases ran. (297 ms total)
        [ PASSED ] 3 tests.
        [ FAILED ] 2 tests, listed below:
        [ FAILED ] CLEnv.BasicFunctionality
        [ FAILED ] CPUTimer.BasicFunctionality

        2 FAILED TESTS
        Program ended with exit code: 1

drhalftone commented 8 years ago

Okay, I got CLUtils to pass all tests. All I had to do was explicitly modify the code to create only GPU devices. Here is what I am now getting for output:

        [==========] Running 5 tests from 4 test cases.
        [----------] Global test environment set-up.
        [----------] 2 tests from CLEnv
        [ RUN ] CLEnv.BasicFunctionality
        [ OK ] CLEnv.BasicFunctionality (83 ms)
        [ RUN ] CLEnv.AddMoreCLObjects
        [ OK ] CLEnv.AddMoreCLObjects (81 ms)
        [----------] 2 tests from CLEnv (164 ms total)

        [----------] 1 test from ProfilingInfo
        [ RUN ] ProfilingInfo.BasicFunctionality
        [ OK ] ProfilingInfo.BasicFunctionality (0 ms)
        [----------] 1 test from ProfilingInfo (0 ms total)

        [----------] 1 test from CPUTimer
        [ RUN ] CPUTimer.BasicFunctionality
        [ OK ] CPUTimer.BasicFunctionality (100 ms)
        [----------] 1 test from CPUTimer (100 ms total)

        [----------] 1 test from GPUTimer
        [ RUN ] GPUTimer.BasicFunctionality
        [ OK ] GPUTimer.BasicFunctionality (14 ms)
        [----------] 1 test from GPUTimer (14 ms total)

        [----------] Global test environment tear-down
        [==========] 5 tests from 4 test cases ran. (279 ms total)
        [ PASSED ] 5 tests.
        Program ended with exit code: 0

soulslicer commented 7 years ago

Hi, was this change pushed to the main branch? Perhaps that is why I am getting my current issue.

drhalftone commented 7 years ago

No, I kept the changes to myself and hoped that the maintainer would make the corrections.

soulslicer commented 7 years ago

Would you be able to send that codebase to my email?