wimrijnders / V3DLib

C++ library for programming the VideoCore GPU on all Raspberry Pi's.

Failed to build on Raspberry PI buster - 64 bits #2

Closed doleron closed 3 years ago

doleron commented 3 years ago

Hi!

I was able to build/run the examples using buster 32 bits smoothly. However, when I tried to compile on Raspberry OS buster 64 bits I got the following output:

pi@raspberrypi:~ $ uname -m
aarch64
pi@raspberrypi:~ $ gcc --version
gcc (Debian 8.3.0-6) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

pi@raspberrypi:~ $ git clone --depth 1 https://github.com/wimrijnders/V3DLib.git
Cloning into 'V3DLib'...
remote: Enumerating objects: 770, done.
remote: Counting objects: 100% (770/770), done.
remote: Compressing objects: 100% (672/672), done.
remote: Total 770 (delta 86), reused 636 (delta 83), pack-reused 0
Receiving objects: 100% (770/770), 4.54 MiB | 4.67 MiB/s, done.
Resolving deltas: 100% (86/86), done.
pi@raspberrypi:~ $ cd V3DLib/
pi@raspberrypi:~/V3DLib $ script/install.sh 
Cloning into 'CmdParameter'...
remote: Enumerating objects: 199, done.
remote: Counting objects: 100% (199/199), done.
remote: Compressing objects: 100% (107/107), done.
remote: Total 620 (delta 127), reused 140 (delta 85), pack-reused 421
Receiving objects: 100% (620/620), 224.67 KiB | 824.00 KiB/s, done.
Resolving deltas: 100% (393/393), done.
Already on 'master'
Your branch is up to date with 'origin/master'.
From https://github.com/wimrijnders/CmdParameter
 * branch            master     -> FETCH_HEAD
Already up to date.
rm -rf obj obj-debug generated
Compiling Lib/TypedParameter.cpp
Compiling Lib/Types/Types.cpp
Compiling Lib/Types/NoneParameter.cpp
Compiling Lib/Types/IntParameter.cpp
Compiling Lib/Types/OptionParameter.cpp
Compiling Lib/Types/StringParameter.cpp
Compiling Lib/Types/UnnamedParameter.cpp
Compiling Lib/Types/UnsignedIntParameter.cpp
Compiling Lib/Types/PositiveIntParameter.cpp
Compiling Lib/Types/PositiveFloatParameter.cpp
Compiling Lib/DefAction.cpp
Compiling Lib/CmdParameters.cpp
Compiling Lib/DefParameter.cpp
Compiling Lib/CmdValidation.cpp
Creating obj-debug/libCmdParameter.a
Compiling Examples/Simple.cpp
Linking obj-debug/bin/Simple...
Compiling Examples/Actions.cpp
Linking obj-debug/bin/Actions...
Compiling Lib/TypedParameter.cpp
Compiling Lib/Types/Types.cpp
Compiling Lib/Types/NoneParameter.cpp
Compiling Lib/Types/IntParameter.cpp
Compiling Lib/Types/OptionParameter.cpp
Compiling Lib/Types/StringParameter.cpp
Compiling Lib/Types/UnnamedParameter.cpp
Compiling Lib/Types/UnsignedIntParameter.cpp
Compiling Lib/Types/PositiveIntParameter.cpp
Compiling Lib/Types/PositiveFloatParameter.cpp
Compiling Lib/DefAction.cpp
Compiling Lib/CmdParameters.cpp
Compiling Lib/DefParameter.cpp
Compiling Lib/CmdValidation.cpp
Creating obj/libCmdParameter.a
Compiling Examples/Simple.cpp
Linking obj/bin/Simple...
Compiling Examples/Actions.cpp
Linking obj/bin/Actions...
pi@raspberrypi:~/V3DLib $ script/gen.sh 
pi@raspberrypi:~/V3DLib $ make QPU=1 DEBUG=1 all 
Building for QPU
Building on a Pi platform
Compiling Lib/vc4/RegisterMap.cpp
Compiling Lib/vc4/RegAlloc.cpp
Compiling Lib/vc4/SourceTranslate.cpp
Compiling Lib/vc4/Invoke.cpp
In file included from Lib/Common/BufferObject.h:6,
                 from Lib/Common/SharedArray.h:4,
                 from Lib/vc4/Invoke.h:5,
                 from Lib/vc4/Invoke.cpp:1:
Lib/Support/HeapManager.h: In member function ‘int V3DLib::HeapManager::num_free_ranges() const’:
Lib/Support/HeapManager.h:22:57: warning: conversion from ‘std::vector<V3DLib::HeapManager::FreeRange>::size_type’ {aka ‘long unsigned int’} to ‘int’ may change value [-Wconversion]
  int num_free_ranges() const { return m_free_ranges.size(); }
                                       ~~~~~~~~~~~~~~~~~~^~
Lib/vc4/Invoke.cpp: In function ‘void V3DLib::invoke(int, V3DLib::SharedArray<unsigned int>&, int, V3DLib::Seq<int>*)’:
Lib/vc4/Invoke.cpp:56:47: error: cast from ‘uint32_t*’ {aka ‘unsigned int*’} to ‘uint32_t’ {aka ‘unsigned int’} loses precision [-fpermissive]
     codeMem[offset++] = (uint32_t) paramsPtr[i];
                                               ^
Lib/vc4/Invoke.cpp:57:36: error: cast from ‘uint32_t*’ {aka ‘unsigned int*’} to ‘uint32_t’ {aka ‘unsigned int’} loses precision [-fpermissive]
     codeMem[offset++] = (uint32_t) qpuCodePtr;
                                    ^~~~~~~~~~
Lib/vc4/Invoke.cpp:63:57: error: cast from ‘uint32_t*’ {aka ‘unsigned int*’} to ‘uint32_t’ {aka ‘unsigned int’} loses precision [-fpermissive]
   unsigned result = execute_qpu(mb, numQPUs, (uint32_t) launchMsgsPtr, 1, QPU_TIMEOUT);
                                                         ^~~~~~~~~~~~~
In file included from Lib/vc4/Invoke.h:5,
                 from Lib/vc4/Invoke.cpp:1:
Lib/Common/SharedArray.h: In instantiation of ‘T* V3DLib::SharedArray<T>::getPointer() [with T = unsigned int]’:
Lib/vc4/Invoke.cpp:39:45:   required from here
Lib/Common/SharedArray.h:55:12: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     return (T *) m_phyaddr;
            ^~~~~~~~~~~~~~~
make: *** [Makefile:172: obj/qpu-debug/Lib/vc4/Invoke.o] Error 1
pi@raspberrypi:~/V3DLib $ 

As a quick workaround, I changed the makefile to include the -fpermissive flag, which allowed me to build and run just like on the 32-bit OS. But I don't think that is the best solution. I'm investigating the code to check whether those uint32_t casts can be replaced with something more portable.
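For illustration, here is a minimal sketch of one portable alternative to the plain `(uint32_t)` cast. The helper name `to_addr32` is hypothetical and not part of V3DLib. It assumes the pointers involved really carry 32-bit addresses, as the `m_phyaddr` warning in the output above suggests, and it fails loudly instead of silently truncating when that assumption does not hold:

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper, not part of V3DLib.
// Converts a pointer to a 32-bit value via std::uintptr_t, which is wide
// enough to hold any object pointer on both 32-bit and 64-bit targets.
// Throws if the address does not fit in 32 bits, instead of truncating.
inline uint32_t to_addr32(const void *ptr) {
  const auto addr = reinterpret_cast<std::uintptr_t>(ptr);
  if (addr > std::numeric_limits<uint32_t>::max()) {
    throw std::range_error("pointer value does not fit in 32 bits");
  }
  return static_cast<uint32_t>(addr);
}
```

Unlike -fpermissive, this keeps the compiler strict everywhere else and turns an out-of-range address into a runtime error rather than a corrupted value.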

wimrijnders commented 3 years ago

@doleron Yay! Milestone, my first issue. I will celebrate by fixing this.

I haven't done any work on 64-bits, so you are a pioneer in this respect. Thanks for looking into it. I'm sort of surprised that you did not get more compile fallout.

It's possible to compile on a non-Pi platform where hardware GPU is disabled. I'll see if I can get rid of the messages in this way.

Actually, it might be a good idea to install a 64-bits OS myself. What distro did you use? And can you point me to the install instructions?

doleron commented 3 years ago

@wimrijnders cheers!

I think only a few users actually use Raspberry PI on a 64 bit OS. So, I think this issue is a small problem.

I'm using Raspberry Pi OS (Debian Buster) 64-bit. I downloaded the image from here: https://downloads.raspberrypi.org/raspios_arm64/

And used https://sourceforge.net/projects/win32diskimager/ to write the SD card.

wimrijnders commented 3 years ago

I'm currently building on a 64-bit Debian on my Intel laptop. It won't run of course, but at least I can detect all issues concerning 64-bit compilation.

Will try 64-bit raspbian later. I have some spare pi 4s left.

doleron commented 3 years ago

Hi @wimrijnders!

I've been digging into the code a little. I managed to fix the compilation errors with a minimal change to the file Lib/vc4/Invoke.cpp.

First, I commented out the loop:

/*for (int i = 0; i < numQPUs; i++) {
    codeMem[offset++] = (uint32_t) paramsPtr[i];
    codeMem[offset++] = (uint32_t) qpuCodePtr;
}*/

and then I changed uint32_t to size_t as follows:

unsigned result = execute_qpu(mb, numQPUs, (uint32_t) launchMsgsPtr, 1, QPU_TIMEOUT);

to

unsigned result = execute_qpu(mb, numQPUs, (size_t) launchMsgsPtr, 1, QPU_TIMEOUT);

These modifications allowed me to build and run the tests successfully:

All tests passed (120822 assertions in 18 test cases)

I don't know whether this actually helps towards a portable solution. I'm still learning the code, but I have noticed that the native versions of mailbox.h in the 64-bit OS:

/opt/vc/src/hello_pi/hello_fft/mailbox.h
/usr/src/hello_pi/hello_fft/mailbox.h

are equal to the one in the 32-bit version.

They use unsigned 4-byte integers as parameters, in the same way as the one in V3DLib does (sizeof(unsigned) is 4 on both the 32-bit and the 64-bit OS).

I will try to read mailbox.cpp in detail later this week. Maybe I can work out what must be kept as uint32 and what could be changed to size_t.

wimrijnders commented 3 years ago

Thanks for looking into this. I'm quite surprised you're delving into the code already.

The bit you've commented out, which initializes what launchMsgsPtr points to, is definitely needed. I hope that you can reach an understanding of what is required.
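To make the role of that loop concrete, here is a minimal sketch of what it appears to build, judging only from the snippet quoted in this thread: two 32-bit words per QPU, one taken from paramsPtr[i] and one from qpuCodePtr. The function `build_launch_messages` and its parameter names are hypothetical and not part of V3DLib; the values are treated as plain 32-bit addresses.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch, not V3DLib code.
// Builds a flat array of launch messages: for every QPU, one 32-bit word
// with the address of its parameters, followed by one 32-bit word with
// the address of the (shared) QPU code.
inline std::vector<uint32_t> build_launch_messages(
    const std::vector<uint32_t> &params_addrs,  // one entry per QPU
    uint32_t code_addr) {
  std::vector<uint32_t> msgs;
  msgs.reserve(params_addrs.size() * 2);
  for (uint32_t params_addr : params_addrs) {
    msgs.push_back(params_addr);
    msgs.push_back(code_addr);
  }
  return msgs;
}
```

If the loop is skipped, this array is never filled in, so whatever is passed to execute_qpu() would point at uninitialized data.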

Another thing I'm considering is getting rid of the mailbox altogether. In newer Raspbian distros, you can run programs on the vc4 device driver (which is the VideoCore IV) via /dev/dri.

This is what is used for v3d/VideoCore VI. A nice advantage of this is that you don't need sudo to run any programs.

wimrijnders commented 3 years ago

> the native versions of mailbox.h in the 64-bit OS...

Would you mind checking if that example actually runs on 64-bits? It would give me a bit more confidence.

Will look at installing 64-bit raspbian today.

wimrijnders commented 3 years ago

Also, do you actually need this code to run?

If you're running 64-bit Raspbian on a Pi 4, execute_qpu() is never called. It is only called on earlier versions of the Pi. In that case, it can (for the time being) be dealt with by setting an error condition.

This is just a thought to save you time. Not sure how deep you want to go to solve this.

wimrijnders commented 3 years ago

@doleron I've added commit 8d883b2 to address 64-bit compilation issues. See how this works for you.

For the invoke() issue, I've added a check on it for compilation on any platform that is not ARM 32 bits:

#ifdef ARM32
  // ... original code
#else
  #pragma message("WARNING: invoke() will not run on this platform, only on ARM 32-bits")
  assertq(false, "invoke() will not run on this platform, only on ARM 32-bits");

  unsigned result = 1;  // Force error message
#endif
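As a side note, a flag like ARM32 could be derived from macros that GCC and Clang predefine for the target, rather than being passed in by the build system. The sketch below is an assumption about how that might look, not how commit 8d883b2 does it; `SKETCH_ARM32`, `platform_name()` and `invoke_supported()` are hypothetical names.

```cpp
#include <string>

// Hypothetical sketch, not V3DLib code.
// __arm__ is predefined by GCC/Clang for 32-bit ARM targets,
// __aarch64__ for 64-bit ARM targets, __x86_64__ for 64-bit x86.
#if defined(__arm__) && !defined(__aarch64__)
  #define SKETCH_ARM32 1
#else
  #define SKETCH_ARM32 0
#endif

// Human-readable name of the platform this translation unit targets.
inline std::string platform_name() {
#if defined(__aarch64__)
  return "ARM 64-bit";
#elif defined(__arm__)
  return "ARM 32-bit";
#elif defined(__x86_64__)
  return "x86 64-bit";
#else
  return "other";
#endif
}

// True only where the mailbox-based invoke() path is expected to work.
inline bool invoke_supported() {
  return SKETCH_ARM32 == 1;
}
```

A runtime check such as invoke_supported() could then back the assertion with a clearer error message on platforms other than 32-bit ARM.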

Feel free to continue working on this issue. If you don't, that is also fine with me.

doleron commented 3 years ago

Hi @wimrijnders ,

thanks for the explanation! It is clear to me now.

I confirm that, with the latest commit, everything compiles and seems to run fine!

wimrijnders commented 3 years ago

Hooray! It was a bit of a stretch, to fix the compile messages on Intel 64 bit and then hope that it's also good for ARM 64 bit.

Feel free to close the issue if you are satisfied.

Thank you very much for your work. I'm very happy to have more eyeballs for the code. I sincerely hope that you stay involved.

doleron commented 3 years ago

> Hooray! It was a bit of a stretch, to fix the compile messages on Intel 64 bit and then hope that it's also good for ARM 64 bit.

Yeah, I see! Here we call it "Van Damme's approach" Lol.

> Feel free to close the issue if you are satisfied.

Ok, closed!

> Thank you very much for your work. I'm very happy to have more eyeballs for the code. I sincerely hope that you stay involved.

You're welcome. Sure, count on me!

wimrijnders commented 3 years ago

@doleron Hereby informing you that I got Raspbian 64-bits installed. All unit tests and examples run. Raspbian 64-bits is now part of my unit testing cluster.

Unfortunately, it turns out that you need sudo in 64-bits as well. I was hoping to avoid this.

I fixed the warnings coming out of this.

64-bits runs quite smoothly, actually. It appears that it is mostly faster than 32-bits. In some isolated cases it feels slower. But it starts up really quickly.

doleron commented 3 years ago

Great news, @wimrijnders!

In general, at least from what I have seen so far, Buster 64-bit is faster than its 32-bit sibling. I also prefer to use 64-bit in my projects because everything else is 64-bit as well, so I spend less time worrying about bit compatibility. It is said that Albert Einstein used to wear the same combination of clothes and shoes all the time in order to spend less time choosing what to wear =D I think this approach may work for a mere human being like me as well.

I have a CPU-intensive application processing 640x480 frames at 11 fps. The bottleneck resides in some large matrix computations (460x460 matmul 460x460). I use Eigen for this. I am willing to port these computations to V3DLib and leave the CPUs free to do other things. What do you think about it? Does that make sense?
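To give a sense of the workload: a single 460x460 by 460x460 product is 460^3 = 97,336,000 multiply-adds. Below is a minimal plain C++ reference for a square, row-major product that could serve as a correctness baseline when porting. It uses neither the Eigen nor the V3DLib API, and the name `matmul_reference` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, not Eigen or V3DLib code.
// Computes C = A * B for square n x n matrices stored row-major in flat
// vectors. The i-k-j loop order keeps the inner loop walking contiguous
// memory in both B and C.
inline std::vector<float> matmul_reference(const std::vector<float> &a,
                                           const std::vector<float> &b,
                                           std::size_t n) {
  std::vector<float> c(n * n, 0.0f);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t k = 0; k < n; ++k) {
      const float aik = a[i * n + k];
      for (std::size_t j = 0; j < n; ++j) {
        c[i * n + j] += aik * b[k * n + j];
      }
    }
  }
  return c;
}
```

Comparing the output of a GPU kernel against such a reference, within a floating-point tolerance, is one way to validate a port before measuring speed.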

wimrijnders commented 3 years ago

WOW! That's a great application.

You just gave me a target: implementing large matrix multiplications on the GPU. Tell me how I can help. Perhaps we should open a private channel?

doleron commented 3 years ago

Great! I don't think github has a private chat. Do you think on discord or slack? My personal mail is doleron at gmail then we can chat on hangouts as well. Up to you.