Closed Electron752 closed 7 years ago
We have a rpi-4.11.y tree that is largely working. How much 64-bit support is present in this tree? We're generally happy to accept backports from upstream as long as someone finds them useful and they don't cause problems.
Mathematica is specifically licensed for Raspberry Pi, so I wouldn't be surprised if there are some sanity checks on the processor type that arm64 may unintentionally break.
rpi-4.11.y doesn't quite have enough to run the hello_pi examples, minecraft_pi, etc. because it lacks the vchiq compat code. That was all added upstream as a single change, which I believe is still in linux-next. I'm not sure you can pull a commit from linux-next, though.
I do know what it takes to get Mathematica to load on arm64. Upstream is missing a few small changes to report the correct processor type when running in 32-bit mode. I actually have them, but I doubt they would even be considered upstream. Would you consider them for downstream only?
Oh, and the camera changes for 64-bit were added as a series of commits that I think are still in linux-next. But I think apps such as Raspicam should still work without them.
Other than that, I think arm64 should be 100% complete.
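For readers unfamiliar with what "compat" support involves: a 32-bit process passes ioctl structures whose pointer fields are only 32 bits wide, so a 64-bit kernel has to translate them before use. The C sketch below illustrates that translation in userspace; the struct and function names are hypothetical and are not the actual VCHIQ ioctl structures.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ioctl argument as laid out by a 32-bit (arm) process:
 * the user pointer is only 32 bits wide. */
struct example_args32 {
    uint32_t handle;
    uint32_t data;  /* 32-bit user address */
    uint32_t size;
};

/* The same argument as a 64-bit kernel expects it natively. */
struct example_args64 {
    uint32_t handle;
    uint64_t data;  /* 64-bit user address */
    uint32_t size;
};

/* Widen the 32-bit layout into the native 64-bit one. A real kernel
 * compat_ioctl handler would copy the struct in with copy_from_user()
 * and convert the pointer with compat_ptr(); plain memory access is
 * used here only so the sketch runs in userspace. */
static void example_args_from_compat(const struct example_args32 *in,
                                     struct example_args64 *out)
{
    assert(in != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    out->handle = in->handle;
    out->data = (uint64_t)in->data;  /* zero-extend the address */
    out->size = in->size;
}
```

Because the two layouts differ in size and field offsets, a 64-bit kernel cannot simply reinterpret a 32-bit caller's buffer, which is why each such ioctl needs its own compat path.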
I don't have any objections to any of those changes.
What is the status of compatibility layers for:
CONFIG_BCM_VC_CMA=y
CONFIG_BCM_VC_SM=y
These are not needed for 64-bit userspace, where there is no intention of porting Raspbian or Userland, but they would be useful for running 32-bit Raspbian with a 64-bit kernel.
At the moment those don't work. I was planning to just ignore BCM_VC_CMA because my understanding is that it never worked well.
The other driver could probably get fixed rather easily.
Agreed - BCM_VC_CMA is something unsupported by us and is no longer enabled in rpi-4.11.y tree as it is not recommended.
CONFIG_BCM_VC_SM is useful.
Soon to be required once I get to dmabufs for V4L2 and MMAL (probably a week or so away). That is the service I intend to use to support importing a dmabuf and converting it into a mem_handle for use by the GPU, or exporting a gpu_mem allocation as a dmabuf. For the V4L2 case of wrapping a contiguous buffer I could do it via a new MMAL call instead, but not for the more general import case, nor for exporting.
These are not needed for 64-bit userspace, where there is no intention of porting Raspbian or Userland, but they would be useful for running 32-bit Raspbian with a 64-bit kernel.
I'm not sure I follow the logic there in the case of VC_SM. It has no relevance whether things are 32 or 64 bit, but total relevance as to how GPU memory is handled. If you want to map gpu memory directly into the ARM world, then VC_SM is the tool to do that.
I'm still a bit confused about what is going on here and what is not. Was this all just to prove me wrong or something? If this is really something new from a certain West Coast US company, I never had any beef with them. I just never considered their stuff before because it always required big noisy fans and whatnot.
Is my RPI 3 aarch64 kernel running on bare metal, or is it in just some kind of weird user mode simulation/sandbox?
I mean I hope I helped some with this whole thing and such. I can probably get VC_SM working rather easily, but I'm not really sure if help is wanted or even needed at this point.
Not quite sure what you are getting at, but I can assure you nothing said here is to prove you wrong. What 6x9 has stated is how parts of the v4l2 and MMAL subsystems are going to be modified in the very near future by the RPF/T to make them better (more speed, less memory requirement, less SDRAM bandwidth). This means we need shared memory handling between Linux (32 or 64bit) and the GPU. This shared memory means less buffer copying, and this will be enabled by the switch quoted.
No conspiracy here, just a development path.
-- James Hughes Principal Software Engineer, Raspberry Pi (Trading) Ltd
I know what I need to do, but it's going to take a while for me to make the changes correctly. I'm having a bit of difficulty getting things to work, but I'll keep trying.
If you want to map gpu memory directly into the ARM world, then VC_SM is the tool to do that.
but if you want to map ARM memory onto the GPU you would use CMA. What benefits do you get from using VC_SM instead of an interface that is already used upstream?
Sorry, I need to be more careful about using GPU vs VPU.
CMA will allocate a contiguous block of memory that the kernel knows how to handle, but is foreign to the VPU. In that form it is therefore totally useless for achieving a zero copy solution for any camera (including via V4L2) or video codecs use case. Something needs to be used to take the details of a contiguous block of memory and pass it to the VPU in a suitable format. My intention is to use VC_SM as that already exists to handle VPU MEM_HANDLES, convert them to/from kernel representations, and clean up on the app terminating abnormally. Mailboxes don't have any notion of "connection", so we can't handle the cleanup with that API, and it seems daft creating a whole new VCHI service when we already have a service for sharing memory.
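The argument for VC_SM over bare mailboxes is that a service connection lets the kernel track which VPU MEM_HANDLEs belong to which client and release them if the app terminates abnormally. The C sketch below models only that bookkeeping, under the assumption that a handle can be treated as a plain integer; every name (`sm_table`, `sm_import`, `sm_release_client`) is hypothetical and this is not the real VC_SM interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SM_MAX_ENTRIES 16

/* One shared-memory mapping: which client owns it, and its handle. */
struct sm_entry {
    int in_use;
    int client_id;       /* connection that created the mapping */
    uint32_t vpu_handle; /* stand-in for a VPU MEM_HANDLE */
};

struct sm_table {
    struct sm_entry entries[SM_MAX_ENTRIES];
    uint32_t next_handle;
};

/* Record a new mapping for a client; returns its handle, or 0 if full. */
static uint32_t sm_import(struct sm_table *t, int client_id)
{
    assert(t != NULL);
    for (int i = 0; i < SM_MAX_ENTRIES; i++) {
        if (!t->entries[i].in_use) {
            t->entries[i].in_use = 1;
            t->entries[i].client_id = client_id;
            t->entries[i].vpu_handle = ++t->next_handle;
            return t->entries[i].vpu_handle;
        }
    }
    return 0;
}

/* Called when a client's connection closes, cleanly or not: free every
 * mapping it still owns. Returns the number of mappings released. */
static int sm_release_client(struct sm_table *t, int client_id)
{
    int released = 0;

    assert(t != NULL);
    for (int i = 0; i < SM_MAX_ENTRIES; i++) {
        if (t->entries[i].in_use && t->entries[i].client_id == client_id) {
            t->entries[i].in_use = 0;
            released++;
        }
    }
    return released;
}
```

The point of the design is that cleanup is keyed on the connection, which a connectionless mailbox interface has no way to observe.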
Please correct me if I'm wrong, but I also see a missing chunk in upstream - userspace is unable to allocate from CMA, so userspace can't pass larger blocks of memory without having a specific driver to handle it. Android's Ion is being cleaned up to move out of staging, but that's based around heaps rather than just CMA. I don't believe there are any other APIs around attempting to offer similar allocation services of contiguous blocks of memory. I was struggling to find anything much that handled CMA and exporting as dmabufs so that things really can be passed around.
More complete details of my current plans on #1806 and #1807.
There is further discussion to be had over how things migrate from gpu_mem needing to be 64 or 128M to using CMA and vc4-kms-v3d. Even with the above VC_SM changes, that only covers the data exchanged with the ARM. Both camera and codecs need internal working buffers to be allocated from somewhere, potentially via a reverse memory request from the VPU to the kernel to allocate/free a block. How would that integrate with dmabuf, and does that make sense?
Thank you for explaining, the links back to previous issues were very helpful. I see that you have already considered:
VCSM could be rejigged to use CMA by default as the allocator and map it into VC, as a new implementation of vc_sm_alloc().
So I accept your conclusion that CMA alone is not good enough.
There once was a TV show called, "Twin Peaks". This whole RPI thing is sort of becoming like that. Everything starts out simple, but as things go the more complex and weird it gets.
So, I'm still trying to understand the boot process here and exactly what the physical limitations are that people are trying to work around.
Here is what I understand so far: the VPU has a bootrom and bootram, and that's the initial point of execution for the VPU. This bootrom has enough code to get bootcode.bin loaded somehow from the SD card (or USB or network) into the cache. This activates the external RAM (1 GB on the RPI 3), which is on a different chip. The VPU, running from the cache, loads the various .elf and .dat files into the external RAM and jumps into them. These .elf files start up the ARMs, which load the kernel or U-Boot or whatever.
OK, now here is what I think is a huge part of the puzzle. The bootrom and bootram appear to need almost as much capability as bootcode.bin (they need to parse the filesystem, use USB, load files, etc.). Yet they're only 32KB instead of the 64KB of bootcode.bin. So how can they be half the size?
I really only see two possibilities here:
bootcode.bin does much more than bootrom:
- It parses config.txt (start_file=start_x.elf is a bootcode.bin parameter).
- It loads fixup.dat and handles the memory split (using the gpu_mem=128 setting).
- It supports loading from multiple partitions (for NOOBS).
- It handles LEDs (including the I2C GPIO expander) for LED error patterns such as a missing start.elf.
- It initialises SDRAM.
- It sets up the PLLs. bootrom runs from the oscillator, so the core runs at 19.2MHz; bootcode.bin raises this to 250MHz so it is faster to load the much larger start.elf.
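config.txt itself is a plain list of `key=value` lines, such as the start_file and gpu_mem settings mentioned above. The C sketch below shows the general shape of looking such a setting up. It is a hypothetical illustration of the file format only, not bootcode.bin's parser (whose source is not public), and it ignores conditional sections such as `[pi3]`; the function name `config_get` is made up.

```c
#include <assert.h>
#include <string.h>

/* Look up `key` in a config.txt-style buffer of "key=value" lines and
 * copy its value into `out`. Lines starting with '#' are comments.
 * A later line for the same key overrides an earlier one.
 * Returns 1 if the key was found, 0 otherwise. */
static int config_get(const char *text, const char *key,
                      char *out, size_t out_len)
{
    size_t key_len = strlen(key);
    const char *line = text;
    int found = 0;

    assert(out_len > 0);
    while (*line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        /* Match "key=" exactly, so "gpu" does not match "gpu_mem=". */
        if (line[0] != '#' && len > key_len &&
            strncmp(line, key, key_len) == 0 && line[key_len] == '=') {
            size_t vlen = len - key_len - 1;
            if (vlen >= out_len)
                vlen = out_len - 1;  /* truncate to fit the buffer */
            memcpy(out, line + key_len + 1, vlen);
            out[vlen] = '\0';
            found = 1;
        }
        if (!end)
            break;
        line = end + 1;
    }
    return found;
}
```

Requiring the `=` immediately after the key keeps prefix keys from matching each other.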
Even then, I'm still amazed at how much is packed into that 32KB. It needs to access the network, for example.
I see this register: INTERRUPT_HOSTPORT. Is it actually used?
It's almost as if someone took the original chip design files and is using them to run a simulation. You know the VHDL or RTL or whatever the silicon is designed with.
The hostport (sometimes called MPHI) was an interface (address/data bus) to allow VideoCore to be used as a co-processor; it wasn't used on Pi1 and was removed on Pi2/Pi3.
The DWC_OTG driver uses this spare interrupt to trigger ISR context processing from the FIQ.
So, yes, the hostport interrupt does trigger when using DWC_OTG driver, but not for its original purpose.
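The trick described here, where a fast FIQ handler does the time-critical work and then raises an otherwise unused interrupt so the remaining processing runs in ordinary IRQ context, can be modelled in a few lines. The C sketch below is a userspace simulation with hypothetical names; it does not show the real dwc_otg FIQ code or the MPHI register writes, and the "pending" flag merely stands in for the spare interrupt line being asserted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated interrupt state. */
struct sim_irq {
    bool spare_pending;   /* spare (hostport) interrupt raised by the FIQ */
    int fiq_events;       /* events handled at FIQ level */
    int deferred_events;  /* events queued for IRQ-context work */
    int processed_events; /* events completed in IRQ context */
};

/* Fast path: do the minimum, queue the rest, and kick the spare IRQ.
 * Several FIQs before the IRQ is serviced coalesce into one pending IRQ. */
static void fiq_handler(struct sim_irq *s)
{
    assert(s != NULL);
    s->fiq_events++;
    s->deferred_events++;
    s->spare_pending = true;
}

/* Slow path: runs when the spare interrupt is serviced as a normal IRQ. */
static void spare_irq_handler(struct sim_irq *s)
{
    assert(s != NULL);
    if (!s->spare_pending)
        return;
    s->processed_events += s->deferred_events;
    s->deferred_events = 0;
    s->spare_pending = false;
}
```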
Is the bcm2708 a real chip, or is it just a test harness of some kind?
I'm looking at the Broadcom graphics stack that's publicly released, and it appears to be written from the perspective of a test harness. It also isn't clear to me what they refer to as VC and what is the host. For example, is Linux the VC or the host on the RPI 3?
For example, I'm really on my way to Oz here, but in theory one could flip the roles of the camera and the display: a display that's really recording, and a camera that's really displaying.
I guess what I'm getting at is: does bootrom have enough code in it to be freestanding, or does it rely on the facilities of something else, being just a hook into services, the way VCHIQ or the mailbox provides video or audio?
I imagine VC is VideoCore (i.e. the GPU).
VideoCore was originally designed as a co-processor for multimedia acceleration of a more powerful host processor (typically an arm). The Hostport/MPHI was the interface to an external arm, but 2835 included a very small arm that could act as the host for low performance use cases.
2708 is the core. 2835 is a specific chip that uses the 2708 core.
Similarly: 2709 is the core. 2836 is a specific chip that uses the 2709 core.
Similarly: 2710 is the core. 2837 is a specific chip that uses the 2710 core.
As 2835 is the only widely used implementation of 2708, the names tend to be used interchangeably.
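The core/chip naming above is a one-to-one mapping, which a small lookup table captures. The C sketch below encodes exactly the three pairs listed in the reply; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Core name and the specific chip that uses that core. */
struct core_chip {
    int core;
    int chip;
};

static const struct core_chip core_chips[] = {
    { 2708, 2835 },
    { 2709, 2836 },
    { 2710, 2837 },
};

#define NUM_CORE_CHIPS (sizeof core_chips / sizeof core_chips[0])

/* Return the chip number for a core, or 0 if unknown. */
static int chip_for_core(int core)
{
    for (size_t i = 0; i < NUM_CORE_CHIPS; i++)
        if (core_chips[i].core == core)
            return core_chips[i].chip;
    return 0;
}

/* Return the core number used by a chip, or 0 if unknown. */
static int core_for_chip(int chip)
{
    for (size_t i = 0; i < NUM_CORE_CHIPS; i++)
        if (core_chips[i].chip == chip)
            return core_chips[i].core;
    return 0;
}
```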
The bootrom is merely a rom. You give it an address and it returns (fixed) data.
VideoCore directly executes code from the bootrom. The code just checks for presence of NAND flash, SDCARD, USB, etc and tries to fetch more code from there to continue with next stage of booting.
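The behaviour described here, checking each possible boot source in turn and continuing with the first one that supplies the next stage, amounts to a simple probe loop. The C sketch below is a hypothetical illustration: the source list, its order and the callback shape are assumptions (the ROM code is not public), and `fake_probe` exists only so the sketch can be exercised.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical boot sources, probed in this order. */
enum boot_source { SRC_NAND, SRC_SDCARD, SRC_USB, SRC_COUNT };

/* Probe callback: returns nonzero if the source is present and the
 * next-stage code was fetched from it. */
typedef int (*probe_fn)(enum boot_source src, void *ctx);

/* Try each source in order; return the first that worked, or -1. */
static int boot_probe(probe_fn probe, void *ctx)
{
    assert(probe != NULL);
    for (int s = 0; s < SRC_COUNT; s++) {
        if (probe((enum boot_source)s, ctx))
            return s;
    }
    return -1;
}

/* Stand-in probe for testing: ctx points to a bitmask of the sources
 * that "have" boot code. */
static int fake_probe(enum boot_source src, void *ctx)
{
    unsigned mask = *(const unsigned *)ctx;
    return (int)((mask >> src) & 1u);
}
```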
OK, I'm understanding this more now.
Another interesting point: I looked at the graphics stack and it includes audio, but it looks very similar to the ALSA driver for sound on the RPI 3. It's written as something that provides the sound services rather than something that uses a service.
I mean, you can simulate playing audio as something that's actually recording audio.
Does the RPI 3 have a NAND or something else on the board or chip that supplies the code for the network functions needed to get bootcode.bin loaded? Like I said, that's way too much functionality for 32KB.
OK, so bootrom is just a set of routines that are called by something else, essentially hooks, so to speak. It's not actually the reset point.
So where is the actual start of things?
My Commodore 64 had an OS and a BASIC interpreter in 20KB. You young'uns could learn a thing or two from the old-school coders.
The bootrom is the first code run by the VPU, which is the first and only processor to run when the chip reset is released. Its responsibility is to load bootcode.bin - from SD, USB or network - into the cache and execute it. End of.
I'm thinking now that at POR/reset those ARMs are indeed running and doing something. They are what provides the services or calls into bootrom, which is essentially a callback system.
It just so happens that on the RPI 3, the ARMs are being reset again later in the boot sequence.
So where is the code at that actually does the initial setup of the system?
See previous answer.
I actually had a Commodore VIC-20, which is older than the 64. So I one-up you there.
Is the actual boot sequence in some kind of bytecode, then?
What exactly is providing the interpreter?
You are still thinking backwards as far as the chip goes. The ARM is really a peripheral of the GPU. The ARM only gets power and clocks applied at the very end of start.elf running.
2835 has actually been used in products with the arm disabled (through an OTP bit). The arm is not an essential part of 2835. It just happens in the raspberry pi use case we do rely on it.
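Putting the replies in this thread together, the boot order is: the bootrom and bootcode.bin run on the VPU, start.elf runs on the VPU from SDRAM, and only then is the ARM powered and clocked to run the kernel. The C sketch below encodes that ordering as a table so the claim "no ARM code runs before start.elf" can be checked mechanically; the table layout and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum cpu { CPU_VPU, CPU_ARM };

/* Boot stages in execution order, as described in this thread. */
struct boot_stage {
    const char *name;
    enum cpu runs_on;
    const char *runs_from;
};

static const struct boot_stage boot_stages[] = {
    { "bootrom",      CPU_VPU, "boot ROM" },
    { "bootcode.bin", CPU_VPU, "cache" },
    { "start.elf",    CPU_VPU, "SDRAM" },
    { "kernel",       CPU_ARM, "SDRAM" }, /* or U-Boot */
};

#define NUM_STAGES (sizeof boot_stages / sizeof boot_stages[0])

/* Index of a stage by name, or -1 if not listed. */
static int stage_index(const char *name)
{
    for (size_t i = 0; i < NUM_STAGES; i++)
        if (strcmp(boot_stages[i].name, name) == 0)
            return (int)i;
    return -1;
}

/* Index of the first stage that runs on the given processor, or -1. */
static int first_stage_on(enum cpu c)
{
    for (size_t i = 0; i < NUM_STAGES; i++)
        if (boot_stages[i].runs_on == c)
            return (int)i;
    return -1;
}
```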
No, it's just C compiled into VPU assembly instructions. The code density may be higher than an ARM, but there is no magic - just no bloat.
Like I said, I know about the VIC-20, and there's no way that bootrom has enough code to load off the network.
popcornmix's point that you give it an address and it returns data is interesting.
You know that the LAN chip on the RPI 3 has an EEPROM in it, right? Or at least one externally connected. It wouldn't by any chance be loading another stage of the boot sequence from there, would it?
You are starting to sound delusional. Everything popcornmix and I have told you is true, and that's the last I will say on the matter.
Exactly, especially this point.
The bootrom is merely a rom. You give it an address and it returns (fixed) data.
You know what, assume that bootrom really is fixed, which it probably is, in that it's OTP or actual hardcoded ROM.
A very easy way this could have been implemented would be to have that LAN chip from Microchip do the initial boot process itself. The VideoCore could simply boot in USB device mode (which I understand is the default anyway), and the USB LAN chip could act as the host. It could load bootrom.bin from wherever and squirt it into the VC during initial power-up.
You know, those PIC controllers can actually do a lot. I think they even had a public open-source network stack for them at one point.
Sounds like the alternative boot modes were added at the last minute in RPI 3 development, and if I were doing something like that, that's certainly how I would do it.
There is NO POINT in considering how things SHOULD have been implemented. Things are the way they are, there are reasons why they are the way they are (outlined above), and they are not likely to change in the future - because it WORKS.
Please be aware that the people replying on this thread are THE experts in this field.
Also, this sort of discussion is more suited to the forums.
@pelwell:
No, it's just C compiled into VPU assembly instructions. The code density may be higher than an ARM, but there is no magic - just no bloat.
just for my understanding and curiosity … is the source code available for all of this?
You must be new around here - no.
Yes, I'm new to this. Thanks for the explanations in this thread. The source code would have eased the doubts of @Electron752 … maybe ;-)
Closing this issue as questions answered/issue resolved.
So it appears that all the needed bits for ARM64 have now been merged into upstream Linux. This was my primary goal here, and it now appears to be approaching, or to have reached, completion.
Does anybody have any suggestion on where this could all go next?
I do know that Mathematica still doesn't work on Raspbian due to various issues.