Thanks for making this repo! I have but a few questions

stupid-2020 / multi-booting-xavier

Multi-booting for the Jetson AGX Xavier with NVMe SSD

MIT License

Thanks for making this repo! I have but a few questions #2

Closed unphased closed 2 years ago

unphased commented 2 years ago

It's been a pretty steep learning process so far to dig into customizing the boot behavior of Jetson modules, because Nvidia's documentation assumes the reader already has so much background knowledge.

Your repo here was actually the one that finally succeeded in prompting me to learn about a large number of relevant topics, so I wanted to personally thank you for bringing together enough of these little clues to finally clue me in.

I learned that the boot process can only be accessed via UART, so if I understand this correctly I will need to have the device set up with a serial port just to be able to interactively do anything with the boot selection process if applicable. Your hyperlink related to this explains a setup that is far more complex than what I imagined I would need for that, but it seems quite doable. This being a development kit, it makes so much sense to me, but not having it spelled out seemingly anywhere in documentation, and being a lifelong PC user where the interface is always available on the monitor meant that I truly don't know how much longer I would have been stuck here for.
After looking at your scripting and thinking about it, and repeating that a few times, I learned that it is possible to modify the code included in the initial RAM disk which I vaguely knew about but today understand to be a zipped archive of a filesystem (the initial ram disk filesystem). What you do in the script is unpack this and run the relevant patch on the init script, which I could only assume is that hallowed PID=1 init process which we know and love. Incidentally I couldn't find the init process on the real rootfs. It certainly makes sense if init actually begins its life from initrd and can only be found there! It's the forerunner, the one that pierces the veil, so to speak, or, alternatively, enters the wormhole, so to speak, to continue on to give life to its descendants!
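
For anyone who wants to look inside the initrd the same way, here is a minimal sketch. It assumes the image is a gzip-compressed cpio archive; `unpack_initrd` and the `/tmp/initrd-root` path are hypothetical names used only for illustration, not something taken from the repo's script.

```shell
# Sketch: unpack an initrd image into a scratch directory for inspection.
# Assumption: the image is a gzip-compressed cpio archive.
# unpack_initrd is a hypothetical helper, not part of any NVIDIA tool.
unpack_initrd() {
    local image="$1" dest="$2"
    mkdir -p "$dest" || return 1
    # cpio: -i extract, -d create leading directories, -m preserve mtimes
    gzip -dc "$image" | (cd "$dest" && cpio -idm)
}

# Example (paths are assumptions):
#   unpack_initrd /boot/initrd /tmp/initrd-root
#   less /tmp/initrd-root/init
```

Working on a copy in a scratch directory keeps the real `/boot/initrd` untouched while you read or diff the `init` script.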

I am looking at my goal through a Jetpack 4.6/L4T 32.6.1 lens and I've already confirmed that the patch changes are unnecessary for me. What I was going to ask originally was what the purpose of this patch script was, but now I understand that it was to tweak the init process so it may be allowed to boot from nvme. Since you taught me how the initrd image works, I was able to confirm that the latest software indeed has these changes incorporated already, so I will not need to use any of this scripting and can hopefully get things working by modifying the extlinux.conf file directly.

When I first started writing this, I thought that I would have a list of questions. However your code was so readable that all that is left to do is for somebody to reaffirm with me that I have understood everything correctly. Thank you.

stupid-2020 commented 2 years ago

Thank you very much for your questions.

According to my understanding and from the post, it is the only way to select the desired "OS" from the boot menu. Once you connect the serial console, you will find there are 4 ports available. If you DO NOT have any other serial ports enabled, you will most likely get the console output on COM4 (on Windows) or /dev/ttyUSB3 (on Linux).
Starting from JetPack 4.5, NVIDIA (or someone else) has fixed the issue and you can specify /dev/nvme0n1pX as root. (Sorry that I only mention that in the source code.) JetPack 4.4.1 and earlier (without the patch) have a small problem in the regex pattern for the rootdev value when using /dev/nvme0n1pX. Without patching, the common way is to pass PARTUUID instead of /dev/nvme0n1pX. However, this method is not trivial and it is easy to make a mistake.
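
To make the two forms of the root value concrete, here is a small sketch that tells them apart. `classify_rootdev` is a hypothetical helper and its patterns are illustrative shell globs only; they are not the regex from NVIDIA's init script.

```shell
# Sketch: classify the value passed as root= on the kernel command line.
# classify_rootdev is a hypothetical helper; the globs below are loose,
# illustrative patterns, NOT the pattern used by the real init script.
classify_rootdev() {
    case "$1" in
        PARTUUID=*)                    echo partuuid ;;
        /dev/nvme[0-9]*n[0-9]*p[0-9]*) echo nvme ;;
        /dev/mmcblk[0-9]*p[0-9]*)      echo mmc ;;
        *)                             echo unknown ;;
    esac
}

# The PARTUUID of a partition can be read with blkid, for example:
#   sudo blkid -s PARTUUID -o value /dev/nvme0n1p1
```

The NVMe name carries both a namespace part (`n1`) and a partition part (`p1`), so a pattern written for names like `mmcblk0p1` will not necessarily accept it.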

unphased commented 2 years ago

Thanks for the clarification! This helps. I am currently still looking around a bit, hoping to make sure I get my ducks in a row properly before touching the boot configuration in extlinux.conf. Messing it up is not too big a problem, though: I'd just need to set up a serial console or reflash.

What I'm working on is coming up with a streamlined scheme to prepare a large number of modules for use in an early production deployment. We do need to use NVMe mainly for the storage space, but the improved performance is also quite important. As you know, putting the rootfs on NVMe with Jetpack 4.3/4.4 takes some complexity, and the process requires booting first to something onboard. Since this changes with the latest 4.6, and we can boot straight to NVMe, I'm hoping to streamline our provisioning to two steps:

1. Flash QSPI (Is this EEPROM? I understand this very little) on the module, somehow. This should not be a slow process. This clearly requires some incantation of flash.sh. I just do not know what.
2. Get L4T Linux loaded onto the NVMe, such that I can prepare/image the NVMe independently of any of Nvidia's SDK flashing tools. I'm having lots of trouble making the initrd method of flashing work to flash to NVMe. Plus, why I would choose to use a USB connection to place data onto an NVMe SSD is beyond me; I don't want this.

Note that I would prefer to leave the eMMC on the module completely untouched, to save time. It is true that I could flash it so it can work as a backup copy of the operating system to boot from. Indeed it should be possible to flip a switch to cause the device to boot to eMMC instead, into a sort of "Recovery" environment, and scripts could then be used to nuke/reset the partition on the NVMe as needed. So I suppose this means that I will get there eventually. In my testing I've already achieved flashing the OS to eMMC with the SDK Manager, and it works great. Indeed it is required to do this to get an eMMC module up and running in any capacity anyway.

The big gap in knowledge I have at this point is what all is held inside the onboard QSPI nonvolatile memory. I am led to believe that it contains boot configuration, some form of it, perhaps cboot lives there.

I think my next step to confirm my hypothesis is to follow https://docs.nvidia.com/jetson/l4t/index.html#page/Tegra%20Linux%20Driver%20Package%20Development%20Guide/bootflow_jetson_xavier.html#wwpID0E0JB0HA

According to this, the NVMe is read first, and the boot behavior is determined by it. This means I should be able to alter boot behavior by editing /boot/extlinux/extlinux.conf on the NVMe, and if I bork it I could still make changes to it from another computer.
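
A minimal sketch of that kind of edit, assuming the config has an `APPEND` line containing a `root=` argument (the layout I would expect, but check your own file). `set_extlinux_root` and the `/mnt/nvme` mount point are hypothetical names for illustration, and `sed -i` here is the GNU form.

```shell
# Sketch: point the root= argument of every APPEND line at a new device.
# set_extlinux_root is a hypothetical helper; a backup copy is written first.
set_extlinux_root() {
    local conf="$1" newroot="$2"
    cp "$conf" "$conf.bak" || return 1
    # Only touch APPEND lines, and only a standalone root= argument.
    sed -i -E "/^[[:space:]]*APPEND[[:space:]]/ s|([[:space:]])root=[^[:space:]]+|\1root=${newroot}|" "$conf"
}

# Example, with the NVMe mounted on another computer (mount point assumed):
#   sudo set_extlinux_root /mnt/nvme/boot/extlinux/extlinux.conf /dev/nvme0n1p1
```

Keeping the `.bak` copy next to the file means a bad edit can be undone from any machine that can mount the SSD.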

unphased commented 2 years ago

Quick update: It frickin' worked, I am beyond pumped.

stupid-2020 commented 2 years ago

Flash QSPI (Is this EEPROM? I understand this very little) on the module, somehow. This should not be a slow process. This clearly requires some incantation of flash.sh. I just do not know what.

According to my understanding of page 5 of the document, only the JAXi (Jetson AGX Xavier Industrial) has QSPI NOR (NOR flash), and it has only 64MB.

Get L4T Linux loaded onto the NVMe, such that I can prepare/image the NVMe independently of any of Nvidia's SDK flashing tools. I'm having lots of trouble making the initrd method of flashing work to flash to NVMe. Plus, why I would choose to use a USB connection to place data onto an NVMe SSD is beyond me; I don't want this.

Once you have one NVMe SSD done, you can simply clone it to another.
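
One way to do such a clone is a raw copy with `dd`. This is only a sketch: `clone_disk` is a hypothetical wrapper, the device names in the example are placeholders, and the destination must be at least as large as the source. `status=progress` needs a reasonably recent GNU `dd`.

```shell
# Sketch: raw block-for-block copy of one disk (or image file) to another.
# clone_disk is a hypothetical helper. Double-check the destination path:
# everything on it is overwritten.
clone_disk() {
    local src="$1" dst="$2"
    if [ "$src" = "$dst" ]; then
        echo "source and destination are the same" >&2
        return 1
    fi
    # conv=fsync flushes the data to the destination before dd exits.
    dd if="$src" of="$dst" bs=4M conv=fsync status=progress
}

# Example (placeholder device names):
#   sudo clone_disk /dev/sdX /dev/sdY
```

A raw copy also duplicates the partition table, so both SSDs end up with identical PARTUUIDs. That is harmless when each SSD goes into a different device.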

Note that I would prefer to leave the eMMC on the module completely untouched, to save time. It is true that I could flash it so it can work as a backup copy of the operating system to boot from. Indeed it should be possible to flip a switch to cause the device to boot to eMMC instead, into a sort of "Recovery" environment, and scripts could then be used to nuke/reset the partition on the NVMe as needed. So I suppose this means that I will get there eventually. In my testing I've already achieved flashing the OS to eMMC with the SDK Manager, and it works great. Indeed it is required to do this to get an eMMC module up and running in any capacity anyway.

According to the link you provided, the new CBoot scans external NVMe devices in its boot sequence. It should be possible to boot directly from NVMe.

unphased commented 2 years ago

Thank you!! Your first link "page 5 of the document" gives me this error page:

File Not Found or Link has Expired. Oops, sorry for the inconvenience. It seems that the file you have tried to download is no longer available or the URL used is no longer valid. Please refer back to the product page and follow the links to get the latest downloadable version.

But, it does give me hope that maybe I can boot nvme with a fresh-from-factory SoM. Just perfect since I've got my last fresh one sitting here and I will test this out!

unphased commented 2 years ago

I should also note, I found in the file Linux_for_Tegra/tools/kernel_flash/README_initrd_flash.txt,

Example 2: In this example, you want to boot Jetson Xavier NX SD from an attached NVMe SSD. The SD card does not need to be plugged in. You can also apply this if you don't want to use the emmc on the Jetson Xavier NX emmc.

First step: Put the device into recovery mode, then generate qspi only images for the internal device:
$ sudo ./tools/kernel_flash/l4t_initrd_flash.sh --no-flash jetson-xavier-nx-devkit-qspi internal

Note: The board name given here is not jetson-xavier-nx-devkit or jetson-xavier-nx-devkit-emmc so that no SD card or eMMC images are generated.

Second step: Put the device into recovery mode, then generate a normal filesystem for the external device:
$ sudo ./tools/kernel_flash/l4t_initrd_flash.sh --no-flash --external-device nvme0n1p1 -S 8GiB -c ./tools/kernel_flash/flash_l4t_nvme.xml --external-only --append jetson-xavier-nx-devkit external

Third step: Put the device into recovery mode, then flash both images:
$ sudo ./tools/kernel_flash/l4t_initrd_flash.sh --flash-only

That does make me think contrary to what you just stated.

Also: https://forums.developer.nvidia.com/t/is-it-possible-flash-only-qspi-nor/177096/4

unphased commented 2 years ago

I can confirm now that placing my fresh SoM into the board where I had boot from nvme set up results in nothing happening. The USB ID reports it being in recovery mode (ready to be flashed).

❯ lsusb | grep NVID                            
Bus 003 Device 039: ID 0955:7e19 NVIDIA Corp.
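
For automation, the same check can be scripted. This is a sketch only: `in_recovery_mode` is a hypothetical helper, and the ID `0955:7e19` is simply the one from the output above. Other Jetson modules may report a different product ID.

```shell
# Sketch: detect a module waiting in recovery mode from lsusb output.
# in_recovery_mode is a hypothetical helper; it reads lsusb-style text on stdin.
# Assumption: 0955:7e19 (taken from the output above) is the ID to look for.
in_recovery_mode() {
    grep -q 'ID 0955:7e19'
}

# Example:
#   lsusb | in_recovery_mode && echo "module is ready to be flashed"
```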

unphased commented 2 years ago

Pretty sure that Linux_for_Tegra/jetson-xavier-nx-devkit-qspi.conf is the one i'd use, so it'd be

$ sudo ./flash.sh jetson-nx-devkit-qspi mmcblk0p1

that should hopefully be my holy grail flash command to initialize a SoM enough to let NVMe take it the rest of the way.

stupid-2020 commented 2 years ago

Pretty sure that Linux_for_Tegra/jetson-xavier-nx-devkit-qspi.conf is the one i'd use, so it'd be
$ sudo ./flash.sh jetson-nx-devkit-qspi mmcblk0p1
that should hopefully be my holy grail flash command to initialize a SoM enough to let NVMe take it the rest of the way.

I guess it does not work that way, even if you change the board name to jetson-xavier-nx-devkit-qspi. Anyway, you can find the supported board values in point 6 of To flash Jetson....

I haven't tried JetPack 4.6, but the concept is the same. The flash.sh command generally flashes both MB2 and the filesystem (which does not include CUDA or other toolkits; please correct me if I am wrong). The l4t_initrd_flash.sh command works differently. I guess the first step quoted below only flashes TegraBoot or MB2 (not the kernel or filesystem mentioned in the second step).

First step: Put the device into recovery mode, then generate qspi only images for the internal device:
$ sudo ./tools/kernel_flash/l4t_initrd_flash.sh --no-flash jetson-xavier-nx-devkit-qspi internal

But if you want to put everything (including the filesystem) on the QSPI of the Jetson Xavier NX, that is not possible, as the QSPI NOR has only 32MB (refer to page 3).

unphased commented 2 years ago

from point 6 of To flash Jetson....

This link is to 32.4.2 so I think stuff has changed. I'm referencing 32.6.1

Yeah I was kind of making slow progress for some time because I got discouraged by the lack of information around alternative use cases. I know that all the documentation and information that I could find in the beginning was tailored to delivering the 5GB of initial content and indeed the eMMC flashing process also streamlines the installation (via USB or Ethernet after flash) of SDK components, which basically fills up another 10GB!

This is what I'm trying to avoid, since I'm preparing steps for automation and I want to use my own hardware, not Nvidia's USB2 micro cable, to replicate the storage!

I'll test it soon. I hope that it works; that way I can run flash.sh in such a way as to flash just the QSPI so that it gains CBoot, at which point I might be free to drop in an NVMe and already be off to the races.

In the actual final deployment later on I reckon I'll also deploy a tailored OS into the eMMC as well, ready to jump in during recovery situations. In such a scenario then the SoM flashing step takes 10 minutes again, which we'll need to account for.

I do expect that CBoot fits on the 32MB QSPI, and although I am curious what else is on there, I am not too concerned about it.

stupid-2020 commented 2 years ago

This is what I'm trying to avoid, since I'm preparing steps for automation and I want to use my own hardware, not Nvidia's USB2 micro cable, to replicate the storage!

Refer to NVIDIA Jetson Linux Driver Software Features:

Flashing to Multiple Jetson Devices

NVIDIA provides a tool and instructions for flashing Jetson devices efficiently in a factory environment. This tool is part of the Linux BSP package and is available in the Linux_for_Tegra folder. Instructions for using the tool are included in README_Massflash.txt, located in the same folder.

Maybe you can get some insight from the post on the NVIDIA Developer Forum. BTW, I also suggest you ask the question there, as the moderators and other experts could help.

unphased commented 2 years ago

yep! definitely. Thanks so much for your help so far!

Yeah, I think I'll need to learn about that once we get into mass setup for different locked-down, encrypted devices. For now we're OK with establishing a quick and dirty unsecured device workflow.