zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

Convert subsys/mgmt to new kwork API #34101

Closed galak closed 3 years ago

galak commented 3 years ago

Convert subsys/mgmt/hawkbit/hawkbit.c and subsys/mgmt/updatehub/updatehub.c to use the new kwork API.

See the following issue for details:

#33104

de-nordic commented 3 years ago

@Navin-Sankar is the more appropriate person to handle subsys/mgmt/hawkbit/hawkbit.c, and @nandojve to handle subsys/mgmt/updatehub/updatehub.c.

nandojve commented 3 years ago

I can handle updatehub. CC @otavio

de-nordic commented 3 years ago

> I can handle updatehub. CC @otavio

Thanks @nandojve !

nandojve commented 3 years ago

> > I can handle updatehub. CC @otavio
>
> Thanks @nandojve !

Sorry for my long delay, too busy because of chip shortage.

galak commented 3 years ago

> Sorry for my long delay, too busy because of chip shortage.

Chip shortages don't affect software ;).

de-nordic commented 3 years ago

> Sorry for my long delay, too busy because of chip shortage.

I have just notified you so the delay is like 6 hours so far ;).

nandojve commented 3 years ago

> > Sorry for my long delay, too busy because of chip shortage.
>
> I have just notified you so the delay is like 6 hours so far ;).

@de-nordic yes, on this one, but I think there are others with more than 2 weeks of delay.

nandojve commented 3 years ago

> > Sorry for my long delay, too busy because of chip shortage.
>
> Chip shortages don't affect software ;).

These have been tough weeks and I can only help in my free time. In the end, LT is increasing hehe

nandojve commented 3 years ago

@de-nordic, @nvlsianpu

I don't understand what is happening. frdm-k64f and nucleo_f767zi don't boot the Zephyr image. For frdm-k64f, I know that @henrikbrixandersen will fix the code on the Zephyr side soon, so I have a workaround in MCUboot for now. However, frdm-k64f and nrf52840dk_nrf52840 don't confirm the image.

My last good Zephyr image is from early Dec/20. Since then, everything has stopped working, and now even confirming the image doesn't work. BTW, image confirmation on nucleo_f767zi never worked properly.

The steps to test and see that the board doesn't work are simple:

# From the root of the zephyr repository
west build -b frdm_k64f -d build/mcuboot-frdm_k64f bootloader/mcuboot/boot/zephyr
west flash -d build/mcuboot-frdm_k64f

west build -b frdm_k64f -d build/app zephyr/samples/subsys/mgmt/updatehub
west sign -t imgtool -d build/app -- --version 1.0.0 --pad --key bootloader/mcuboot/root-rsa-2048.pem
west flash -d build/app --bin-file build/app/zephyr/zephyr.signed.bin

The below log is from nRF.

*** Booting Zephyr OS build zephyr-v2.5.0-2444-g210ff8c35050  ***
I: Starting bootloader
I: Primary image: magic=good, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Boot source: primary slot
I: Swap type: none
I: Bootloader chainload address offset: 0xc000
I: Jumping to the first image slot

uart:~$ *** Booting Zephyr OS build zephyr-v2.5.0-2444-g210ff8c35050  ***

uart:~$ updatehub info 
Unique device id: d7eed6b80dd18d94
Firmware Version: 1.0.0
Product uid: 6132590955b7d37f1c4e2bcaeafb194001a68f4f05c4158bae387c1abaff2841
UpdateHub Server: coap.updatehub.io
[00:01:33.014,831] <inf> main: UpdateHub sample app started
[00:01:33.014,862] <inf> main: Confirming the boot image
[00:01:33.014,923] <err> main: Error to confirm the image    <<<<<<<<<<<<<<<<<<<
[00:01:33.014,953] <inf> main: Network disconnected
uart:~$ 

UpdateHub started its code freeze at 2.3 for LTS. We only made changes to the sample to allow network management (ETH, WIFI, IEEE-802.15.4, BLE and OT). Right now, no one can use that, and we lose at least 1 year of work if Zephyr and MCUboot don't work.

I really hope you guys can stabilize MCUboot and Zephyr soon. Let me know when you have a good version so I can test.

CC: @otavio

otavio commented 3 years ago

What is more concerning is that this kind of regression has become a habit inside the Zephyr project. What actions are under way to prevent this from happening in the future? How can developers trust Zephyr for critical devices and upgrade to new Zephyr releases with such bad regression tracking?

Navin-Sankar commented 3 years ago

@nandojve I am also experiencing the same error. The hawkbit image doesn't boot on the frdm_k64f board.

*** Booting Zephyr OS build zephyr-v2.5.0-2443-g8e1cfe9a4684  ***
I: Starting bootloader
I: Primary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Boot source: primary slot
I: Swap type: none
I: Bootloader chainload address offset: 0x20000
I: Jumping to the first image slot

MCUboot doesn't hand over control to the hawkbit image.

de-nordic commented 3 years ago

I will look into the problem on nrf52840dk.

de-nordic commented 3 years ago

@nandojve Can you give me the SHA of the Zephyr commit you are using?

As I understand it, the image confirmation is done so that there is no attempt to update the device from an unconfirmed image, right?

I have a problem reproducing that on nrf52840dk; I have built the updatehub sample with the OT overlays and programmed the image to the board. I get this:

uart:~$
*** Booting Zephyr OS build zephyr-v2.5.0-2444-g383700b6355b  ***
I: Starting bootloader
I: Primary image: magic=good, swap_type=0x2, copy_done=0x1, image_ok=0x1
I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Boot source: none
I: Swap type: none
I: Bootloader chainload address offset: 0xc000
I: Jumping to the first image slot
*** Booting Zephyr OS build zephyr-v2.5.0-2444-g383700b6355b  ***

[00:01:33.009,338] <inf> main: UpdateHub sample app started
[00:01:33.009,368] <inf> main: Confirming the boot image
uart:~$

Can you send me some more context? What is the SHA of the Zephyr commit you use as your base, and what is the MCUboot commit SHA you have there? If possible, please provide the command line you use to build the sample.
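
For reference, the boot banner already carries part of the requested information. The build string has the `git describe` shape `<tag>-<commits since tag>-g<abbreviated SHA>`, so the abbreviated commit can be read straight from a log. The snippet below is a minimal sketch of that, not part of Zephyr or west; `parse_banner` and `BANNER_RE` are hypothetical names introduced only for illustration, and the pattern assumes the banner format seen in the logs of this thread.

```python
import re

# Assumed banner layout, copied from the logs in this thread:
#   *** Booting Zephyr OS build <tag>-<count>-g<sha>  ***
BANNER_RE = re.compile(
    r"\*\*\* Booting Zephyr OS build "
    r"(?P<tag>\S+?)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)\s+\*\*\*"
)


def parse_banner(line):
    """Return (tag, commits_since_tag, abbreviated_sha), or None if no match."""
    m = BANNER_RE.search(line)
    if m is None:
        # Also the case for a build made exactly on a tag (no -<count>-g<sha>).
        return None
    return m.group("tag"), int(m.group("count")), m.group("sha")


if __name__ == "__main__":
    print(parse_banner(
        "*** Booting Zephyr OS build zephyr-v2.5.0-2444-g210ff8c35050  ***"
    ))
```

Applied to the two nrf52840dk logs above, both banners report 2444 commits since `zephyr-v2.5.0`, but the abbreviated SHAs differ (`210ff8c35050` and `383700b6355b`), so the two images were not built from the same commit.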

nandojve commented 3 years ago

Hello, it was #5520 from yesterday. We are probably at the same point, v2.5.0-2444. I updated everything and then tried to build one version to validate the kwork API.

What are the differences below? If we are supposed to use the same build/flash commands, why do we get different values? What steps do you use?

nandojve

I: Primary image: magic=good, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3

de-nordic

I: Primary image: magic=good, swap_type=0x2, copy_done=0x1, image_ok=0x1
I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
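
For reference, the numeric fields in these lines can be turned into names. The snippet below is a minimal sketch for decoding them from a captured log; `decode_trailer_line` and the two tables are hypothetical helpers written for this illustration. The numeric values are an assumption, taken from memory of the `BOOT_SWAP_TYPE_*` and `BOOT_FLAG_*` definitions in MCUboot's `bootutil_public.h`, and should be checked against the MCUboot revision actually in use.

```python
import re

# Assumed values (verify against bootutil_public.h in your MCUboot tree).
SWAP_TYPE = {0x1: "none", 0x2: "test", 0x3: "perm",
             0x4: "revert", 0x5: "fail", 0xFF: "panic"}
FLAG = {0x1: "set", 0x2: "bad", 0x3: "unset"}

# Matches lines such as:
#   I: Primary image: magic=good, swap_type=0x1, copy_done=0x3, image_ok=0x3
LINE_RE = re.compile(
    r"I: (?P<slot>[\w ]+?): magic=(?P<magic>\w+), "
    r"swap_type=(?P<swap_type>0x[0-9a-fA-F]+), "
    r"copy_done=(?P<copy_done>0x[0-9a-fA-F]+), "
    r"image_ok=(?P<image_ok>0x[0-9a-fA-F]+)"
)


def _name(table, text):
    """Map a hex string such as '0x3' to its symbolic name."""
    value = int(text, 16)
    return table.get(value, "unknown(0x%x)" % value)


def decode_trailer_line(line):
    """Decode one bootloader status line; return None for other lines."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "slot": m.group("slot"),
        "magic": m.group("magic"),
        "swap_type": _name(SWAP_TYPE, m.group("swap_type")),
        "copy_done": _name(FLAG, m.group("copy_done")),
        "image_ok": _name(FLAG, m.group("image_ok")),
    }
```

Under those assumed values, the first primary-image line decodes to swap type "none" with `copy_done` and `image_ok` both "unset", while the second decodes to swap type "test" with both flags "set". In other words, the two boards were not in the same trailer state when the logs were captured.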

de-nordic commented 3 years ago

In the Zephyr dir, on clean master (383700b6355b0407b25e096d98164d26ab6a11a5), after west update and so on:

nrfjprog --eraseall
west build -b nrf52840dk_nrf52840 --build-dir voot/ bootloader/mcuboot/boot/zephyr/ -t menuconfig
west build -b nrf52840dk_nrf52840 --build-dir voot/ bootloader/mcuboot/boot/zephyr/
west flash --build-dir voot
west build -b nrf52840dk_nrf52840 --build-dir updatehub-ot zephyr/samples/subsys/mgmt/updatehub/ -t menuconfig  -- -DOVERLAY_CONFIG=overlay-ot.conf -DOVERLAY_CONFIG=overlay-prj.conf.example
west build -b nrf52840dk_nrf52840 --build-dir updatehub-ot zephyr/samples/subsys/mgmt/updatehub/  -- -DOVERLAY_CONFIG=overlay-ot.conf -DOVERLAY_CONFIG=overlay-prj.conf.example
west flash --build-dir updatehub-ot

I use menuconfig to set the MCUboot key and increase log levels; then I got:

uart:~$ *** Booting Zephyr OS build zephyr-v2.5.0-2444-g383700b6355b  ***
I: Starting bootloader
I: Primary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Boot source: none
I: Swap type: none
I: Bootloader chainload address offset: 0xc000
I: Jumping to the first image slot
*** Booting Zephyr OS build zephyr-v2.5.0-2444-g383700b6355b  ***

uart:~$ updatehub info
Unique device id: 7eddd20ad8b4af7a
Firmware Version: 0.0.0
Product uid: e4d37cfe6ec48a2d069cc0bbb8b078677e9a0d8df3a027c4d8ea131130c4265f
UpdateHub Server: 10.5.3.67
uart:~$
uart:~$ updatehub run
Starting UpdateHub run...
Invalid response
[00:01:10.664,855] <err> updatehub: The current image is not confirmed
uart:~$
uart:~$
[00:01:33.009,368] <inf> main: UpdateHub sample app started
[00:01:33.009,399] <inf> main: Confirming the boot image
uart:~$

But this is just from the last run. The log I gave you previously was captured after I had already flashed several updatehub images, and even smp_svr (with the same MCUboot), to check whether boot_write_img_confirmed or boot_set_confirmed breaks.

galak commented 3 years ago

@nandojve any updates on this?

nandojve commented 3 years ago

Hi @galak, the conversion was made and seems OK. I think we should go with #34442.

The MCUboot issue should probably be addressed in another thread. Otherwise it will block the whole API conversion, and there is no reason to do that.

LTS is scheduled for 2.7, right?

nandojve commented 3 years ago

Just for the record.

Now there is #34530. The big question is: when will the MCUboot/Zephyr API/infrastructure be stable? What happened to keeping an API for 3 versions before deprecating it? Will there be multiple ways for LTS?

This makes it difficult to know the best time to explore and fix problems.

BTW, I haven't looked at #34530 yet, but my feeling is that it may break device compatibility. Users will be stuck with their current version and won't be able to upgrade to a newer version, only apply fixups. Is that true?

Note: I understand the project wants what is best for LTS, but I think we cannot do everything. If we do everything, there won't be a 3.0. My feeling is that this "madness" should stop. It creates a sense of insecurity, and the project will never have a chance to be reliable.

de-nordic commented 3 years ago

@nandojve

> BTW, I haven't looked at #34530 yet, but my feeling is that it may break device compatibility. Users will be stuck with their current version and won't be able to upgrade to a newer version, only apply fixups. Is that true?

The change is supposed to maintain compatibility with the current API. Generally, you should be able to get the flash area by ID, determined from DTS by label (FLASH_AREA_ID()), and use flash_area_open; or get the flash_area object directly (FLASH_AREA()) and either pass it to flash_area_open or use it directly. I would stick with flash_area_open for now, as I guess it may, for example, check whether the device is ready. The internal implementation changes and will be a little faster, as device_get_binding will no longer be called at runtime, and the code size is slightly reduced.

@nandojve Please voice your concerns as comments to #34530.