rauc / meta-rauc

Yocto/Open Embedded meta layer for RAUC, the embedded Linux update framework
MIT License

Rootfs slot checksum changes even when no changes have been made #269

Open tcikel opened 1 year ago

tcikel commented 1 year ago

Hello,

I have an A/B configuration with two rootfs partitions and two appfs partitions. I am using adaptive updates, and both partitions are read-only so that hash calculation can be skipped when the hash of the slot matches the hash in the bundle. The issue is that my rootfs slot always has a different checksum even though nothing in it has changed; the only change was made to the other slot.

I am generating my partitions with the help of a wks file:

part / --source rootfs --fixed-size 1024M --ondisk mmcblk0 --exclude-path data/ --exclude-path app/ --fstype=ext4 --label rootfs_A --align 4096
part / --source rootfs --fixed-size 1024M --ondisk mmcblk0  --exclude-path data/ --exclude-path app/ --fstype=ext4 --label rootfs_B --align 4096
part /rce --source rootfs --rootfs-dir=${IMAGE_ROOTFS}/app  --fixed-size 512M --ondisk mmcblk0 --fstype=ext4 --label rce_A --align 4096
part /rce --source rootfs --rootfs-dir=${IMAGE_ROOTFS}/app  --fixed-size 512M  --ondisk mmcblk0 --fstype=ext4 --label rce_B --align 4096
part /data --source rootfs --rootfs-dir=${IMAGE_ROOTFS}/data --fixed-size 250M --ondisk mmcblk0 --fstype=ext4 --label data --align 4096

I create the bundle by copying the partitions generated by wic directly into the bundle recipe:

RAUC_SLOT_appfs = "fpc-image-base"
RAUC_SLOT_appfs[type] = "image"
RAUC_SLOT_appfs[file] = "app.ext4"
RAUC_SLOT_appfs[fstype] = "ext4"
RAUC_SLOT_appfs[adaptive] = "block-hash-index"

RAUC_SLOT_rootfs = "fpc-image-base"
RAUC_SLOT_rootfs[type] = "image"
RAUC_SLOT_rootfs[file] = "rootfs.ext4"
RAUC_SLOT_rootfs[fstype] = "ext4"
RAUC_SLOT_rootfs[adaptive] = "block-hash-index"

The issue might not even be in meta-rauc, but in the way I generate the partitions and create the rootfs. But I don't understand why the checksum of the rootfs changes when I only change files that live in the app partition. I checked the contents of the rootfs, and the files from the app partition are not there. It seems that every time the rootfs is generated, something changes that causes the checksum to change as well. Any help would be appreciated.

ejoerns commented 1 year ago

@tcikel As far as I know, the default mkfs file system generation in Yocto is not fully reproducible. E.g. the directory hash seed might be different. You can provide a fixed one with -E hash_seed=<uuid>.

You should also align the file system's inodes to 4K bytes since block-hash-index uses 4k blocks.

-i 4096 -b 4096 -E hash_seed=d0e1ce7b-50ae-40dc-8f92-c6be220b00dc

tcikel commented 1 year ago

@ejoerns Thank you for the quick response. I used your fix with hash_seed (not sure why this option is missing from the mkfs man page), which helped, but the issue still persists. The files are now in the same order, but I am still getting a different checksum. I compared the contents of the rootfs partitions with 7z and got this:

diff rootfs0.txt rootfs1.txt
16,18c16,18
< Modified = 2023-04-24 12:16:49
< Created = 2023-04-24 12:12:14
< Last Check Time = 2023-04-24 12:16:49
---
> Modified = 2023-04-24 12:52:30
> Created = 2023-04-24 12:52:27
> Last Check Time = 2023-04-24 12:52:30
24c24
< ID = 09964597DF374FC88F34D444CA2EC30F
---
> ID = 765487728230435682D33932C7E1C1F0
32c32
< 2023-04-24 12:12:14 D....                            lost+found
---
> 2023-04-24 12:52:27 D....                            lost+found
23644,23645c23644,23645
< 2018-03-09 14:34:56 .....          144         4096  var/cache/fontconfig/5c156208-e194-4827-9f14-402658bab0d0-le64.cache-7
< 2018-03-09 14:34:56 .....         9192        12288  var/cache/fontconfig/6e062ab2-355a-4115-9422-9850a2b2ec72-le64.cache-7
---
> 2018-03-09 14:34:56 .....          144         4096  var/cache/fontconfig/28b1dbf1-6536-4db9-8601-80167399499f-le64.cache-7
> 2018-03-09 14:34:56 .....         9192        12288  var/cache/fontconfig/8685fcd1-6295-49b3-9395-292e5545fda5-le64.cache-7
23662c23662
< 2023-04-24 12:12:14 .....     33554432     33554432  [SYS]/Journal
---
> 2023-04-24 12:52:27 .....     33554432     33554432  [SYS]/Journal
23664c23664
< 2023-04-24 12:12:14          872091597    918478848  21854 files, 1777 folders
---
> 2023-04-24 12:52:27          872091597    918478848  21854 files, 1777 folders

The fontconfig package is causing some issues; for some reason it regenerates the contents of its cache. I am not sure how important this package is, so I will try to remove it. The bigger issue for me is that the lost+found directory is created during the wic build. I couldn't find any way to stop wic from calling fsck, which creates it (even though skipping fsck might be a bad idea). Also, how does RAUC create the checksum of the ext4 image? Does it include metadata such as the Modified and Created times seen above? Those will be different every time I build the partition unless the system time is somehow forced to always be the same.

ejoerns commented 1 year ago

Normally, the reproducible build features of Yocto should pre-determine the timestamps. This seems to work well for rootfs content (2018 timestamp) but not for wic tooling. Which version of Yocto is this?

The hash btw. is just calculated over the ext4 image (file). So just the same as you would get when manually calling sha256sum on it.
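As a minimal sketch of what "hash over the image file" means in practice, the Python snippet below computes the same digest that sha256sum would print for a file. The function name image_sha256 and the file name in the usage note are made up for illustration; this is not RAUC's own code.

```python
import hashlib


def image_sha256(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of the whole file at `path`.

    The file is read in chunks so that large images do not have to
    fit into memory. Every byte of the image contributes to the
    digest, including file system metadata such as timestamps.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling image_sha256("rootfs.ext4") on two builds and comparing the results shows whether the images are bit-identical. A single differing timestamp in the superblock or in an inode is enough to produce a different digest.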

tcikel commented 1 year ago

The version is Kirkstone, so basically if wic creates the partition, it breaks the reproducibility. This is unfortunate. I was thinking that a workaround could be to create the appfs partition as a separate image so that the rootfs would not be built again, but this would mean we would have to manually track the compatibility of versions.

tcikel commented 1 year ago

After some more digging I found this commit in the openembedded repository. There seems to be some work on getting wic to honor reproducible builds, but so far only for the updated fstab file. Still, it seems like a good starting point.

ejoerns commented 1 year ago

> The version is Kirkstone, so basically if wic creates the partition, it breaks the reproducibility. This is unfortunate. I was thinking that a workaround could be to create the appfs partition as a separate image so that the rootfs would not be built again, but this would mean we would have to manually track the compatibility of versions.

It is not quite straightforward (or not intended) in Yocto to build more file systems than the rootfs. So the normal approach is indeed to split this after creation. However, I wonder how you get access to the wic-generated partitions? The last time I had a look into this it did not work without any hacks since wic just uses them internally and removes them after having generated the disk image.

> After some more digging I found this commit in the openembedded repository. There seems to be some work on getting wic to honor reproducible builds, but so far only for the updated fstab file. Still, it seems like a good starting point.

I guess it would be interesting to see how poky master behaves here. There is an ongoing effort to make things more and more reproducible afaik. So maybe the patch you found is just one step further.

tcikel commented 1 year ago

The partitions stay after wic creates the main disk image, at least for me; I copy them from the build-wic folder after the build. I did some more work and managed to modify the wic plugin to edit the superblock data with the help of debugfs, so the superblock of the partition is now always the same. Unfortunately, the atime, ctime and crtime of the inodes are always taken from the current time of the build:

ctime: 0x6448d31e:00000000 -- Wed Apr 26 09:30:38 2023
atime: 0x6448d31f:00000000 -- Wed Apr 26 09:30:39 2023
mtime: 0x5aa27f70:00000000 -- Fri Mar 9 13:34:56 2018
crtime: 0x6448d33d:00000000 -- Wed Apr 26 09:31:09 2023

I am not sure how to edit them effectively. I tried to create a script that loops over the inodes and sets the same time every time, but that doesn't seem like an effective solution. Not sure if there is a good solution to this.
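To see how much of the image these timestamps actually touch, one option is to compare two builds block by block. The sketch below is only an illustration: the function name differing_blocks and the image names in the usage note are made up, and the 4096-byte block size is taken from the 4k blocks mentioned earlier in this thread for block-hash-index.

```python
def differing_blocks(path_a, path_b, block_size=4096):
    """Return the indices of the blocks that differ between two files.

    Both files are read in `block_size` chunks. If one file is longer
    than the other, the extra blocks are reported as differing too.
    """
    diffs = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        index = 0
        while True:
            a = fa.read(block_size)
            b = fb.read(block_size)
            if not a and not b:
                break  # both files are exhausted
            if a != b:
                diffs.append(index)
            index += 1
    return diffs
```

For example, differing_blocks("rootfs0.ext4", "rootfs1.ext4") returns a list of block numbers, and a block number multiplied by 4096 gives the byte offset in the image. If the file system block size is also 4096, these numbers could probably be fed to the icheck command of debugfs to find out which inodes own the changed blocks.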

I will try with a poky master to see if some solution is already implemented.

ejoerns commented 2 months ago

@tcikel Is this still relevant, or did you find a way to fix it / work around? I'd assume this is nothing we can fix in meta-rauc thus I'd tend to close this.