knorrie / python-btrfs

Python Btrfs module
GNU Lesser General Public License v3.0
112 stars 22 forks source link

Remove obsolete CSUM_ITEM_KEY definition #10

Closed lorddoskias closed 7 years ago

lorddoskias commented 7 years ago

There is no key with value 120 in current btrfs code so let's remove it from python-btrfs. This lead to some confusing results while playing with checksum tree

knorrie commented 7 years ago

Ah, interesting. I didn't look at the data csumming part yet. I'll have a closer look soon (tm) and merge your additions. Thanks for contributing!

knorrie commented 7 years ago

I picked the first cleanup commit, thanks.

I'm looking into file csums now. I'd like to add two different example scripts.

First one is showing the stored checksums for a file. So, the purpose of the example is understanding the metadata structure and how to parse and unpack objects in that tree. A bit similar to how the code in 89e7478b04585 shows how to unpack free space bitmaps.

Second example can be actually trying to verify checksums of a file, while reading data from the file from disk, like the example you did.

The second one would probably make more sense in the context of adding "offline filesystem operations" to this lib, which I haven't started doing yet, since I haven't decided if/how yet.

About adding the helper function... I try to follow the naming and placement of things like they are in the kernel code, but in the case of checksums and crc32 it looks like it's a bit of a mess. Functions like btrfs_csum_data are hardcoded to use btrfs_crc32c etc. When opening a filesystem, we should just be able to call a function to check a checksum, without choosing an algorithm there, since it should be known from the superblock already.

Ah, but there is no way to retrieve the current checksum type using the ioctls. Hehe...

Anyway, enough to do.

knorrie commented 7 years ago

Looking at the truncate 32-bit change now. This brings me to the following issue: crc32c is called by extref_hash, which provides a u64 as input:

u32 btrfs_crc32c(u32 crc, const void *address, unsigned int length);

static inline u64 btrfs_extref_hash(u64 parent_objectid, const char *name, int len) { return (u64) btrfs_crc32c(parent_objectid, name, len); }

The code in python-btrfs does the same, but there it just starts working with that 64 bit parent_objectid value of course. So the new crc = crc & 0xffffffff changes the result of extref_hash for higher parent_objectid values. Either of the two results is the right one. I'm inclined to think that the current python-btrfs result is wrong.

What is the above C code supposed to do, and why does it even compile?

Maybe I should ask this on the mailing list instead?

knorrie commented 7 years ago

Ok, so the 64 bit parent_objectid gets truncated to 32 bit, so we have to explicitely truncate in the python code. I added the truncate to 32 bit change as a separate commit.

knorrie commented 7 years ago

I added the data checksum function. I also added an example that digs around in the checksum tree and shows how to grab a specific checksum value from there and compare it with some data from a file.

Thanks for your contribution.