weRginger / binarywarriors

Automatically exported from code.google.com/p/binarywarriors

Code to print MD5 hash of all data blocks of a file :: 27/01/2011 #3

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
1. Description of the problem: Write code that prints the numbers of all data
blocks of a file and computes the MD5 hash of each block. The user passes only
the filename as input, and the output is the MD5 hash of every block of the
file.

2. How to test: Access the inode of the file, then access its data blocks. Read
each data block and generate its MD5 hash.

3. Concepts: the MD5 hash algorithm and reading a file's data blocks in the ext2
file system

4. References: Understanding the Linux Kernel

5. Code location: as a kernel module
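Before touching ext2 internals, it may help to pin down the expected output with a user-space model: split a file into fixed-size blocks and hand each block to a fingerprint callback. This is only a sketch under stated assumptions: the names for_each_block, block_cb and count_bytes are hypothetical, the block size is passed in rather than read from the superblock, and it reads through the normal file API, so it sees file-relative blocks rather than the physical block numbers the exercise asks to print.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Called once per block: logical index, the bytes, and how many are valid
 * (only the last block of a file can be shorter than block_size). */
typedef void (*block_cb)(size_t index, const unsigned char *data,
                         size_t len, void *ctx);

/* Read `f` in block_size chunks and invoke `cb` for each chunk.
 * Returns the number of blocks visited, or (size_t)-1 on error. */
size_t for_each_block(FILE *f, size_t block_size, block_cb cb, void *ctx)
{
    assert(f != NULL && block_size > 0 && cb != NULL);
    unsigned char *buf = malloc(block_size);
    if (buf == NULL)
        return (size_t)-1;

    size_t index = 0, n;
    while ((n = fread(buf, 1, block_size, f)) > 0) {
        cb(index++, buf, n, ctx);   /* a real callback would fingerprint buf */
        if (n < block_size)
            break;                  /* short read: end of file or error */
    }

    int failed = ferror(f);
    free(buf);
    return failed ? (size_t)-1 : index;
}

/* Example callback: totals the bytes seen; replace with an MD5/SHA update. */
void count_bytes(size_t index, const unsigned char *data, size_t len, void *ctx)
{
    (void)index;
    (void)data;
    *(size_t *)ctx += len;
}
```

Called on a 2500-byte file with a 1024-byte block size, it visits three blocks, the last holding 452 bytes.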

Original issue reported on code.google.com by binarywarriors5@gmail.com on 3 Oct 2010 at 6:39

GoogleCodeExporter commented 9 years ago
Why is MD5 being used? Why not CRC?

Original comment by imreckless@gmail.com on 4 Oct 2010 at 10:31
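For comparison, a CRC is cheap to compute but was designed to detect accidental corruption, not to make collisions hard to find; its linear structure means two different blocks with the same CRC can be constructed deliberately. The sketch below is the standard reflected CRC-32 (polynomial 0xEDB88320, as used by zlib and Ethernet), written bit by bit for clarity rather than speed. The function name crc32_block is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32: polynomial 0xEDB88320,
 * initial value and final XOR both 0xFFFFFFFF. */
uint32_t crc32_block(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            /* shift right; XOR in the polynomial if the dropped bit was 1 */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The usual check value applies: the CRC-32 of the ASCII string "123456789" is 0xCBF43926.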

GoogleCodeExporter commented 9 years ago
Guys, please call the hash values "fingerprints".

Original comment by sandeepksinha on 4 Oct 2010 at 10:46

GoogleCodeExporter commented 9 years ago
Still no update.

Original comment by imreckless@gmail.com on 5 Oct 2010 at 9:17

GoogleCodeExporter commented 9 years ago
We are not planning to use MD5 in the end; for the final version we will use a
more secure hashing algorithm such as SHA, combined with byte-by-byte checking.

We considered combining MD5 with CRC (or any two hashing algorithms) to
minimize the chance of a collision, but a byte-by-byte comparison is more
effective than relying on any single hashing algorithm.
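The "fingerprint first, then compare byte by byte" idea can be sketched as follows. A fingerprint mismatch proves two blocks differ; a match only makes them candidates, so memcmp() settles it. The 32-bit FNV-1a hash here is a stand-in chosen for brevity (it is not cryptographic, unlike the MD5/SHA fingerprints discussed above), and the names fnv1a32 and blocks_identical are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit FNV-1a: a simple non-cryptographic stand-in for MD5/SHA. */
uint32_t fnv1a32(const unsigned char *data, size_t len)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Cheap check first: different fingerprints mean different blocks.
 * Equal fingerprints may still be a collision, so confirm with memcmp(). */
bool blocks_identical(const unsigned char *a, uint32_t fp_a,
                      const unsigned char *b, uint32_t fp_b,
                      size_t block_size)
{
    assert(a != NULL && b != NULL);
    if (fp_a != fp_b)
        return false;
    return memcmp(a, b, block_size) == 0;
}
```

Because memcmp() has the final say, a hash collision can only cost an extra comparison; it can never make two different blocks look identical.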

We already have working code that calculates the MD5 sum in kernel space, so
we planned to reuse it for this exercise.

The crux of the exercise lies in reading all the blocks of the file.

Original comment by binarywarriors5@gmail.com on 5 Oct 2010 at 5:48

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Hi all,
Yesterday I traced the ext2 code that reads the on-disk inode of a particular
file (whose name is passed as input). I found the following:

There is a function called ext2_get_inode() that returns a pointer to the ext2
on-disk inode structure.

So my approach was to first retrieve a pointer to the on-disk inode structure
and then read the block numbers stored in its i_block field.
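For reference, the i_block array of an ext2 on-disk inode holds 15 entries (EXT2_N_BLOCKS): entries 0-11 point directly at data blocks, entry 12 at a single-indirect block, entry 13 at a double-indirect block and entry 14 at a triple-indirect block, and each indirect block holds block_size / 4 32-bit block numbers. So i_block alone only lists the first 12 data blocks; printing every block number means following the indirect blocks too. The user-space sketch below only computes which entries a logical block goes through and reads nothing from disk. The struct ext2_path and map_logical_block names are hypothetical; the EXT2_* constants mirror the kernel's values, and the kernel's own version of this logic lives in ext2_block_to_path() in fs/ext2/inode.c.

```c
#include <assert.h>
#include <stdint.h>

#define EXT2_NDIR_BLOCKS 12   /* i_block[0..11] are direct */
#define EXT2_IND_BLOCK   12   /* i_block[12]: single indirect */
#define EXT2_DIND_BLOCK  13   /* i_block[13]: double indirect */
#define EXT2_TIND_BLOCK  14   /* i_block[14]: triple indirect */

/* depth 1 = direct, 2 = single indirect, 3 = double, 4 = triple.
 * offsets[0] is the i_block slot; later entries index into indirect blocks. */
struct ext2_path {
    int depth;
    uint32_t offsets[4];
};

/* Map a file-relative (logical) block number to the i_block slot and the
 * indexes inside each indirect block. Returns 0, or -1 if out of range. */
int map_logical_block(uint64_t lblk, uint32_t block_size, struct ext2_path *p)
{
    assert(block_size >= 1024 && block_size % 4 == 0);
    const uint64_t per = block_size / 4;   /* 32-bit entries per block */

    if (lblk < EXT2_NDIR_BLOCKS) {
        p->depth = 1;
        p->offsets[0] = (uint32_t)lblk;
        return 0;
    }
    lblk -= EXT2_NDIR_BLOCKS;
    if (lblk < per) {
        p->depth = 2;
        p->offsets[0] = EXT2_IND_BLOCK;
        p->offsets[1] = (uint32_t)lblk;
        return 0;
    }
    lblk -= per;
    if (lblk < per * per) {
        p->depth = 3;
        p->offsets[0] = EXT2_DIND_BLOCK;
        p->offsets[1] = (uint32_t)(lblk / per);
        p->offsets[2] = (uint32_t)(lblk % per);
        return 0;
    }
    lblk -= per * per;
    if (lblk < per * per * per) {
        p->depth = 4;
        p->offsets[0] = EXT2_TIND_BLOCK;
        p->offsets[1] = (uint32_t)(lblk / (per * per));
        p->offsets[2] = (uint32_t)((lblk / per) % per);
        p->offsets[3] = (uint32_t)(lblk % per);
        return 0;
    }
    return -1;
}
```

With 1024-byte blocks (256 entries per indirect block), logical blocks 12-267 go through the single-indirect block and 268-65803 through the double-indirect block; with 4096-byte blocks the 12 direct entries cover the first 48 KiB of the file.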

To call ext2_get_inode(), I first created an ext2 filesystem, attached it to a
loop device using losetup, and then mounted it.

Now, when I insert my module and run the code, the system hangs.

I am currently trying to fix the bug.

Original comment by kashish....@gmail.com on 6 Oct 2010 at 6:30

GoogleCodeExporter commented 9 years ago
You will need to reboot the system; it may have encountered an oops (read
about it in LKD).

After rebooting, open /var/log/messages and look for the panic string. It will
give you a fair idea of what actually drove the system into a hung state. Most
probably it will be a NULL pointer dereference.

In that file, search for the string "syslog daemon starting" or
"proc/kmsg started".

The lines just above it should contain the panic string.
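Searching the log by hand works; the same idea can also be sketched as a small user-space helper. It assumes, as described above, that the crashed boot's messages sit just above the most recent marker line, so it keeps a ring buffer of the last few lines and snapshots it each time the marker appears. The function name, the 20-line window and the 512-byte line limit are arbitrary choices for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CONTEXT_LINES 20
#define LINE_MAX_LEN 512   /* longer lines are split across fgets() calls */

/* Print up to CONTEXT_LINES lines preceding the LAST line that contains
 * `marker` (the most recent boot). Returns the number of lines printed,
 * or -1 if the marker never appears. Uses static buffers: not reentrant. */
int print_lines_before_last_marker(FILE *log, const char *marker, FILE *out)
{
    static char ring[CONTEXT_LINES][LINE_MAX_LEN];  /* last N lines seen */
    static char saved[CONTEXT_LINES][LINE_MAX_LEN]; /* snapshot at marker */
    char line[LINE_MAX_LEN];
    size_t seen = 0, saved_count = 0;
    int found = 0;

    assert(log != NULL && marker != NULL && out != NULL);
    while (fgets(line, sizeof line, log) != NULL) {
        if (strstr(line, marker) != NULL) {
            /* Copy the window (oldest first) before storing the marker line. */
            saved_count = seen < CONTEXT_LINES ? seen : CONTEXT_LINES;
            for (size_t i = 0; i < saved_count; i++)
                strcpy(saved[i], ring[(seen - saved_count + i) % CONTEXT_LINES]);
            found = 1;
        }
        strcpy(ring[seen % CONTEXT_LINES], line);
        seen++;
    }
    if (!found)
        return -1;
    for (size_t i = 0; i < saved_count; i++)
        fputs(saved[i], out);
    return (int)saved_count;
}
```

For example, calling it with the log file, "syslog daemon starting" and stdout prints the tail of the previous boot's messages.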

You can also use the "crash" tool to track down the error. Learn about it at
http://people.redhat.com/anderson/crash_whitepaper/

I am opening an issue for this.

Original comment by binarywarriors5@gmail.com on 6 Oct 2010 at 7:02

GoogleCodeExporter commented 9 years ago
Remember that your system would not panic in the case of a hang. A hang
typically indicates that you have not taken or released some lock properly. It
would help if you could attach your kernel module code.

Original comment by sandeepksinha on 7 Oct 2010 at 6:00

GoogleCodeExporter commented 9 years ago
Sir, my kernel module code is now working. I am able to print all the block
numbers allocated to a file. Next, I have to look for the functions that read
these blocks.

Original comment by kashish....@gmail.com on 11 Oct 2010 at 4:53

GoogleCodeExporter commented 9 years ago
This is a summary of the work I have done on this problem statement.

Original comment by kashish....@gmail.com on 11 Oct 2010 at 5:01

Attachments:

GoogleCodeExporter commented 9 years ago
Hi all,
The attached file shows the hierarchy of functions involved in reading ext2
data blocks.

The function get_block() modifies the buffer_head structure. This structure
includes a field called b_data, which points to the data in the page.

The problem is that we cannot call get_block() directly, because the
buffer_head structure is also modified by other functions in the hierarchy.

We are currently trying to fix this problem.

Are we moving in the right direction? Is this hierarchy correct?

Original comment by binarywarriors5@gmail.com on 26 Jan 2011 at 7:16

Attachments:

GoogleCodeExporter commented 9 years ago
Well, I feel you can use the get_block function to get the data for each block.

At a given instant, and for a particular block number, it would give the
correct data.

You can take the checksum of the data pointed to by b_data, up to b_size bytes.

Original comment by checkout...@gmail.com on 27 Jan 2011 at 2:54
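A user-space model may help separate the two steps being discussed. In the kernel, a get_block callback such as ext2_get_block() maps a logical block of a file to a physical block number and marks the buffer_head as mapped; it does not itself copy the block's contents into b_data. Reading the contents is a separate step (for example sb_bread() on the mapped block number). The sketch below imitates that split and then takes the checksum over b_data up to b_size, as suggested above. It is not kernel code: struct mock_bh only borrows the b_blocknr, b_size and b_data field names, and toy_get_block, read_mapped, checksum_bh, fingerprint_block and the in-memory disk are all hypothetical. The byte-sum checksum is a placeholder for MD5/SHA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MOCK_BLOCK_SIZE 1024
#define DISK_BLOCKS 8

/* Hypothetical stand-in for struct buffer_head (same field names only). */
struct mock_bh {
    uint64_t b_blocknr;          /* physical block number, set by mapping */
    size_t b_size;               /* bytes in the block */
    const unsigned char *b_data; /* contents, set by the read step */
    int mapped;
};

/* Shape of a filesystem-supplied mapping callback. */
typedef int (*get_block_fn)(const uint32_t *block_map, size_t nblocks,
                            uint64_t lblk, struct mock_bh *bh);

static unsigned char disk[DISK_BLOCKS][MOCK_BLOCK_SIZE]; /* in-memory "device" */

/* Mapping only: look up the physical block, never touch the data. */
int toy_get_block(const uint32_t *block_map, size_t nblocks,
                  uint64_t lblk, struct mock_bh *bh)
{
    if (lblk >= nblocks)
        return -1;
    bh->b_blocknr = block_map[lblk];
    bh->b_size = MOCK_BLOCK_SIZE;
    bh->mapped = 1;
    return 0;
}

/* Read step: point b_data at the mapped physical block. */
int read_mapped(struct mock_bh *bh)
{
    if (!bh->mapped || bh->b_blocknr >= DISK_BLOCKS)
        return -1;
    bh->b_data = disk[bh->b_blocknr];
    return 0;
}

/* Placeholder checksum over b_data[0 .. b_size). */
uint32_t checksum_bh(const struct mock_bh *bh)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < bh->b_size; i++)
        sum += bh->b_data[i];
    return sum;
}

/* Generic layer: calls whichever get_block the filesystem registered. */
int fingerprint_block(get_block_fn get_block, const uint32_t *block_map,
                      size_t nblocks, uint64_t lblk, uint32_t *out)
{
    struct mock_bh bh = {0};
    if (get_block(block_map, nblocks, lblk, &bh) != 0 || read_mapped(&bh) != 0)
        return -1;
    assert(bh.b_data != NULL);
    *out = checksum_bh(&bh);
    return 0;
}
```

Passing the callback in, rather than having the generic layer know about the filesystem, mirrors how the VFS helpers are handed ext2_get_block.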

GoogleCodeExporter commented 9 years ago
When do you plan to calculate the MD5 signature: on read or on write? In which
phase of the read or write?

Original comment by sandeepksinha on 27 Jan 2011 at 5:25

GoogleCodeExporter commented 9 years ago
Hi all,
We used the ext2_get_block() function, which itself calls the get_block()
function, but unfortunately it returns <null>, because ext2_get_block() is also
called by many functions higher up in the hierarchy that initialize its various
parameters.
So how can we initialize all those parameters manually? That is why we have
not been able to find the function that copies the data into b_data.

When do you plan to calculate the MD5 signature: on read or on write? In which
phase of the read or write?

Sandeep Sir, we have already implemented the MD5 code, and we are just waiting
to finish this module so that we can link the MD5 module with it.

In our project, we plan to calculate the MD5 fingerprint after reading the data
of each block.

Original comment by binarywarriors5@gmail.com on 27 Jan 2011 at 6:14

GoogleCodeExporter commented 9 years ago
Guys, I don't know what you people are doing.

ext2_get_block is itself registered as the get_block callback for the VFS, so
it will never call itself.

Well, I personally feel you should go through UTLK before jumping into the
code. Hope that helps.

If you still run into issues, let me know.

Original comment by checkout...@gmail.com on 27 Jan 2011 at 7:06

GoogleCodeExporter commented 9 years ago

Original comment by kashish....@gmail.com on 13 Mar 2011 at 6:22