mirage / ocaml-mbr

A simple library for manipulating Master Boot Records
ISC License

Tools for working with or creating MBR-formatted disk images #14

Open reynir opened 1 year ago

reynir commented 1 year ago

It would be nice to develop one or more tools to:

PizieDust commented 1 year ago

Hello @reynir It will be my pleasure to try this issue. Please can I be assigned here?

reynir commented 1 year ago

@PizieDust what would you like to work on? I suggest working on one of the first two as they are not as difficult as the latter two. It may be helpful to look at the Wikipedia article. We implement the "modern standard MBR". https://en.wikipedia.org/wiki/Master_boot_record

To create a new binary executable you can use dune init exec --libs mbr mbr_inspect bin/. This will create a directory bin/ with files mbr_inspect.ml and dune. You write the code in mbr_inspect.ml.

PizieDust commented 1 year ago

Hi @reynir, I apologize for the delay on this issue. This is my first time working with functional programming. For this task, would it be better to create a new .ml file in the lib directory, or is it okay to modify the code in mbr.ml?

PizieDust commented 1 year ago

Okay thank you. This is very helpful.

PizieDust commented 1 year ago

@reynir So, for the first task, my understanding is that we need a function or a module that can receive an MBR header as input and read its contents, which are:

  bootstrap_code : string;
  original_physical_drive : int;
  seconds : int;
  minutes : int;
  hours : int;
  disk_signature : int32;
  partitions : Partition.t list;

or for the modern mbr standard which contains the fields:

  bootstrap_code1 : uint8_t; [@len 218]
  _zeroes_1 : uint8_t; [@len 2]
  original_physical_drive : uint8_t;
  seconds : uint8_t;
  minutes : uint8_t;
  hours : uint8_t;
  bootstrap_code2 : uint8_t; [@len 216]
  disk_signature : uint32_t;
  _zeroes_2 : uint8_t; [@len 2]
  partitions : uint8_t; [@len 64]
  signature1 : uint8_t; (* 0x55 *)
  signature2 : uint8_t; (* 0xaa *)

and for each of the partitions (max 4), we also iterate over them and read the fields which are:

    active : bool;
    first_absolute_sector_chs : Geometry.t;
    ty : int;
    last_absolute_sector_chs : Geometry.t;
    first_absolute_sector_lba : int32;
    sectors : int32;

In the file lib/mbr.ml, it is my understanding that the unmarshal function converts the MBR header from a cstruct to an OCaml record.

So my question is, in the implementation of mbr_inspect.ml, can we use this unmarshal function to first parse the mbr header into an ocaml record and then access the fields of the record and print them?

We also have an unmarshal function in the Partition module.

I'm trying to figure out the OCaml syntax to achieve this. I hope I am heading in the right direction?

PizieDust commented 1 year ago

For the second task, here is a basic implementation I have (pardon the wrong syntax)

let print_partition partition =
     Printf.printf "Active: %b\n" partition.Mbr.Partition.active;
     Printf.printf "Type: %d\n" partition.Mbr.Partition.ty;
...

reynir commented 1 year ago

Yes, you're on the right track. Use the unmarshal to parse the MBR header from a cstruct. A Cstruct.t is a kind of buffer. You can use Cstruct.of_string to convert a string to a Cstruct.t if you're reading strings.

The difference between the type t and type mbr is the latter is very close to the disk layout and includes attributes used for code generation using the "ppx" mechanism. The generated code helps parse the raw modern standard MBR format. You don't have to worry much about that for now. The former is an OCaml record as you note. It is a higher level representation. The zeroes are left out, and the bootstrap code is concatenated. In other words we hide away some unimportant details of the underlying representation.
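To make the relation between the on-disk layout and the higher-level record concrete, here is a minimal stdlib-only sketch that decodes a 512-byte sector using the field offsets listed earlier in the thread. It is not the library's `unmarshal`: the record types, `parse_header`, and `print_header` are hypothetical names for illustration, the CHS fields are skipped, and treating a type-0 entry as unused is an assumption of this sketch.

```ocaml
(* Stdlib-only sketch; NOT the library's Mbr.unmarshal. All names here are
   illustrative. *)

type partition = {
  active : bool;
  ty : int;
  first_absolute_sector_lba : int32;
  sectors : int32;
}

type header = {
  original_physical_drive : int;
  seconds : int;
  minutes : int;
  hours : int;
  disk_signature : int32;
  partitions : partition list;
}

let sector_size = 512

(* Decode one 16-byte partition entry starting at [off]. The CHS fields
   (bytes 1-3 and 5-7 of the entry) are skipped in this sketch. *)
let parse_partition buf off =
  {
    active = Bytes.get_uint8 buf off = 0x80;
    ty = Bytes.get_uint8 buf (off + 4);
    first_absolute_sector_lba = Bytes.get_int32_le buf (off + 8);
    sectors = Bytes.get_int32_le buf (off + 12);
  }

let parse_header buf =
  if Bytes.length buf < sector_size then Error "buffer too short"
  else if Bytes.get_uint8 buf 510 <> 0x55 || Bytes.get_uint8 buf 511 <> 0xaa
  then Error "missing 0x55 0xaa signature"
  else
    (* The partition table is 4 entries of 16 bytes at offset 446.
       Assumption: an entry with type 0 is unused. *)
    let partitions =
      List.init 4 (fun i -> parse_partition buf (446 + (16 * i)))
      |> List.filter (fun p -> p.ty <> 0)
    in
    Ok
      {
        original_physical_drive = Bytes.get_uint8 buf 220;
        seconds = Bytes.get_uint8 buf 221;
        minutes = Bytes.get_uint8 buf 222;
        hours = Bytes.get_uint8 buf 223;
        disk_signature = Bytes.get_int32_le buf 440;
        partitions;
      }

let print_header h =
  Printf.printf "Disk signature: 0x%08lx\n" h.disk_signature;
  List.iteri
    (fun i p ->
      Printf.printf "Partition %d: active=%b type=0x%02x start=%lu sectors=%lu\n"
        (i + 1) p.active p.ty p.first_absolute_sector_lba p.sectors)
    h.partitions
```

In the real tool the library's own `unmarshal` would replace `parse_header`; the sketch only shows which bytes end up in which field.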

PizieDust commented 1 year ago

@reynir I'm trying to create a partition with the code (which will be saved in a file) and then use this partition to test the code for this task, but I keep having errors with creating the partition. Here is the code I am using:

    let disk_length_bytes = Int32.(mul (mul 16l 1024l) 1024l) in
    let disk_length_sectors = Int32.(div disk_length_bytes 512l) in
    let start_sector = 2048l in
    let length_sectors = Int32.sub disk_length_sectors start_sector in
    let partition1 = Mbr.Partition.make ~active:true ~ty:6 start_sector length_sectors in
    let mbr = Mbr.make [ partition1 ] in
    match mbr with
    | Ok -> print_endline "MBR created"
    | Error msg -> Printf.printf "MBR failed %s\n" msg

running dune build gives the error:

11| let mbr = Mbr.make [ partition1 ] in (* partitions is underlined *)
Error: This expression has type (Mbr.Partition.t, string) result but an expression was expected of type Mbr.Partition.t

Please what could I be doing wrongly?

reynir commented 1 year ago

Edit: I forgot to explain the error. The error is due to Mbr.Partition.make returning a "result" type. You need to match in the same way as you do for the result of Mbr.make.
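A minimal sketch of what matching on both results looks like. To keep it self-contained, `make_partition` and `make_mbr` are hypothetical stand-ins that only mimic the result-returning shape of `Mbr.Partition.make` and `Mbr.make`; the real functions take different arguments and perform different checks. Note also that a pattern for a successful result needs to bind or ignore the payload (`Ok _`), not a bare `Ok`.

```ocaml
(* Stand-ins with the same result-returning shape as Mbr.Partition.make and
   Mbr.make. They are assumptions for illustration only. *)
let make_partition ~ty start length : (int * int32 * int32, string) result =
  if ty <= 0 || ty >= 256 then Error "bad partition type"
  else Ok (ty, start, length)

let make_mbr partitions =
  if List.length partitions > 4 then Error "too many partitions"
  else Ok partitions

(* Every step that can fail is matched before its value is used. *)
let build () =
  match make_partition ~ty:6 2048l 30720l with
  | Error msg -> Error ("partition failed: " ^ msg)
  | Ok partition1 -> (
      match make_mbr [ partition1 ] with
      | Error msg -> Error ("MBR failed: " ^ msg)
      | Ok mbr -> Ok mbr)

(* The same chain written with Result.bind (error messages left unchanged). *)
let build' () =
  Result.bind (make_partition ~ty:6 2048l 30720l) (fun p -> make_mbr [ p ])

let () =
  match build () with
  | Ok _ -> print_endline "MBR created"
  | Error msg -> print_endline msg
```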

I created a test archive (in a .tar.gz due to github limitations): test.img.tar.gz

I used the following commands:

$ fallocate -l 2K test.img # Allocate a 2 KiB file, which is 4 sectors of 512 bytes
$ parted --align none test.img mklabel msdos # Write MBR
$ parted --align none test.img mkpart primary 1s 1s # Add a one-sector partition
$ parted --align none test.img mkpart primary 2s 3s # Add a two-sector partition (remaining space)

The argument --align none tells parted to not try to align things in a way required by old operating systems (and/or disks).

PizieDust commented 1 year ago

Oh I see. Thank you. I just downloaded the archive. It'll come in really handy. thanks much

PizieDust commented 1 year ago

@reynir Concerning task 3 (resizing partitions in a disk image), I have some questions. Say we have a disk with 3 partitions:

my idea of the function was something like this:

let resize_partition mbr partition_number new_size = 
.....

Here our function takes the MBR, the number of the partition to be resized, and the new size. We then iterate over the existing partitions in the MBR to find the partition we are looking for, change its sector count to match the new size, and return the new partition.

From here, I think we'll update the partitions in the MBR and write the changes to the file (overwrite??) I don't know if I'm thinking about it correctly.

I created a new file resize_partition.ml in the /bin folder.

reynir commented 1 year ago

Indeed, the partitions may overlap and care should be taken in that case. The default should be to error if partitions would overlap. Shifting partitions would require moving data around on the disk (image). If this fails, for example sudden shutdown or the user aborting the command, you may end up with a bad state in the partition. I don't expect you to implement this.
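The "error if partitions would overlap" default can be sketched as follows. This is an illustration on a simplified, hypothetical record (start and length in sectors, as plain `int`), not on the library's `Mbr.Partition.t`; `overlaps` and this `resize_partition` are names made up for the example.

```ocaml
(* Illustrative partition record: start and length counted in sectors. *)
type partition = { start : int; sectors : int }

(* Half-open ranges [start, start + sectors) overlap exactly when each one
   starts before the other one ends. *)
let overlaps a b =
  a.start < b.start + b.sectors && b.start < a.start + a.sectors

(* Resize partition number [partition_number] (1-based) to [new_size]
   sectors; fail rather than shift anything if the result would overlap. *)
let resize_partition partitions partition_number new_size =
  let idx = partition_number - 1 in
  if idx < 0 || idx >= List.length partitions then Error "no such partition"
  else if new_size <= 0 then Error "new size must be positive"
  else
    let resized =
      List.mapi
        (fun i p -> if i = idx then { p with sectors = new_size } else p)
        partitions
    in
    let target = List.nth resized idx in
    let others = List.filteri (fun i _ -> i <> idx) resized in
    if List.exists (overlaps target) others then
      Error "resized partition would overlap another partition"
    else Ok resized
```

Only the partition table changes here; no partition data is moved, which matches the suggestion not to implement shifting.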

You would need to overwrite the first 512 bytes of the file with a new header. Take care not to truncate or replace the file.

The interface of the library likely has to be extended a bit. The types Mbr.t and Mbr.Partition.t are marked private, meaning that users of the library can inspect values of the type but not create new values directly. Instead, the "smart" constructors Mbr.make and Mbr.Partition.make have to be used. The "smart" thing about smart constructors is that they are functions that can ensure invariants are kept. It may be handy to implement a function Mbr.with_partitions that takes a Mbr.t and a Mbr.Partition.t list and returns a new Mbr.t. I also have some doubts about the current interface :)

https://v2.ocaml.org/releases/5.0/htmlman/privatetypes.html
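As a rough illustration of private types and smart constructors, here is a self-contained sketch. The module `Table`, its fields, and its checks are hypothetical and much simpler than the real `Mbr` module; `with_partitions` shows one possible shape for the proposed function, assuming it should re-check the same invariant as `make`.

```ocaml
module Table : sig
  (* [private]: fields can be read outside the module, but values can only
     be built through the functions below. *)
  type partition = private { start : int; sectors : int }
  type t = private { disk_signature : int32; partitions : partition list }

  val make_partition : start:int -> sectors:int -> (partition, string) result
  val make : ?disk_signature:int32 -> partition list -> (t, string) result
  val with_partitions : t -> partition list -> (t, string) result
end = struct
  type partition = { start : int; sectors : int }
  type t = { disk_signature : int32; partitions : partition list }

  let make_partition ~start ~sectors =
    if start < 1 then Error "partition must start after the MBR sector"
    else if sectors < 1 then Error "partition must not be empty"
    else Ok { start; sectors }

  (* The invariant shared by both constructors of [t]. *)
  let check partitions =
    if List.length partitions > 4 then Error "at most 4 primary partitions"
    else Ok partitions

  let make ?(disk_signature = 0l) partitions =
    Result.map
      (fun partitions -> { disk_signature; partitions })
      (check partitions)

  (* Keep every other field of [t], replace only the partition list. *)
  let with_partitions t partitions =
    Result.map (fun partitions -> { t with partitions }) (check partitions)
end

let () =
  match Table.make_partition ~start:1 ~sectors:1 with
  | Error msg -> prerr_endline msg
  | Ok p -> (
      match Table.make [ p ] with
      | Error msg -> prerr_endline msg
      | Ok t ->
          Printf.printf "%d partition(s)\n" (List.length t.Table.partitions))
```

Outside `Table`, writing a record literal such as `{ Table.start = 1; sectors = 1 }` is rejected by the compiler, which is what forces callers through the checks.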

PizieDust commented 1 year ago

Thank you, this is very helpful. One final question, say it is implemented and the function returns a new Mbr.t, do we just print the new Mbr structure to console or is it to be written to like a file? I'm not quite sure. Or can I just work on this and output to console and then after reviewing you indicate what can be done next?

reynir commented 1 year ago

The MBR structure needs to be marshaled and the marshaled structure is what needs to be written.
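A minimal sketch of the write step, using only the standard library. It assumes the marshaled header is already available as a 512-byte string (how it is produced is left to the library); `write_header` is a hypothetical helper name. The file is opened without `Open_creat` or `Open_trunc`, so it must already exist and the rest of the image is left untouched.

```ocaml
let sector_size = 512

(* Overwrite the first sector of an existing image in place.
   No Open_creat or Open_trunc: the file must exist and keeps its length. *)
let write_header path (header : string) =
  if String.length header <> sector_size then
    invalid_arg "write_header: header must be exactly 512 bytes";
  let oc = open_out_gen [ Open_wronly; Open_binary ] 0 path in
  Fun.protect
    ~finally:(fun () -> close_out_noerr oc)
    (fun () ->
      seek_out oc 0;
      output_string oc header;
      (* Flush explicitly so that write errors are not swallowed on close. *)
      flush oc)
```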

reynir commented 1 year ago

I updated the issue with tasks done and I clarified the second task -- it's about reading the data from a partition

PizieDust commented 1 year ago

Okay, great. For the second task, reading the data content of a partition: does this mean what is actually stored in the partition?

reynir commented 1 year ago

Yes. The idea is a partition may contain e.g. a tar archive. Because of the MBR header (and potentially other partitions before) the standard tar tools are not able to read the tar archive. Instead, it may be useful to "extract" the partition data in order to use other tools. A related task could be to "blit" or write data from a file into a partition in a disk image.
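The "extract" step can be sketched with the standard library alone. The sketch assumes 512-byte sectors and that the partition's first sector (LBA) and sector count have already been read from the MBR header; `read_partition` is a hypothetical name. It loads the whole partition into memory, which is fine for small test images but not for large ones.

```ocaml
let sector_size = 512

(* Read the raw contents of a partition from a disk image file, given its
   first sector (LBA) and its length in sectors.
   Raises End_of_file if the image is shorter than the partition claims. *)
let read_partition path ~first_sector ~sectors =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () ->
      seek_in ic (first_sector * sector_size);
      really_input_string ic (sectors * sector_size))
```

Writing the returned string to a separate file then gives a plain file that tools such as tar can read directly.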

PizieDust commented 1 year ago

Hello @reynir For the subtask on task 2: writing to a partition. I have a general idea of this, but I am having trouble finding which functions can be used to write to a partition. I understand this will require some low-level I/O operations. I looked into some modules and found the OCaml Unix module. I am unsure if this is a good fit, given that this code is for use with MirageOS and may not have support for Unix-type functionality. Let me know your thoughts on this.

reynir commented 1 year ago

You can use the Unix module for the command line tools. They are for running in a Unix-like environment. While the I/O operations would need to be changed for Mirage the source of the tool can serve as an example on how to use the library (while being useful!)

PizieDust commented 1 year ago

Oh thank you. Let me get on it. I have some bare code written, I'll just add the Unix module and experiment with it.

reynir commented 1 year ago

I think you should be able to use seek_out for writing to the partition FWIW
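A minimal sketch of that idea using `seek_out` from the standard library: copy a source file into a partition of an existing image, refusing input that is larger than the partition. It assumes 512-byte sectors and that the partition's first sector and sector count are already known; `blit_file_to_partition` is a hypothetical name.

```ocaml
let sector_size = 512

(* Copy the file [src] to the start of a partition inside the existing
   image [image]. The image is neither created nor truncated. *)
let blit_file_to_partition ~src ~image ~first_sector ~sectors =
  let ic = open_in_bin src in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () ->
      if in_channel_length ic > sectors * sector_size then
        Error "source file does not fit in the partition"
      else begin
        let oc = open_out_gen [ Open_wronly; Open_binary ] 0 image in
        Fun.protect
          ~finally:(fun () -> close_out_noerr oc)
          (fun () ->
            (* Position the channel at the first byte of the partition. *)
            seek_out oc (first_sector * sector_size);
            let buf = Bytes.create 4096 in
            let rec copy () =
              let n = input ic buf 0 (Bytes.length buf) in
              if n > 0 then (
                output oc buf 0 n;
                copy ())
            in
            copy ();
            flush oc);
        Ok ()
      end)
```

Any bytes of the partition beyond the end of the source file keep their previous contents.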

0xrotense commented 1 year ago

I would like to work on one of the last two, can you assign me, please?

PizieDust commented 1 year ago

That's cool, but I already have some code in development. Maybe you could help review when I open PRs.

reynir commented 1 year ago

@0x0god I would advise that you focus on https://github.com/reynir/mirage-block-partition/issues/6. The last two tasks in this issue are not easy first tasks.

0xrotense commented 1 year ago

Thanks for the advice. Please check my comment on my recent PR.

PizieDust commented 1 year ago

Hello @reynir When you are available I have some questions about the final task of this issue:

Create a MBR formatted disk image with contents from files passed as command line arguments. Padding and empty sections may optionally be passed.

So far we have been working with MBR headers. The task is about creating an MBR formatted disk. Is this to achieve the same thing as when we use fallocate and parted?

By empty sections do we mean un-allocated space? I don't quite understand what padding here means.

Also what's the expected implementation for this task like? A script which maybe executes some unix commands to create the disk, usage of the other scripts such as write_partition.exe to write file contents to the disk?

reynir commented 1 year ago

Yes, this is about achieving the same as we did with fallocate and parted, and a little more. The idea was to pass zero or more files with partition data, some options and a destination and it would write a disk image to the destination with a MBR header and partitions containing the data. I think what I meant with padding was partitions larger than the input data - so containing either zeroes or uninitialized data at the end. And empty sections would be space in between partitions.

Given that you wrote several tools that can be used to achieve some of the sub tasks it maybe makes more sense to just write a tool for writing a fresh MBR header. Then a small script could be written to achieve the above.
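The layout arithmetic behind such a tool can be sketched as follows. This is only one possible reading of the description: partitions are placed back to back starting at sector 1 (right after the MBR sector), each file is rounded up to whole 512-byte sectors, an empty file still gets one sector, and `gap` is a hypothetical option for empty sectors between partitions. `sectors_of_bytes` and `layout` are names made up for the example.

```ocaml
let sector_size = 512

(* Sectors needed to hold [bytes], rounding up to a whole sector. *)
let sectors_of_bytes bytes = (bytes + sector_size - 1) / sector_size

(* Lay out one partition per input file size (in bytes).
   Returns the (first_sector, sectors) pair of each partition and the total
   image size in bytes, including the MBR sector. *)
let layout ?(gap = 0) file_sizes =
  if List.length file_sizes > 4 then Error "an MBR holds at most 4 partitions"
  else
    let _, last_end, parts =
      List.fold_left
        (fun (next, _, acc) size ->
          (* Assumption: an empty file still occupies one sector. *)
          let sectors = max 1 (sectors_of_bytes size) in
          let end_ = next + sectors in
          (end_ + gap, end_, (next, sectors) :: acc))
        (1, 1, []) file_sizes
    in
    Ok (List.rev parts, last_end * sector_size)
```

For example, files of 1, 512 and 513 bytes need 1, 1 and 2 sectors, so together with the MBR sector the image is 5 sectors long. The total is therefore generally larger than the plain sum of the file sizes.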

PizieDust commented 1 year ago

Thank you. This is a great explanation.

PizieDust commented 1 year ago

Hi @reynir I've been tinkering with this. Here are my initial thoughts. When we create a disk with files that will be saved in partitions, we are limited to 4 files, right? The MBR structure only supports a maximum of 4 partitions. Following from this, can we also assume that the size of the disk image can be calculated automatically by taking the sum of the sizes of the individual files?