xapi-project / vhd-tool

Command-line tools for streaming and manipulating vhd format data
Other
8 stars 26 forks source link

CA-287921: Copy only allocated & non-zero extents in case of NBD device #56

Closed gaborigloi closed 6 years ago

gaborigloi commented 6 years ago

Merge with https://github.com/xapi-project/xen-api/pull/3581

When we recognize an NBD device:

  1. We look up the path to the Unix domain socket of the corresponding NBD server from the file written by nbd_client_manager.py (from the xen-api repo)
  2. Then we call out to the get_nbd_extents.py script, which uses the BLOCK_STATUS NBD extension, introduced in QEMU 2.12, to query the list of extents and their block status flags.
  3. We construct from the returned JSON a structure that sparse_dd already understands and can handle.

Signed-off-by: Gabor Igloi gabor.igloi@citrix.com

Questions:

gaborigloi commented 6 years ago

Thanks for the reviews! In the corresponding xapi PR, https://github.com/xapi-project/xen-api/pull/3581 , I've added a quicktest that verifies the integrity of the copied 4MiB VDI by computing checksums of the original and the copy:

gaborigloi commented 6 years ago
~/s/vhd-tool (feature/REQ477/vdicopy|✚1) $ make test                                    
jbuilder runtest                            
Done: 411/413 (jobs: 1)                     

       suite alias test/runtest (exit 1)    
(cd _build/default/test && ./suite.exe)     
Testing suite.        
[ERROR]             Nbd_input          0   VDI with a large allocated extent list.      
-- Nbd_input.000 [VDI with a large allocated extent list.] Failed --                    
in _build/_tests/Nbd_input.000.output:      
[exception] Stack overflow                  
Raised at file "map.ml", line 122, characters 10-25                                     
Called from file "src0/sexp_conv.ml", line 150, characters 10-37                        

The full test results are available in `_build/_tests`.                                 
1 error! in 33.686s. 1 test run.            
Makefile:10: recipe for target 'test' failed
make: *** [test] Error 1  
edwintorok commented 6 years ago

Oops, so something is not tail recursive?

gaborigloi commented 6 years ago

I've uncommented most of the code, and only left the code that unmarshals the JSON, and it still failed with the exact same stack overflow error, so this means that we cannot fix this issue, it's in the JSON library used by ocaml-rpc, not in our code.

gaborigloi commented 6 years ago

I've narrowed down the issue by commenting out more code, the stack overflow happens here:

Jsonrpc.of_string extents_json

Then I've narrowed it down further to see where Jsonrpc.of_string is failing:

 let of_string s = s |> Y.from_string |> json_to_rpc 

I've tried only running

    let _a = Yojson.Safe.from_string extents_json in                                                                                                                           

and that passed. So the issue might be in the json_to_rpc function. It is not tail recursive:

 42 let rec json_to_rpc t =                                                                                                                                                      
 43   let open Yojson.Safe in                                                                                                                                                    
 44   match t with                                                                                                                                                               
 45   | `Intlit i -> Int (Int64.of_string i)                                                                                                                                     
 46   | `Int i -> Int (Int64.of_int i)                                                                                                                                           
 47   | `Bool b -> Bool b                                                                                                                                                        
 48   | `Float r -> Float r                                                                                                                                                      
 49   | `String s -> (* TODO: check if it is a DateTime *) String s                                                                                                              
 50   (* | DateTime d -> `String d *)                                                                                                                                            
 51   | `Null -> Null                                                                                                                                                            
 52   | `List a -> Enum (List.map json_to_rpc a)                                                                                                                                 
 53   | `Assoc a -> Dict (                                                                                                                                                       
 54       List.map (fun (k,v) -> (k, json_to_rpc v)) a                                                                                                                           
 55     )                                                                                                                                                                        
 56   | unsupported -> raise (JsonToRpcError unsupported)                                                                                                                        
 57                                                         
mseri commented 6 years ago

Can you check if there is a reasonably easy fix to do in ocaml-rpc, or if moving from yojson to jsonm could help?

gaborigloi commented 6 years ago

@mseri I've updated my comment above, it seems the stack overflow is not in yojson but in ocam-rpc. But maybe moving to jsonm would make it easier to write efficient code without stack overflow.

mseri commented 6 years ago

If you replace the two List.map with List.rev_map, does it resolve the stack overflow? (I am fairly sure order is not important in the dict, but you need to add an additional List.rev in the List)

gaborigloi commented 6 years ago

There is another stackoverflow in extent_list_of_rpc when using [@@deriving rpc. However, when using [@@deriving rpcty] and Rpcmarshal.unmarshal typ_of_extent_list, it works fine without a stackoverflow.

gaborigloi commented 6 years ago

Both stack overflows are fixed by https://github.com/mirage/ocaml-rpc/pull/90 :)

gaborigloi commented 6 years ago

Is this good to merge now? (I'm still unsure about the license though)

edwintorok commented 6 years ago

If you modify a file under BSD license, and you keep it under BSD license (and preserve original copyright notices) that should be fine. You could add our own copyright notice on top, but if this file is only temporary until the CP ticket you raised is done I don't think its worth doing that.

mseri commented 6 years ago

Failing with Error: Library "alcotest-lwt" not found.

gaborigloi commented 6 years ago

I've removed the test from Travis

mseri commented 6 years ago

Can't we add alcotest-lwt instead?

gaborigloi commented 6 years ago

I'm not sure how to do that. This is a buildenv-based build, not an opam-based one, so it should have been there already.

edwintorok commented 6 years ago

Is it due to https://github.com/xapi-project/xs-opam/commit/16aa11c332ae43ba2a01233bb07b3aa97db5ec3a ? We should probably add alcotest-lwt back since we begin to use it. Which means we'll need to get a new xs-opam before we can merge this PR?

mseri commented 6 years ago

Indeed! Can we make Travis use the docker images instead?

gaborigloi commented 6 years ago

Currently we don't require the test dependencies to be in the core xs-opam as we don't run xs-opam tests during our build. So I think we shouldn't add it for now, because it's not required in the build.

edwintorok commented 6 years ago

It is going to be required for xapi-clusterd (which does run its unit tests during the build). https://github.com/xapi-project/xs-opam/pull/263 Why aren't we running the unit tests during the build though?

gaborigloi commented 6 years ago

We haven't implemented it yet, but we probably should run them.

mseri commented 6 years ago

If it is required for xapi-clusterd as well, I suggest to make a PR to xs-opam with the following:

Then I'll tag and release the new xs-opam and we can merge this. If we do not run the tests also in the Jenkinsbuild we could fix travis and merge it in the meanwhile

mseri commented 6 years ago

Please update https://github.com/xapi-project/xs-opam/pull/263 with vhd-tool and clusterd opam changes

gaborigloi commented 6 years ago

The tests are done: https://github.com/mirage/ocaml-rpc/pull/93 :)

mseri commented 6 years ago

Released rpc-4.2.0, we can add it to the xs-opam PR: https://github.com/mirage/ocaml-rpc/releases/tag/v4.2.0

mseri commented 6 years ago

This PR was good, it brought a lot of fixes in many other places

mseri commented 6 years ago

Can we use a PRE_INSTALL_HOOK to do (or run a script that does) https://github.com/xapi-project/xs-opam/blob/master/tools/travis.sh#L26-L32 ? Or it happens too late inside the container?

If that works we could consider putting a script that does it in the xapi-travis-scripts and use it everywhere needed

mseri commented 6 years ago

Superseded by #57