xapi-project / xcp-networkd

The XCP networking daemon
Other
14 stars 42 forks source link

Mainly I changed to use Rresult for all error handle. #126

Closed xiewei20082008 closed 6 years ago

xiewei20082008 commented 6 years ago

Since a lot of changed have been made, I raised a new PR and closed the old one. Mainly I changed to use Rresult for all error handle.

I suppose the current sriov_error defined in IDL is too complex. Currently:

type sriov_error =
    | Device_not_found
    | Bus_out_of_range
    | Not_enough_mmio_resources
    | Unknown of string

But actually XAPI will not match different tags with different behaviors. So I changed it to just

type sriov_error =
    | Msg of string

Changed list:

moved functions related to ip link to module ip. Refined function parent_device_of_vf

Refined function device_index_of_vf

Add new Dracut module

Get max/num vfs from sysfs with two functions:

Add set_sriov_numvfs into Sysfs module.

Remove get_driver function

Add Modprobe module

Disable SRIOV will return a Rresult type

Remove open Xcp_pci

Unchanged list:

I tried to remove Sriov module in Network_utils, but failed as

I haven't yet split the commit into small commits. Can I split it after the code are ready to merge.


This change is Reviewable

xiewei20082008 commented 6 years ago

lib/network_utils.ml, line 237 at r1 (raw file):

Previously, mseri (Marcello Seri) wrote…
Can't we use `Scanf` instead of introducing a new dependency? I remember that some `Re` module had issues with thread safety. The same applies below.

As above, Re module dependency is not newly introduced to Toolstack, it is just perl style regex introduced, so I hope I can keep use Re_perl in the code.


Comments from Reviewable

xiewei20082008 commented 6 years ago

networkd/network_server.ml, line 1040 at r1 (raw file):

Previously, xiewei20082008 (Wei Xie) wrote…
Ok. I will change the return type of make_vf_conf in the IDL to be a result to keep the consistency with the enable/disable.

Done.


Comments from Reviewable

xiewei20082008 commented 6 years ago

lib/network_utils.ml, line 277 at r1 (raw file):

Previously, lindig (Christian Lindig) wrote…
This computes `1 - sriov_totalvfs`. You probably want `sriov_totalvfs - 1`. Use `|> fun n -> n - 1`

Done.


Comments from Reviewable

mseri commented 6 years ago

networkd/network_server.ml, line 17 at r1 (raw file):

open Network_utils
open Network_interface
open Rresult.R.Infix

I think it's better to open this locally where actually used and not in the toplevel.


Comments from Reviewable

mseri commented 6 years ago

lib/network_utils.ml, line 19 at r2 (raw file):

open Xapi_stdext_std
open Network_interface
open Rresult.R.Infix

Here as well, let's open Rresult.R.Infix inline where we use it instead of the toplevel


Comments from Reviewable

xiewei20082008 commented 6 years ago

networkd/network_server.ml, line 17 at r1 (raw file):

Previously, mseri (Marcello Seri) wrote…
I think it's better to open this locally where actually used and not in the toplevel.

Done.


Comments from Reviewable

xiewei20082008 commented 6 years ago

lib/network_utils.ml, line 19 at r2 (raw file):

Previously, mseri (Marcello Seri) wrote…
Here as well, let's open `Rresult.R.Infix` inline where we use it instead of the toplevel

Done.


Comments from Reviewable

robhoes commented 6 years ago

Review status: 0 of 3 files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.


lib/network_utils.ml, line 1408 at r3 (raw file):

      | _ -> Ok ()

  let make_vf_conf_internal pcibuspath mac vlan rate =

I think that this function should just be inline in make_vf_conf in network_server.ml. I don't really like the *_internal functions here. This network_utils module is meant to be a collections of simple functions around Linux/system tools that are sequenced from network_server. This function already uses only calls to other modules/functions of network_util, so it can easily be moved.


lib/network_utils.ml, line 1419 at r3 (raw file):

      exe_except_none (Ip.set_vf_vlan dev index) vlan >>= fun () ->
      exe_except_none (Ip.set_vf_rate dev index) rate >>= fun () ->
      Result.Ok ()

You don't really need this last line, as the one above will already return Ok if all is well. I do like the way this looks now with the use of the error monad!


networkd/network_server.ml, line 378 at r3 (raw file):

  let get_capabilities _ dbg ~name =
      Debug.with_thread_associated dbg (fun () ->
          Fcoe.get_capabilities name @ Network_utils.Sriov.get_capabilities name

You don't actually need Network_utils. because this is opened on top (below as well).


networkd/network_server.ml, line 1024 at r3 (raw file):

      ) ()

  let make_vf_config _ dbg ~pci_address ~(vf_info : Sriov.sriov_pci_t)=

I don't think that this type annotation add much value?


Comments from Reviewable

robhoes commented 6 years ago

Review status: 0 of 3 files reviewed at latest revision, 15 unresolved discussions, some commit checks failed.


lib/network_utils.ml, line 41 at r3 (raw file):

let ethtool = ref "/sbin/ethtool"
let bonding_dir = "/proc/net/bonding/"
let uname = "/usr/bin/uname"

Please make all of these refs so that they can be set from a config file. Code needs to be added to networkd.ml to wire this up.


lib/network_utils.ml, line 1348 at r3 (raw file):

      with _ -> Result.Error (Other, "Failed to parse modprobe conf for SRIOV configuration for " ^ driver)

  let get_capabilities dev = 

See note on make_vf_conf_internal below regarding the split between network_server and network_utils. I'd move this into network_server.


lib/network_utils.ml, line 1351 at r3 (raw file):

      if Sysfs.get_sriov_maxvfs dev = 0 then [] else ["sriov"]

  let enable_internal dev =

Again, see note on make_vf_conf_internal below regarding the split between network_server and network_utils. I'd move this into network_server.


lib/network_utils.ml, line 1356 at r3 (raw file):

      and maxvfs = Sysfs.get_sriov_maxvfs dev in
      Sysfs.get_driver_name_err dev >>= 
      fun driver ->

We'd normally put this on the previous line.


lib/network_utils.ml, line 1359 at r3 (raw file):

      parse_modprobe_conf driver maxvfs >>=
      fun (has_probe_conf, need_rebuild_initrd, conf) ->
      let enable_sriov_via_modprobe ()=

I think that this whole function is quite long and the control flow can be made clearer, starting by taking this function outside of the main one (replace the unit arg by the ones it actually uses).

It should become very easy to see that this function tries to enable by sysfs and uses the modprobe method as fallback.


lib/network_utils.ml, line 1371 at r3 (raw file):

          | _ -> Ok ()
      in
      if maxvfs = 0 then Result.Error (Other, (Printf.sprintf "%s: do not have sriov capabilities" dev))

This looks like a good candidate for an explicit error code. Also, you could consider flattening the if-statement (not sure if it is clearer):

if maxvfs = 0 then Error xxx else Ok () >>= fun () ->

lib/network_utils.ml, line 1372 at r3 (raw file):

      in
      if maxvfs = 0 then Result.Error (Other, (Printf.sprintf "%s: do not have sriov capabilities" dev))
      else if numvfs = 0 then begin

This if-statement is probably relevant only for the modprobe case. I assume that we can call Sysfs.set_sriov_numvfs even if numvfs <> 0, without any bad side-effects (i.e. it is idempotent)? Then I think that it is clearer to push this down into the modprobe case below, and first to the sysfs method unconditionally.


lib/network_utils.ml, line 1394 at r3 (raw file):

      end

  let disable_internal dev =

Again, see note on make_vf_conf_internal below regarding the split between network_server and network_utils. I'd move this into network_server.


Comments from Reviewable

robhoes commented 6 years ago

Reviewed 1 of 3 files at r1, 1 of 2 files at r3. Review status: 2 of 3 files reviewed at latest revision, 15 unresolved discussions, some commit checks failed.


Comments from Reviewable

xiewei20082008 commented 6 years ago

lib/network_utils.ml, line 1348 at r3 (raw file):

Previously, robhoes (Rob Hoes) wrote…
See note on `make_vf_conf_internal` below regarding the split between network_server and network_utils. I'd move this into network_server.

I saw this comment last time and I wrote the reason why I didn't move them to network_server (actually I had moved but move back), but looks like they are just shown in github not Reviewable..

The reasons I didn't move them to network_server are:


Comments from Reviewable

xiewei20082008 commented 6 years ago

lib/network_utils.ml, line 1348 at r3 (raw file):

Previously, xiewei20082008 (Wei Xie) wrote…
I saw this comment last time and I wrote the reason why I didn't move them to network_server (actually I had moved but move back), but looks like they are just shown in github not Reviewable.. The reasons I didn't move them to network_server are: - Network_server is compiled into a executable instead of a library, so the functions in Network_server is not easy to be called by other Networkd part. (e.g., What about the case where I have another Module in Network_util want to call enabling SRIOV) - I know Network_util is just used for wrapping Linux commands, so do you think we need another `helper.ml` to include these SRIOV functions? - Currently the `get_capabilities` function of FCoE is also in Network_utils - look like it is not reasonable to be in the Network_utils either, so if we have a helper.ml we can also put the `get_capabilities` function of FCoE in helper.ml

Sorry for letting you comment again.


Comments from Reviewable

xiewei20082008 commented 6 years ago

lib/network_utils.ml, line 41 at r3 (raw file):

Previously, robhoes (Rob Hoes) wrote…
Please make all of these `ref`s so that they can be set from a config file. Code needs to be added to networkd.ml to wire this up.

Do I just need to refine uname and dracut? or the whole commands above that are not related to SRIOV?


Comments from Reviewable

xiewei20082008 commented 6 years ago

networkd/network_server.ml, line 378 at r3 (raw file):

Previously, robhoes (Rob Hoes) wrote…
You don't actually need `Network_utils.` because this _is_ opened on top (below as well).

This is because Network_utils.Sriov has the same name with Network_server.Sriov - just to make the compiler happy.


Comments from Reviewable

robhoes commented 6 years ago

Review status: 2 of 3 files reviewed at latest revision, 14 unresolved discussions, some commit checks failed.


lib/network_utils.ml, line 41 at r3 (raw file):

Previously, xiewei20082008 (Wei Xie) wrote…
Do I just need to refine `uname` and `dracut`? or the whole commands above that are not related to SRIOV?

Just the ones that you've added.


lib/network_utils.ml, line 1348 at r3 (raw file):

Previously, xiewei20082008 (Wei Xie) wrote…
Sorry for letting you comment again.

I wouldn't worry too much about what is currently a library and what isn't. This is easy to change if we need to.

The structure of networkd is really quite simple. There are three parts:

  1. Wrappers around Linux tools: network_utils. This should be as simple as possible without much added logic.
  2. Higher level functionality that use those tools: network_server. Enabling and disabling SR-IOV looks to be in this category, because it is our own logic. There shouldn't be anything in network_utils that need to call this.
  3. A monitor for network stats and related: network_monitor_thread. This has access to and uses both of the above.

I'd like to avoid adding another layer of helper modules to keep this simple.


Comments from Reviewable

robhoes commented 6 years ago

Reviewed 1 of 2 files at r3. Review status: all files reviewed at latest revision, 17 unresolved discussions, some commit checks failed.


lib/network_utils.ml, line 495 at r3 (raw file):

      try
          Result.Ok (link_set dev ["vf"; string_of_int index; "mac"; string_of_int rate])
      with _ -> Result.Error (Fail_to_set_vf_rate, "Failed to set vf rate for: " ^ dev)

This one's got an explicit error. Why not do that for the other ones as well, rather than Other "<string>"?


lib/network_utils.ml, line 1286 at r3 (raw file):


module Sriov = struct
  let gen_options_for_maxvfs driver max_vfs =

A comment on top would be useful.


lib/network_utils.ml, line 1296 at r3 (raw file):

      match Sysfs.get_dev_nums_with_same_driver driver with
      | num when num > 0 -> Result.Ok (
          gen_list (string_of_int max_vfs) num

Perhaps this is easier:

Array.make num (string_of_int max_vfs)
|> Array.to_list
|> String.concat ","

Comments from Reviewable

mseri commented 6 years ago

lib/network_utils.ml, line 276 at r2 (raw file):

  let set_sriov_numvfs dev num_vfs =
      let interface = getpath dev "device/sriov_numvfs" in
      let oc = open_out interface in

Why not Unixext.write_string_to_file as we have done in the other places?


Comments from Reviewable

mseri commented 6 years ago

lib/network_utils.ml, line 291 at r2 (raw file):

          Result.Error (Not_enough_mmio_resources, "Error: not enough mmio resources when setting SRIOV numvfs on " ^ dev)
      | e ->
          let msg = Printf.sprintf "Error: set sriov error with exception %s on %s" (Printexc.to_string e) dev in

We always use SRIOV full caps, let's stick with the convention here as well


Comments from Reviewable

xiewei20082008 commented 6 years ago

Review status: all files reviewed at latest revision, 19 unresolved discussions, some commit checks failed.


lib/network_utils.ml, line 276 at r2 (raw file):

Previously, mseri (Marcello Seri) wrote…
Why not `Unixext.write_string_to_file` as we have done in the other places?

We cannot do that. Sysfs interface is a file that you can just open and write something in, but not allow to be removed or replaced, while Unixext.write_string_to_file is trying to replace the file with a new one.


lib/network_utils.ml, line 495 at r3 (raw file):

Previously, robhoes (Rob Hoes) wrote…
This one's got an explicit error. Why not do that for the other ones as well, rather than `Other ""`?

For our currently supported NICs, all those NICs support MAC configuration and VLAN configuration, but some NICs do not support Rate configuration. We pass the Fail_to_set_vf_rate as a explicit error to XAPI to let XAPI know that this is not a general error but an error of hardware itself.


lib/network_utils.ml, line 1372 at r3 (raw file):

Previously, robhoes (Rob Hoes) wrote…
This if-statement is probably relevant only for the modprobe case. I assume that we can call `Sysfs.set_sriov_numvfs` even if `numvfs <> 0`, without any bad side-effects (i.e. it is idempotent)? Then I think that it is clearer to push this down into the modprobe case below, and first to the sysfs method unconditionally.

We cannot first call sysfs method unconditionally for the case where the SRIOV hardware status is ON and we are about to enable SRIOV, which is the else case. It is because sysfs method to enable SRIOV on a SRIOV already enabled device will always be succuessful even if the device doesn't support sysfs. A simple example is as follows:


Comments from Reviewable

xiewei20082008 commented 6 years ago

networkd/network_server.ml, line 1024 at r3 (raw file):

Previously, robhoes (Rob Hoes) wrote…
I don't think that this type annotation add much value?

This is just to make compiler happy, since we have another Network_interface.bridge_config_t.vf_info with the same name.


Comments from Reviewable

robhoes commented 6 years ago

Review status: 1 of 4 files reviewed at latest revision, 18 unresolved discussions, some commit checks failed.


lib/network_utils.ml, line 276 at r2 (raw file):

Previously, xiewei20082008 (Wei Xie) wrote…
We cannot do that. Sysfs interface is a file that you can just open and write something in, but not allow to be removed or replaced, while Unixext.write_string_to_file is trying to replace the file with a new one.

And write_one_line from this module?


lib/network_utils.ml, line 291 at r2 (raw file):

Previously, mseri (Marcello Seri) wrote…
We always use `SRIOV` full caps, let's stick with the convention here as well

SR-IOV actually ;)


lib/network_utils.ml, line 1372 at r3 (raw file):

Previously, xiewei20082008 (Wei Xie) wrote…
We cannot first call sysfs method unconditionally for the case where the `SRIOV hardware status` is ON and we are about to enable SRIOV, which is the else case. It is because sysfs method to enable SRIOV on a `SRIOV already enabled device` will always be succuessful even if the device doesn't support sysfs. A simple example is as follows: - a device that doesn't support sysfs - enable the device via modprobe -> Hardware Status OFF - reboot -> Status ON - disable via modprobe before reboot -> Status ON - enable via sysfs -> will return Sysfs_succesfully -> Here bug arise.

OK cool, it would be good to explain this in a comment in the code, otherwise someone might decide to "optimise" the code later.


Comments from Reviewable

xiewei20082008 commented 6 years ago

Review status: 1 of 4 files reviewed at latest revision, 16 unresolved discussions.


lib/network_utils.ml, line 41 at r3 (raw file):

Previously, robhoes (Rob Hoes) wrote…
Just the ones that you've added.

Done.


lib/network_utils.ml, line 1286 at r3 (raw file):

Previously, robhoes (Rob Hoes) wrote…
A comment on top would be useful.

Done.


lib/network_utils.ml, line 1296 at r3 (raw file):

Previously, robhoes (Rob Hoes) wrote…
Perhaps this is easier: Array.make num (string_of_int max_vfs) |> Array.to_list |> String.concat ","

Done. A magic was learnt.


lib/network_utils.ml, line 1348 at r3 (raw file):

Previously, robhoes (Rob Hoes) wrote…
I wouldn't worry too much about what is currently a library and what isn't. This is easy to change if we need to. The structure of networkd is really quite simple. There are three parts: 1. Wrappers around Linux tools: network_utils. This should be as simple as possible without much added logic. 2. Higher level functionality that use those tools: network_server. Enabling and disabling SR-IOV looks to be in this category, because it is our own logic. There shouldn't be anything in network_utils that need to call this. 3. A monitor for network stats and related: network_monitor_thread. This has access to and uses _both_ of the above. I'd like to avoid adding another layer of helper modules to keep this simple.

Done. Understood.


lib/network_utils.ml, line 1351 at r3 (raw file):

Previously, robhoes (Rob Hoes) wrote…
Again, see note on `make_vf_conf_internal` below regarding the split between network_server and network_utils. I'd move this into network_server.

Done.


lib/network_utils.ml, line 1356 at r3 (raw file):

Previously, robhoes (Rob Hoes) wrote…
We'd normally put this on the previous line.

Done.


lib/network_utils.ml, line 1359 at r3 (raw file):

Previously, robhoes (Rob Hoes) wrote…
I think that this whole function is quite long and the control flow can be made clearer, starting by taking this function outside of the main one (replace the unit arg by the ones it actually uses). It should become very easy to see that this function tries to enable by sysfs and uses the modprobe method as fallback.

Done. I think I didn't give them a good name for has_probe_conf, need_rebuild_initrd, conf.


lib/network_utils.ml, line 1371 at r3 (raw file):

Previously, robhoes (Rob Hoes) wrote…
This looks like a good candidate for an explicit error code. Also, you could consider flattening the if-statement (not sure if it is clearer): ``` if maxvfs = 0 then Error xxx else Ok () >>= fun () -> ```

Done. A magic was learnt.


lib/network_utils.ml, line 1394 at r3 (raw file):

Previously, robhoes (Rob Hoes) wrote…
Again, see note on `make_vf_conf_internal` below regarding the split between network_server and network_utils. I'd move this into network_server.

Done.


lib/network_utils.ml, line 1408 at r3 (raw file):

Previously, robhoes (Rob Hoes) wrote…
I think that this function should just be inline in `make_vf_conf` in network_server.ml. I don't really like the `*_internal` functions here. This network_utils module is meant to be a collections of simple functions around Linux/system tools that are sequenced from network_server. This function already uses only calls to other modules/functions of network_util, so it can easily be moved.

Done.


lib/network_utils.ml, line 1419 at r3 (raw file):

Previously, robhoes (Rob Hoes) wrote…
You don't really need this last line, as the one above will already return `Ok` if all is well. I do like the way this looks now with the use of the error monad!

Done. A magic was learnt.


Comments from Reviewable

mseri commented 6 years ago

lib/network_utils.ml, line 291 at r2 (raw file):

Previously, robhoes (Rob Hoes) wrote…
SR-IOV actually ;)

Then it needs changes in all the error lines :P


Comments from Reviewable

mseri commented 6 years ago

lib/network_utils.ml, line 495 at r3 (raw file):

Previously, xiewei20082008 (Wei Xie) wrote…
For our currently supported NICs, all those NICs support MAC configuration and VLAN configuration, but some NICs do not support Rate configuration. We pass the `Fail_to_set_vf_rate` as a explicit error to XAPI to let XAPI know that this is not a general error but an error of hardware itself.

Even if it's not yet used by xapi, I don't see the problem in making the errors better and more explicit when we know exactly what the errors are


Comments from Reviewable

xiewei20082008 commented 6 years ago

Review status: 1 of 4 files reviewed at latest revision, 15 unresolved discussions, some commit checks failed.


lib/network_utils.ml, line 291 at r2 (raw file):

Previously, mseri (Marcello Seri) wrote…
Then it needs changes in all the error lines :P

No problem. I will fix them all tomorrow. ;)


Comments from Reviewable

xiewei20082008 commented 6 years ago

Review status: 1 of 4 files reviewed at latest revision, 16 unresolved discussions.


lib/network_utils.ml, line 276 at r2 (raw file):

Previously, robhoes (Rob Hoes) wrote…
And `write_one_line` from this module?

Done. Tried this one and it works.


lib/network_utils.ml, line 291 at r2 (raw file):

Previously, xiewei20082008 (Wei Xie) wrote…
No problem. I will fix them all tomorrow. ;)

Done.


lib/network_utils.ml, line 495 at r3 (raw file):

Previously, mseri (Marcello Seri) wrote…
Even if it's not yet used by xapi, I don't see the problem in making the errors better and more explicit when we know exactly what the errors are

Done.


lib/network_utils.ml, line 1372 at r3 (raw file):

Previously, robhoes (Rob Hoes) wrote…
OK cool, it would be good to explain this in a comment in the code, otherwise someone might decide to "optimise" the code later.

Done.


Comments from Reviewable

robhoes commented 6 years ago

Reviewed 1 of 3 files at r4, 2 of 2 files at r6. Review status: all files reviewed at latest revision, 6 unresolved discussions, some commit checks failed.


networkd/network_server.ml, line 147 at r6 (raw file):

  *)
  let parse_modprobe_conf_internal file_path driver option =
      (* Initially I did not use ref here, but afterward changed to ref to make the code easy to read *)

Please remove this comment.


Comments from Reviewable

xiewei20082008 commented 6 years ago

Review status: 3 of 4 files reviewed at latest revision, 4 unresolved discussions.


networkd/network_server.ml, line 147 at r6 (raw file):

Previously, robhoes (Rob Hoes) wrote…
Please remove this comment.

Done.


Comments from Reviewable

robhoes commented 6 years ago
:lgtm:

Reviewed 1 of 1 files at r7. Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


Comments from Reviewable

mseri commented 6 years ago

networkd/network_server.ml, line 130 at r7 (raw file):

      if Sysfs.get_sriov_maxvfs dev = 0 then [] else ["sriov"]

  (* To enable SR-IOV via modprobe configuration, we add a line like `options igb max_vfs=7,7,7`      

This is not important, but can you fix this comment before merging? Some newlines went missing


Comments from Reviewable

lindig commented 6 years ago

Review status: all files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


lib/network_utils.ml, line 277 at r1 (raw file):

          |> String.trim
          |> int_of_string
          |> (-) 1 (* maxvfs is totalvfs -1, as totalvfs is PF num + VF num *)

This computes 1 - sriov_totalvfs. I assume you want sriovfs -1. Use :100:

|> fun n -> n - 1


Comments from Reviewable

robhoes commented 6 years ago

This is not important, but can you fix this comment before merging? Some newlines went missing

Let's do this later :)