mirage / ocaml-git

Pure OCaml Git format and protocol
ISC License
363 stars 70 forks source link

Improper handling of commits with no/empty email #639

Closed kopecs closed 5 days ago

kopecs commented 3 months ago

When attempting to read a commit object which has an author or committer with no email (i.e., an empty email) Git_unix.Store.read returns Error (`Msg "Invalid Git Object"). I would expect the representation of the object to be returned instead.

I believe this is due to the parsing code in user.ml: https://github.com/mirage/ocaml-git/blob/b2288c266142d9d0c5e87b109d0e501f4efda4fb/src/git/user.ml#L83-L90 Namely, the while1 on line 88.

Reproduction Steps

Create a git repository with an commit with an empty author or committer email. ```sh $ git init $ echo "foo" > a.txt && git add a.txt $ git -c user.name="Test User" -c user.email='' commit -m "Test Commit" ``` Attempt to read the commit, e.g.: ```ocaml module Hash = Git.Hash.Make (Digestif.SHA1) module Value = Git.Value.Make (Hash) let main () = let%lwt store = match%lwt Git_unix.Store.v (Fpath.v (Sys.getcwd ())) with | Ok store -> Lwt.return store | Error (`Reference_not_found _) -> failwith "Reference not found. Are you in a git repository?" | Error (`Not_found _) -> failwith "Not found. Are you in a git repository?" | Error (`Msg msg) -> failwith msg | _ -> failwith "Unknown error" in let%lwt hashes = Git_unix.Store.list store in Lwt_list.iter_s (fun hash -> match%lwt Git_unix.Store.read store hash with | Ok _ -> Lwt.return_unit | Error e -> Format.printf "Failed to read object with hash %a: %a" Git_unix.Store.Hash.pp hash Git_unix.Store.pp_error e; Lwt.return_unit) hashes let () = Lwt_main.run @@ main () ```


A related issue seems to be present for writing such objects as well. I had hoped to just patch the above by updating the while1 to a while0, but while updating the examples here to test, I ran into an exception from Encore where it seems it cannot serialize an empty string. For example, the following program

module Hash = Git.Hash.Make (Digestif.SHA1)
module Value = Git.Value.Make (Hash)

let main () =
  let%lwt store =
    match%lwt Git_unix.Store.v (Fpath.v (Sys.getcwd ())) with
    | Ok store -> Lwt.return store
    | Error (`Reference_not_found _) ->
        failwith "Reference not found. Are you in a git repository?"
    | Error (`Not_found _) -> failwith "Not found. Are you in a git repository?"
    | Error (`Msg msg) -> failwith msg
    | _ -> failwith "Unknown error"
  in
  let tree = Value.Tree.v [] |> Value.tree in
  let author = Git.User.{ name = "A U Thor"; email = ""; date = (0L, None) } in
  let tree_hash = Value.digest tree in
  let message = "Message" in
  let commit =
    Value.Commit.make ~author ~committer:author ~tree:tree_hash (Some message)
    |> Value.commit
  in
  match%lwt Git_unix.Store.write store commit with
  | Ok _ -> Lwt.return_unit
  | Error e ->
      Format.printf "Failed to write commit: %a" Git_unix.Store.pp_error e;
      Lwt.return_unit

let () = Lwt_main.run @@ main ()

results in the following backtrace:

Fatal error: exception Invalid_argument("emit_string")
Raised at Stdlib.invalid_arg in file "stdlib.ml", line 30, characters 20-45
Called from Encore__Lavoisier.map.(fun) in file "lib/lavoisier.ml", line 264, characters 46-51
Called from Encore__Lavoisier.emit_string in file "lib/lavoisier.ml", line 72, characters 5-19
Called from Git__Store.Make.write in file "src/git/store.ml", line 278, characters 14-43
Called from Lwt.Sequential_composition.try_bind in file "src/core/lwt.ml", line 2139, characters 10-14
Re-raised at Lwt.Miscellaneous.poll in file "src/core/lwt.ml", line 3123, characters 20-29
Called from Lwt_main.run.run_loop in file "src/unix/lwt_main.ml", line 27, characters 10-20
Called from Lwt_main.run in file "src/unix/lwt_main.ml", line 106, characters 8-13
Re-raised at Lwt_main.run in file "src/unix/lwt_main.ml", line 112, characters 4-13
Called from Dune__exe__Main in file "bin/main.ml", line 28, characters 9-32
dinosaure commented 1 month ago

Not really sure to understand the reproducibility of your bug but I just proposed #645 (just as you thought) and it works. Can you test my PR on your side and see if everything is ok for you?

kopecs commented 1 month ago

Yes, I think that works for the deserialization direction. I had run into some issues adding a test due to the Encore failure mentioned in the second bit of the issue body.

To reproduce the code in the collapsable section should work. Is there an issue there?

hannesm commented 5 days ago

fixed in #645, thanks.