tmcgilchrist / ocaml-gitlab

Native OCaml bindings to Gitlab REST API v4
https://tmcgilchrist.github.io/ocaml-gitlab/gitlab/
BSD 3-Clause "New" or "Revised" License
27 stars 8 forks source link

Add Job_artifacts endpoints and commands #71

Open arvidj opened 1 year ago

arvidj commented 1 year ago

Adds the Job_artifacts module and implements three endpoints from https://docs.gitlab.com/ee/api/job_artifacts.html

Updated the lab tool with corresponding commands.

To do this, I had to expose the media_type argument of API.get.

tmcgilchrist commented 1 year ago

Thanks @arvidj it looks like you're still working on this so I'll just leave some questions here.

Exposing the media_type looks right, up till now there wasn't a reason to.

Do you plan on filling in the remaining features for Job Artifacts?

Have you tested the downloading cli using large artifacts? I didn't add a streaming response suitable for downloading binary artifacts so it might not be very efficient.

arvidj commented 1 year ago

Thanks @arvidj it looks like you're still working on this so I'll just leave some questions here.

Hi, it should be mostly finished but I haven't reviewed the code myself yet.

Exposing the media_type looks right, up till now there wasn't a reason to.

Do you plan on filling in the remaining features for Job Artifacts?

For the moment, I only needed those three endpoints so I exposed only them. I could do the rest if you prefer.

Have you tested the downloading cli using large artifacts? I didn't add a streaming response suitable for downloading binary artifacts so it might not be very efficient.

No, I haven't tried it on large artifacts. TBH, I have a very weak understanding of how to handle binary data efficiently, so I was happy to see that this "just worked" for the smaller files I tried. I'd be glad to do a more robust implementation but I would need some pointers to get started. I'm trying now to download the artifacts of:

https://gitlab.com/tezos/tezos/-/jobs/3330701870/artifacts/browse

A weird thing I'm noticing is that GitLab intermittently returns 401 although I have a token configured. And I verified that ocaml-gitlab sends along the headers. I'm writing it off as a API flakiness for now. Although it'd be nice to have better output in this case. A complication is that the GitLab is in XML in this case!

🤡 OCAMLRUNPARAM=b GITLAB_DEBUG=1 dune exec cli/main.exe -- artifact get --project-id 3836952 --job-id 3330701870
>>> GitLab: Requesting https://gitlab.com/api/v4/projects/3836952/jobs/3330701870/artifacts (headers: accept: application/octet-stream
Authorization: Bearer glpat-[OMITTED]
user-agent: ocaml-gitlab ocaml-cohttp/5.0.0

)
>>> GitLab: Response code 302 Found

>>> GitLab: Requesting https://cdn.artifacts.gitlab-static.net/da/18/da1836cc3c2c6f6645e6d2a8a03c4f819e8256dfce5953aee56bfede82db502c/2022_11_16/3330701870/3635947479/build-x86_64-yy-tezt-typechecking.zip?Expires=1668593099&KeyName=gprd-artifacts-cdn&Signature=zEpStiJpX2i3PZMwsZNg_HVsNq4= (headers: accept: application/octet-stream
Authorization: Bearer glpat-[OMITTED]
user-agent: ocaml-gitlab ocaml-cohttp/5.0.0

)
>>> GitLab: Response code 401 Unauthorized

>>> GitLab: response body:
<?xml version='1.0' encoding='UTF-8'?><Error><Code>AuthenticationRequired</Code><Message>Authentication required.</Message></Error>
lab: internal error, uncaught exception:
     Yojson.Json_error("Line 1, bytes 0-33:\nExpected '{' but found '<?xml version='1.0' encoding='UTF'")
     Raised at Yojson.json_error in file "common.ml", line 5, characters 19-39
     Called from Yojson.Safe.read_lcurl in file "lib/read.ml" (inlined), line 2319, characters 3-42
     Called from Gitlab_j.read_message in file "lib/gitlab_j.ml", line 40752, characters 4-31
     Called from Gitlab_core.Make.API.handle_response.(fun) in file "lib/gitlab_core.ml", line 714, characters 30-61
     Called from Lwt.Sequential_composition.bind.create_result_promise_and_callback_if_deferred.callback in file "src/core/lwt.ml", line 1860, characters 23-26
     Re-raised at Lwt.Miscellaneous.poll in file "src/core/lwt.ml", line 3068, characters 20-29
     Called from Lwt_main.run.run_loop in file "src/unix/lwt_main.ml", line 31, characters 10-20
     Called from Lwt_main.run in file "src/unix/lwt_main.ml", line 118, characters 8-13
     Re-raised at Lwt_main.run in file "src/unix/lwt_main.ml", line 124, characters 4-13
     Called from Cmdliner_term.app.(fun) in file "cmdliner_term.ml", line 24, characters 19-24
     Called from Cmdliner_eval.run_parser in file "cmdliner_eval.ml", line 34, characters 37-44

However, once the file downloaded, it was identical to the file downloaded by curl. With curl, the download took ~20 seconds, and with lab it took ~1 minute.

arvidj commented 1 year ago

@tmcgilchrist any hints on how to move forward on this? Do you think I need to:

add a streaming response suitable for downloading binary artifacts

? I see that Cohttp_lwt.Body exposes to_stream so the issue would be just threading that through the API module (and dealing with the semantic confusion of having two kinds of streams ;)).

arvidj commented 1 year ago

It doesn't seem too hard change the module API so that handler functions receive CLB.body instead of the body string directly. From there, we can choose to return a string Lwt.t to the user, so that Job_artifacts.get : string Lwt_stream.t option Response.t Monad.t (that signature is a mouthful). I read online though that Lwt_streams are frowned upon. Another option is returning the CLB.write_body function so that:

Job_artifacts.get : (string -> unit Lwt.t) option Response.t Monad.t

this way, the user is not exposed to any Lwt_stream.t and can write the each chunk as they desire.

I'm not familiar with Lwt_stream and the objections raised against it in https://github.com/ocsigen/lwt/issues/250 are academic to me. I'm fine either way. I'll make a draft returning a stream, it should be easy to change that part anyhow.

tmcgilchrist commented 1 year ago

Thanks for the link to the Lwt_streams flaws thread, interesting read. My expectation for how Lwt_streams should work matches the second model from https://github.com/ocsigen/lwt/issues/250#issuecomment-222146474 so it's interesting that it doesn't necessarily hold.

I would expect the Lwt_stream usage in this library to involve one writer and one reader when downloading/uploading artefacts, so the objections probably don't apply. I think following a model where you use the CLB.to/from_stream functions should work, though I haven't tried writing it myself.