Open kit-ty-kate opened 3 years ago
Hi @tmcgilchrist @kit-ty-kate
I am an Outreachy applicant. Can I be assigned this issue, please?
Hi @kit-ty-kate @tmcgilchrist, I am trying to run opam-repo-ci locally on my PC, but I am having trouble running the scheduler: I get ocluster-scheduler command not found. Can you please guide me on how to resolve this?
Hi @Africanbae (I'm helping co-mentor the opam-repo-ci project for Outreachy).
I get ocluster-scheduler command not found, can you guide me please on how to resolve this?
I think the README in the OCluster repository assumes that you have actually installed the various command line tools such as ocluster-scheduler. You can either make sure you have the packages installed or, what is probably easier, add dune exec -- in front of whatever command you are trying to run. For example you might run:
dune exec -- ocluster-scheduler --help
Hi @patricoferris, regarding running ocluster locally on my PC, I have been trying the second stage ( https://github.com/ocurrent/opam-repo-ci/blob/7adacb8aa3fdb86c07b3c11544223bc2d28a9fa6/CONTRIBUTING.md#running-ocluster ) but I get:
panic: reflect: indirection through nil pointer to embedded struct [recovered] panic: reflect: indirection through nil pointer to embedded struct
Can you please guide me?
@art-w maybe you know?
@Africanbae Are you using minikube/kubernetes and trying to run the OCluster Dockerfiles? (It seems to be an error that shows up when docker is not started.)
Sadly the OCluster Dockerfiles are not usable out of the box, so it would actually be easier to skip the docker stuff and run the ocluster directly, by following all the shell commands in the README.
Please, how can I access the Dockerfile? I am really confused because I am using Fedora and I don't know how to start from step 1. Can I be guided, please?
I have been stuck on running a build service locally. I have the scheduler running in one terminal, but when I submit a new job in another terminal, I get Missing scheme in (expected 'capnp://...'). Before I got this result, the tailing took a long time loading and did not produce any result at the end.
Hi @Africanbae,
That sounds to me like the submission.cap file generation did not work properly. Given that the command to run is:
ocluster-admin --connect ./capnp-secrets/admin.cap add-client test-user > submission.cap
If the first ocluster-admin command failed, it will not have output the capnp://... address to stdout and so the submission.cap file will be empty. Hence when you try to submit a job, it does find the submission.cap file but because it is empty you get the Missing scheme in (expected 'capnp://...') error.
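One way to confirm this diagnosis before submitting a job is to check that the file is non-empty and starts with a capnp:// address. The following is a minimal shell sketch; check_cap is a hypothetical helper name (not part of OCluster), and the only assumption is the one implied by the error message, namely that a valid file begins with capnp://.

```shell
# Hypothetical helper: succeed only if the file exists, is non-empty,
# and its first line starts with the capnp:// scheme.
check_cap() {
  file="$1"
  if [ ! -s "$file" ]; then
    echo "$file is missing or empty" >&2
    return 1
  fi
  if ! head -n 1 "$file" | grep -q '^capnp://'; then
    echo "$file does not start with capnp://" >&2
    return 1
  fi
}
```

For example, check_cap submission.cap could be run right after the add-client command, before trying to submit a job.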
Do you know if this is what happened? Do you know what error the admin command might have had?
Hi @patricoferris thanks for the reply, I got Failed: Client "test-user" already registered! when I ran the first ocluster-admin but I noticed the submission.cap file was empty when I checked with the text-editor. What could be wrong? Is there something I didn't do right?
It sounds like maybe you ran the command multiple times. When the command failed with Failed: Client "test-user" already registered it would have overwritten the submission.cap file with nothing because of > submission.cap. The easiest thing to do is probably:
dune exec -- ocluster-admin --connect ./capnp-secrets/admin.cap remove-client test-user
dune exec -- ocluster-admin --connect ./capnp-secrets/admin.cap add-client test-user > submission.cap
with the scheduler running. This will remove the old test-user client and let you regenerate it, and hopefully you can check that the submission.cap file is not empty. (You could also just use a different client name than test-user if you wanted.)
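The shell truncates the redirect target before the command even runs, which is why a failed command leaves an empty file behind. A small sketch of one way to avoid that: write to a temporary file first and only replace the target on success. write_if_ok is a hypothetical helper name introduced here for illustration; it is not part of OCluster.

```shell
# Hypothetical helper: run a command and replace the target file only if
# the command succeeded AND produced some output.
write_if_ok() {
  target="$1"; shift
  tmp="$(mktemp)" || return 1
  if "$@" > "$tmp" && [ -s "$tmp" ]; then
    mv "$tmp" "$target"
  else
    rm -f "$tmp"
    echo "command failed or printed nothing; $target left unchanged" >&2
    return 1
  fi
}
```

With this, the second command above could be run as write_if_ok submission.cap dune exec -- ocluster-admin --connect ./capnp-secrets/admin.cap add-client test-user, so a repeated "already registered" failure would leave an existing submission.cap untouched.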
@Africanbae Can you try the following steps and report any potential issues? It should take care of all the ocluster issues you've been having :)
You'll need docker and docker-compose ( https://docs.docker.com/compose/install/ ). The exact versions are not important, but these commands should not print any error:
$ docker --version
Docker version 20.10.12, build e91ed57
$ docker-compose --version
docker-compose version 1.29.2, build 5becea4c
In the following steps, I'm going to assume that we can work in the ~/ocaml directory:
$ mkdir ~/ocaml && cd ~/ocaml
~/ocaml/ $ ls
Also please make sure that you don't have any ocluster still running in the background!
opam-repository
The normal behavior of opam-repo-ci is to watch the GitHub ocaml/opam-repository and trigger builds on every PR. For our development setup, we are going to be watching a local clone of the opam-repository:
~/ocaml/ $ git clone 'https://github.com/ocaml/opam-repository.git'
~/ocaml/ $ cd opam-repository
There are no PRs locally though! We need to set the git HEAD manually to indicate which branch we would like to test:
~/ocaml/opam-repository/ $ git fetch origin pull/21253/head:pr21253
~/ocaml/opam-repository/ $ git checkout pr21253
opam-repo-ci
Finally let's clone your opam-repo-ci locally:
$ cd ~/ocaml
~/ocaml/ $ git clone --recurse-submodules 'https://github.com/Africanbae/opam-repo-ci.git'
~/ocaml/ $ cd opam-repo-ci
Note the --recurse-submodules in the git clone! Without it your local copy will be incomplete and trigger errors.
We'll need to grab a small fix from my development repo to enable docker-compose:
~/ocaml/opam-repo-ci/ $ git remote add art-w 'https://github.com/art-w/opam-repo-ci'
~/ocaml/opam-repo-ci/ $ git fetch art-w docker:docker
~/ocaml/opam-repo-ci/ $ git checkout docker
One last step: We need to tell opam-repo-ci where to find our local clone of opam-repository:
~/ocaml/opam-repo-ci/ $ echo 'OPAM_REPO=~/ocaml/opam-repository' > dev/conf.env
And we are done! Let's run it:
~/ocaml/opam-repo-ci/ $ make dev-start
After docker is done installing everything, you should be able to visit http://localhost:8090 for the opam-repo-ci website and http://localhost:8080 for the ocurrent pipeline.
opam-repo-ci will automatically recompile and restart when its files change, with potential errors printed in the docker logs.
To stop everything, run make dev-stop
To trigger a new build, go to ~/ocaml/opam-repository and type git commit --amend --no-edit
=> This simulates a new PR that opam-repo-ci will need to test
Please let us know if you encounter any errors!
Thanks @art-w. For docker-compose --version I got docker-compose version 1.29.2, build unknown (I hope it doesn't matter since it is not an error).
I got this when I ran make dev-start:
rm -Rf dev/capnp-secrets
docker-compose \
  --project-name=opam-repo-ci \
  --file=./dev/docker-compose.yml \
  --env-file=./dev/conf.env \
  up \
  --remove-orphans \
  --build
Traceback (most recent call last):
  File "/home/ope/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/home/ope/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib64/python3.10/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib64/python3.10/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.10/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.10/http/client.py", line 1037, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.10/http/client.py", line 975, in send
    self.connect()
  File "/home/ope/.local/lib/python3.10/site-packages/docker/transport/unixconn.py", line 30, in connect
    sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ope/.local/lib/python3.10/site-packages/requests/adapters.py", line 440, in send
    resp = conn.urlopen(
  File "/home/ope/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 785, in urlopen
    retries = retries.increment(
  File "/home/ope/.local/lib/python3.10/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/ope/.local/lib/python3.10/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/home/ope/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/home/ope/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib64/python3.10/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib64/python3.10/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.10/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.10/http/client.py", line 1037, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.10/http/client.py", line 975, in send
    self.connect()
  File "/home/ope/.local/lib/python3.10/site-packages/docker/transport/unixconn.py", line 30, in connect
    sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ope/.local/lib/python3.10/site-packages/docker/api/client.py", line 214, in _retrieve_server_version
    return self.version(api_version=False)["ApiVersion"]
  File "/home/ope/.local/lib/python3.10/site-packages/docker/api/daemon.py", line 181, in version
    return self._result(self._get(url), json=True)
  File "/home/ope/.local/lib/python3.10/site-packages/docker/utils/decorators.py", line 46, in inner
    return f(self, *args, kwargs)
  File "/home/ope/.local/lib/python3.10/site-packages/docker/api/client.py", line 237, in _get
    return self.get(url, self._set_request_timeout(kwargs))
  File "/home/ope/.local/lib/python3.10/site-packages/requests/sessions.py", line 542, in get
    return self.request('GET', url, kwargs)
  File "/home/ope/.local/lib/python3.10/site-packages/requests/sessions.py", line 529, in request
    resp = self.send(prep, send_kwargs)
  File "/home/ope/.local/lib/python3.10/site-packages/requests/sessions.py", line 645, in send
    r = adapter.send(request, **kwargs)
  File "/home/ope/.local/lib/python3.10/site-packages/requests/adapters.py", line 501, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ope/.local/bin/docker-compose", line 8, in <module>
    sys.exit(main())
  File "/home/ope/.local/lib/python3.10/site-packages/compose/cli/main.py", line 81, in main
    command_func()
  File "/home/ope/.local/lib/python3.10/site-packages/compose/cli/main.py", line 200, in perform_command
    project = project_from_options('.', options)
  File "/home/ope/.local/lib/python3.10/site-packages/compose/cli/command.py", line 60, in project_from_options
    return get_project(
  File "/home/ope/.local/lib/python3.10/site-packages/compose/cli/command.py", line 152, in get_project
    client = get_client(
  File "/home/ope/.local/lib/python3.10/site-packages/compose/cli/docker_client.py", line 41, in get_client
    client = docker_client(
  File "/home/ope/.local/lib/python3.10/site-packages/compose/cli/docker_client.py", line 170, in docker_client
    client = APIClient(use_ssh_client=not use_paramiko_ssh, **kwargs)
  File "/home/ope/.local/lib/python3.10/site-packages/docker/api/client.py", line 197, in init
    self._version = self._retrieve_server_version()
  File "/home/ope/.local/lib/python3.10/site-packages/docker/api/client.py", line 221, in _retrieve_server_version
    raise DockerException(
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))
make: *** [Makefile:12: dev-start] Error 1
I think your docker is not set up correctly. Can you use docker without sudo or being root? If not you need to follow https://docs.docker.com/engine/install/linux-postinstall/
It seems it's docker-compose that is not set up correctly.
Can you share the output of :
$ docker info
$ docker info
Client:
 Context: default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.8.1-docker)
  scan: Docker Scan (Docker Inc., v0.17.0)

Server:
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
Ok so docker is not started -- can you try sudo systemctl restart docker and docker info again? (If it still doesn't work, then try and share the output of sudo docker info?)
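The check being asked for here can be wrapped in a small function. This is only a sketch: docker_ready is a hypothetical name, and it assumes that docker info exits with a non-zero status when the daemon cannot be reached (which matches the "Cannot connect to the Docker daemon" error shown above).

```shell
# Hypothetical helper: report whether the docker daemon answers `docker info`.
# Assumes `docker info` returns a non-zero exit status when the daemon is down.
docker_ready() {
  if docker info > /dev/null 2>&1; then
    echo "docker daemon is reachable"
  else
    echo "docker daemon is not reachable; try: sudo systemctl restart docker" >&2
    return 1
  fi
}
```

Running docker_ready before make dev-start would give a short message instead of the long docker-compose traceback when the daemon is down.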
sudo systemctl restart docker
[sudo] password for ope:
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xeu docker.service" for details.
$ systemctl status docker.service
× docker.service - Docker Application Container Engine
     Loaded: loaded (/usr/lib/systemd/system/docker.service; dis>
     Active: failed (Result: exit-code) since Thu 2022-04-21 08:>
TriggeredBy: × docker.socket
       Docs: https://docs.docker.com
    Process: 8704 ExecStart=/usr/bin/dockerd -H fd:// --containe>
   Main PID: 8704 (code=exited, status=1/FAILURE)
        CPU: 132ms

Apr 21 08:59:56 fedora systemd[1]: docker.service: Scheduled res>
Apr 21 08:59:56 fedora systemd[1]: Stopped Docker Application Co>
Apr 21 08:59:56 fedora systemd[1]: docker.service: Start request>
Apr 21 08:59:56 fedora systemd[1]: docker.service: Failed with r>
Apr 21 08:59:56 fedora systemd[1]: Failed to start Docker Applic>
lines 1-14/14 (END)
Uh okay your docker is really broken... Can you reinstall it by following some online tutorials for fedora? (I'm afraid I don't have the knowledge to help you for this)
Okay, thanks @art-w, I am working on it already.
I have been able to fix docker by reinstalling it and removing it from the trusted zone.
docker info
Client:
 Context: default
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 20.10.12
 Storage Driver: btrfs
  Build Version: Btrfs v5.15.1
  Library Version: 102
 Logging Driver: journald
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: /usr/libexec/docker/docker-init
 containerd version: 3df54a852345ae127d1fa3092b95168e4a88e2f8
 runc version: v1.0.3-0-gf46b6ba
 init version:
 Security Options:
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.16.9-200.fc35.x86_64
 Operating System: Fedora Linux 35 (Workstation Edition)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 7.625GiB
 Name: fedora
 ID: 4HSY:IMID:K6AR:XRT3:4HZE:5NXQ:ZG6R:CI6O:3UJY:QTGU:VOL3:EJRQ
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: true
While running make dev-start, I get this as part of the output:
pipeline_1 | 2022-04-21 17:39.24 opam_repo_ci.index [INFO] Index.record local/opam-repository 3ea724 (lint) -> -
pipeline_1 | 2022-04-21 17:39.24 opam_repo_ci.index [INFO] Index.record local/opam-repository 3ea724 (analysis) -> -
pipeline_1 | 2022-04-21 17:39.24 current [INFO] Result: Ok: ()
frontend_1 | Waiting for build...
pipeline_1 |
pipeline_1 | ** NEW BUILD (. changed) **
pipeline_1 |
frontend_1 | Error: Rule failed to generate the following targets:
frontend_1 | - .ppx/a4ea0d83c521725fd7b0167e93f2550b/ppx.exe
frontend_1 | -> required by _build/default/.ppx/a4ea0d83c521725fd7b0167e93f2550b/ppx.exe
frontend_1 | -> required by _build/default/web-ui/main.pp.ml
frontend_1 | -> required by _build/default/web-ui/.main.eobjs/main.pp.ml.d
frontend_1 | -> required by
frontend_1 | _build/default/web-ui/.main.eobjs/duneexeMain.impl.all-deps
frontend_1 | -> required by _build/default/web-ui/main.exe
frontend_1 | Waiting for build...
frontend_1 | File "api/raw.ml", line 1, characters 8-22:
frontend_1 | 1 | include Schema.MakeRPC(Capnp_rpc_lwt)
frontend_1 | ^^^^^^^^^^^^^^
frontend_1 | Error: The module Schema is an alias for module Opam_repo_ci_apiSchema, which is missing
frontend_1 | File "api/client.ml", line 34, characters 52-71:
frontend_1 | 34 | let request, params = Capability.Request.create Params.init_pointer in
frontend_1 | ^^^^^^^^^^^^^^^^^^^
frontend_1 | Error: This expression has type
frontend_1 | Opam_repo_ci_api.Raw.Builder.pointer_t -> Params.t
frontend_1 | but an expression was expected of type
frontend_1 | Capnp.MessageSig.rw Slice.t -> 'a
frontend_1 | Type
frontend_1 | Opam_repo_ci_api.Raw.Builder.pointer_t =
frontend_1 | Opam_repo_ci_api.Schema.rw Slice.t
frontend_1 | is not compatible with type Capnp.MessageSig.rw Slice.t
frontend_1 | Type Opam_repo_ci_api.Schema.rw is not compatible with type
frontend_1 | Capnp.MessageSig.rw
frontend_1 | File "api/schema.ml", line 1:
frontend_1 | Error: I/O error: api/.opam_repo_ci_api.objs/byte/opam_repo_ci_apiSchema.cmi: No such file or directory
frontend_1 | Had errors, waiting for filesystem changes...
frontend_1 | Waiting for build...
frontend_1 | Waiting for build...
It has been printing "frontend_1 | Waiting for build..." over and over for the past hour. Does it take a long time?
Hi @art-w, thanks for yesterday. I am not able to reach http://localhost:8090/ for the opam-repo-ci website but this is opening http://localhost:8080/ for the ocurrent pipeline.
I think it's caused by the nested git submodules not being there. Can you try:
~/ocaml/opam-repo-ci $ git submodule update --init --recursive
(Thanks for the full log for the error! But no need to repeat Waiting for build..., it's not a good sign when the system gets stuck printing that, the failure is printed above)
I have tried ~/ocaml/opam-repo-ci $ git submodule update --init --recursive but it still didn't work.
Thanks ok, I made some adjustments... Any luck with:
~/ocaml/opam-repo-ci/ $ git fetch art-w
~/ocaml/opam-repo-ci/ $ git rebase art-w/docker
The site still can't be reached. Do I have to run anything else after the rebase?
Ha yes you need to re-run make dev-start (perhaps first do make dev-stop to make sure that docker ps is fully empty)
ohhh...i guess so just wanted to be sure
After running make dev-start i get
pipeline_1 | 2022-04-22 13:11.04 application Wrote capability reference to "./dev/capnp-secrets/opam-repo-ci-admin.cap"
frontend_1 | File "command line", line 1:
frontend_1 | Error: Unbound module Current_rpc
frontend_1 | Error: execve(.ppx/a4ea0d83c521725fd7b0167e93f2550b/ppx.exe): Text file busy
frontend_1 | -> required by _build/default/web-ui/main.pp.ml
frontend_1 | -> required by _build/default/web-ui/.main.eobjs/main.pp.ml.d
frontend_1 | -> required by
frontend_1 | _build/default/web-ui/.main.eobjs/duneexe__Main.impl.all-deps
frontend_1 | -> required by _build/default/web-ui/main.exe
frontend_1 | Had errors, waiting for filesystem changes...
pipeline_1 | Success, waiting for filesystem changes...
pipeline_1 | 2022-04-22 13:11.04 current_web [INFO] Starting web server: (TCP
pipeline_1 | (Port 8080))
pipeline_1 | 2022-04-22 13:11.04 irw-inotify [INFO] Inotify mode
pipeline_1 | 2022-04-22 13:11.05 irw-inotify [INFO] Inotify mode
pipeline_1 | 2022-04-22 13:11.05 current [INFO] Result: Ok: ()
Is the opam-repo-ci website at http://localhost:8090 reachable? There might be some errors printed at the start but hopefully it should work out after a few seconds... You may have to re-run make dev-start if you are really unlucky (or add an empty line and save in web-ui/homepage.ml then check again http://localhost:8090 )
Thanks @art-w, I can now reach http://localhost:8090/ . Regarding the issue I am working on, which files should I be making changes to?
There are multiple ways to do it (and I'm unsure which @kit-ty-kate had in mind). Considering you're running low on time, you should look at:
https://github.com/ocurrent/opam-repo-ci/blob/master/web-ui/github.ml#L186
For each entry in the tree available via http://localhost:8090/github/local/opam-repository the variable k is going to be a path in that tree corresponding to one build, for example k = ["opam-2.1"; "extras"; "x86_32"; "ocaml-lsp-server.1.11.2"; "tests"].
Those paths are collected and added to the StatusTree that is finally displayed. It should be possible to reorder the paths k to match the new hierarchy :) (with pattern matching on k)
Off the top of my head, for this particular issue I was more thinking of a change in either service/pipeline.ml or service/node.ml. It is required to make two separate github statuses (one for opam-2.1/lower-bounds/architectures/compiler-variants and one for the revdeps). Maybe even give each one its own github status if you feel like it's needed.
This could be done by changing the representation of Node.t to split between all these categories.
Once done the web-ui should be able to do the changes automatically according to what Node.flatten puts out.
Thanks @kit-ty-kate @art-w. OCaml is a new language that I am just learning. Regarding the task, I understand I am meant to create two separate github statuses. Can I please get a sample of how to go about making a github status? Here is what I used, but I am not sure it is correct because I got an error when I tried it:
~opam_version:V2_1 ~lower_bounds:false ~revdeps:false "domains"(Variant.v ~arch:
X86_64 ~distro:master_distro ~compiler:("4.12"))
I've seen some users complain about the web-ui design. In particular the build hierarchy isn't that clear. For instance, in the following examples, it isn't clear for newcomers if it's the main build that failed or something else (in that case it's something else):
My proposal would be to change the hierarchy to be something like:
I would also split the revdeps into a separate github status entry to quickly understand if it's a core issue with the package or a revdeps issue. Maybe it would need its own status page but maybe adding it at the end in the same way lower-bounds/architectures/... are done would be enough if the status is split.