Closed metp closed 4 years ago
@rralf We are waiting for your review. I hope we can get this merged in the pasta master in the four weeks and then reach out on the patchwork mailing list to have some alpha users trying to use this feature. @Honeybyte Do you think we are ready in four weeks for the first alpha users?
I am wondering if @Honeybyte should provide an email address under which he is reachable more long-term than the current one provided. I think I would prefer that. Copyright is already clarified by the header statement.
Looks good. We might actually consider to have this Dockerfile located and referenced somewhere more general than docker/patchwork/, as this is not only useful for the patchwork setup but for other setups as well.
You are right. The main difference between existing Dockerfiles and the one in the patchwork subdirectory is that the latter one is more suitable for a development environment as the source code lives on the host machine rather than in the container itself. I propose naming the directory 'development' instead.
I am wondering if @Honeybyte should provide an email address under which he is reachable more long-term than the current one provided. I think I would prefer that. Copyright is already clarified by the header statement.
Yes sure, sounds reasonable. I will change the signed-off-by address.
@Honeybyte Do you think we are ready in four weeks for the first alpha users?
We are ready now. The code on the Patchwork side is also finished (just some non source code things like edit docs, etc. is missing). Same on pasta side. Only setup instructions are missing.
The great magic happening in "Add push cmd for pushing relations to patchwork" looks good to me. I would love to see that in action in a small productive patchwork instance :)
https://drive.google.com/file/d/1nkto3kIk1xK65MCgmt_rcMcYc3_Mp0Gi/view?usp=sharing
Nice video... @rralf is going to review this pull request next week. Once the patch set is provided on the mailing list and installation instructions are written, could we set up an own "shadow" lkml patchwork instance with the pasta integration running? E.g., obtain a "lkml mbox dump" from lore.kernel.org and set up patchwork with that data, then subscribe to lkml and let the patchwork instance follow the mailing list, have that patchwork instance connected to pasta and show the continuous patchwork-pasta integration running. That would make a great demo for ELCE...
Sorry for the delay... I'm about to review the patches -- is this the latest version? Thanks.
Sorry for the delay... I'm about to review the patches -- is this the latest version? Thanks.
Yes
Hi Mete,
thanks for the patches!
I will need to test this series. You already created a dockerfile, though I don't understand what it is currently used for.
Can you extend this dockerfile to install pasta + a working patchwork instance? Without having a working patchwork / pasta instance, it's hard for me to test the changes.
Thanks Ralf
Commit "Move remove_identical_eq_classes to Clustering" looks like a reasonable refactoring change. This could be already merged independently before.
Ack, I will test that patch and consider that.
Please use pasta-{base,linux} if possible and don't introduce dockerfiles from scratch.
I am not entirely happy with using pasta-base. As mentioned above, in comparison to pasta-base, the newly introduced Dockerfile does not copy anything into the container. Instead the source code can be mounted into a container allowing to skip rebuilding the image whenever the code changes. What do think about:
Fair enough. A three-stage setup would be fine for me. Something like base/pasta-base/pasta-linux.
Hmm. I never thought of mounting PaStA into the countainer. I'm curious: what does your workflow look like? (Nevertheless, for integration testing, we would still need to clone pasta in the container)
Can we really rely on the latest message date? What if patchwork receives a deferred message from yesterday for some reason? What if a Mail has broken Date headers? Do we loose mails in this case?
I haven't thought about that, we could loose some mails. Fortunately Patchwork also exposes an events API which we can use instead.
Hmm. I never thought of mounting PaStA into the countainer. I'm curious: what does your workflow look like?
start patchwork (and create patchwork_default network):
$ docker-compose up
build container without sending any context:
$ docker build -t pasta:latest - < docker/pasta-skeleton.dockerfile
start pasta in the same network and connect to the container:
$ docker run -it --rm --network patchwork_default --name pasta -v <host/path/to/PaStA>:/home/pasta pasta:latest
Mounting the source code (with -v
) into the container allows using the host source code in the container without needing to rebuild when changing code.
Hey! What are the pending work items here? I am considering picking Patchwork integration for PaStA as my GSoC project
Just to name some examples:
Thanks @rralf . This helps. although the final goal: integrate PaStA with an instance of Patchwork is well defined, this is slightly tricky as this is already WIP. I wouldn't want to duplicate any work or go over things that have already been looked at. @Honeybyte Can you help figuring out some independent work items?
For eg. Submission of results to patchwork sounds like a good work item although as far as I can see Mete has already looked into this.
Hi @rsarky, happy to hear you want to contribute! I am currently not working anymore on combining pasta with patchwork so this task can definitely picked up by someone else. Patchwork recently integrated the required infrastructure for patch relations. The REST relations API in patchwork changed a lot so parts of this PR (including the submission of results routine) have to be updated.
Thanks for getting back Mete! :smile: I will go over this PR once to gauge what needs to be picked up and maybe make a rough list of items. Will look into this first thing tomorrow morning.
Going over the PR and the list mentioned by @rralf, I can identify the following broad action points:
@Honeybyte , @bulwahn , @rralf : Can you think of anything that I have missed or gotten wrong?
Hi @rsarky, also have a look into https://github.com/lfd/PaStA-resources/pull/1. Beside that I cannot think of anything else. @rralf Can @rsarky push to the PRs when you assign him for them? Otherwise it's probably time now to close both PRs so that you can open your own ones (perhaps already a draft PR)
I doubt that someone can "push" to a PR. The basis of this PR is your private fork, and I think that it won't change anything if I assign this issue to Rohit, but let me try. With those fancy web interfaces you never know ;-)
Hmm, not sure if I can check this PR out locally from here. I will probable need to set @Honeybyte 's fork as one of my upstreams in local and pull the branch from there.
Yes, that's definitely the case.
Hey! Facing some issues building the pasta dockerfile.
Specifically, when running docker build -t pasta:latest - < docker/pasta-skeleton.dockerfile
.
Here are the logs:
Sending build context to Docker daemon 3.072kB
Step 1/11 : FROM ubuntu:19.04
19.04: Pulling from library/ubuntu
4dc9c2fff018: Pull complete
0a4ccbb24215: Pull complete
c0f243bc6706: Pull complete
5ff1eaecba77: Pull complete
Digest: sha256:2adeae829bf27a3399a0e7db8ae38d5adb89bcaf1bbef378240bc0e6724e8344
Status: Downloaded newer image for ubuntu:19.04
---> c88ac1f841b7
Step 2/11 : MAINTAINER Ralf Ramsauer "ralf.ramsauer@oth-regensburg.de"
---> Running in 7c14ced68c17
Removing intermediate container 7c14ced68c17
---> 17892686dbb1
Step 3/11 : ENV DEBIAN_FRONTEND noninteractive
---> Running in 7666e0747c76
Removing intermediate container 7666e0747c76
---> bcfeed8568c5
Step 4/11 : ENV LANG="C.UTF-8"
---> Running in 412f9aeb523d
Removing intermediate container 412f9aeb523d
---> 3a881e8078ac
Step 5/11 : ENV LC_ALL="C.UTF-8"
---> Running in 1b1f3ea8e653
Removing intermediate container 1b1f3ea8e653
---> cde20e8830a3
Step 6/11 : RUN apt update && apt -y dist-upgrade
---> Running in 7e8c07d51383
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
Ign:1 http://archive.ubuntu.com/ubuntu disco InRelease
Ign:2 http://security.ubuntu.com/ubuntu disco-security InRelease
Ign:3 http://archive.ubuntu.com/ubuntu disco-updates InRelease
Ign:4 http://archive.ubuntu.com/ubuntu disco-backports InRelease
Err:5 http://security.ubuntu.com/ubuntu disco-security Release
404 Not Found [IP: 91.189.91.39 80]
Err:6 http://archive.ubuntu.com/ubuntu disco Release
404 Not Found [IP: 91.189.88.142 80]
Err:7 http://archive.ubuntu.com/ubuntu disco-updates Release
404 Not Found [IP: 91.189.88.142 80]
Err:8 http://archive.ubuntu.com/ubuntu disco-backports Release
404 Not Found [IP: 91.189.88.142 80]
Reading package lists...
E: The repository 'http://security.ubuntu.com/ubuntu disco-security Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu disco Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu disco-updates Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu disco-backports Release' does not have a Release file.
The command '/bin/sh -c apt update && apt -y dist-upgrade' returned a non-zero code: 100
Wonder if you have seen this error before?
UPDATE: Fixed! see below
UPDATE: Changing ubuntu verison from 19.04
to 18.04
solved the issue!
Hmm. I never thought of mounting PaStA into the countainer. I'm curious: what does your workflow look like?
start patchwork (and create patchwork_default network):
$ docker-compose up
build container without sending any context:
$ docker build -t pasta:latest - < docker/pasta-skeleton.dockerfile
start pasta in the same network and connect to the container:
$ docker run -it --rm --network patchwork_default --name pasta -v <host/path/to/PaStA>:/home/pasta pasta:latest
Mounting the source code (with
-v
) into the container allows using the host source code in the container without needing to rebuild when changing code.
Hey @Honeybyte ,
Any steps that you follow after this for testing?
I am getting some weird issue where PaStA cannot parse the default.cfg
UPDATE: Changing ubuntu verison from
19.04
to18.04
solved the issue!
... You had to downgrade your local ubuntu version to get it working?
I am getting some weird issue where PaStA cannot parse the
default.cfg
I guess you will have to rebase this series to master/next. E.g., Rohit, you just changed the format of mbox-result -> patch-groups. This might explain the break...
... You had to downgrade your local ubuntu version to get it working?
No no, just the one in the pasta-skeleton
dockerfile
I am getting some weird issue where PaStA cannot parse the
default.cfg
I guess you will have to rebase this series to master/next. E.g., Rohit, you just changed the format of mbox-result -> patch-groups. This might explain the break...
Ah, that's possible, will try that
... You had to downgrade your local ubuntu version to get it working?
No no, just the one in the
pasta-skeleton
dockerfile
Huh, the current skeleton isn't working? Let me check that...
Yikes, you're right, let me check that.
Please find the fix here https://github.com/lfd/PaStA/commit/869f7eac58c8c94bd74b534b5066b7fce958d715
I am getting some weird issue where PaStA cannot parse the
default.cfg
I guess you will have to rebase this series to master/next. E.g., Rohit, you just changed the format of mbox-result -> patch-groups. This might explain the break...
Hmm, the weird thing is the same error is coming even when I checkout the master branch. Will try to debug this.
Apart from this another issue that I will face is since I have symlinked the linux repository in resources/linux/repo to somewhere else on my system it wont be available in the docker container. Wondering if there's any way other than just cloning the entire repository again there
Currently, I use docker only for integration testing. Other than that, you can modify / extend the docker configuration to mount your local directories.
I am getting some weird issue where PaStA cannot parse the default.cfg
This was an error with the toml version. I was using version 0.9.x
incorrectly, 0.10.x
works as required
Currently, I use docker only for integration testing. Other than that, you can modify / extend the docker configuration to mount your local directories.
Thanks! Successfully mounted Pasta and Patchwork to docker containers and am able to run Pasta's commands now in the container after mounting my linux source code directory too. Will work on their interaction now
I am getting some weird issue where PaStA cannot parse the default.cfg
This was an error with the toml version. I was using version
0.9.x
incorrectly,0.10.x
works as required
Aah, good to know!
Currently, I use docker only for integration testing. Other than that, you can modify / extend the docker configuration to mount your local directories.
Thanks! Successfully mounted Pasta and Patchwork to docker containers and am able to run Pasta's commands now in the container after mounting my linux source code directory too. Will work on their interaction now
Did you write your own docker file for that? I'm willing to accept it, as this is clearly a better way for developing on PaStA.
Currently, I use docker only for integration testing. Other than that, you can modify / extend the docker configuration to mount your local directories.
Thanks! Successfully mounted Pasta and Patchwork to docker containers and am able to run Pasta's commands now in the container after mounting my linux source code directory too. Will work on their interaction now
Did you write your own docker file for that? I'm willing to accept it, as this is clearly a better way for developing on PaStA.
Didnt need to write my own docker file.
I use docker run -it --rm --network patchwork_default --name pasta -v <host/path/to/PaStA>:/home/pasta -v <host/path/to/linux-repo>:/home/linux pasta:latest
And then symlink the linux repo in resources/linux/repo
to /home/linux/
or depending on where you have cloned the repository you can change the paths.
Wonder why the wrong toml version gets downloaded though. Whenever I bring the docker container up I have to upgrade the pip toml package to make PaStA work.
Wonder why the wrong toml version gets downloaded though. Whenever I bring the docker container up I have to upgrade the pip toml package to make PaStA work.
Does it work with RUN apt update && apt install -y --no-install-recommends
instead of just RUN apt install -y --no-install-recommends
in the skeleton Dockerfile (without using docker build cache --no-cache
)? The local package index is possibly outdated in the ubuntu image. Fetching an updated version of the ubuntu base image could work as well docker build --pull .
Wonder why the wrong toml version gets downloaded though. Whenever I bring the docker container up I have to upgrade the pip toml package to make PaStA work.
Does it work with
RUN apt update && apt install -y --no-install-recommends
instead of justRUN apt install -y --no-install-recommends
in the skeleton Dockerfile (without using docker build cache--no-cache
)? The local package index is possibly outdated in the ubuntu image. Fetching an updated version of the ubuntu base image could work as welldocker build --pull .
Ended up using 20.04
instead of 18.04
version for ubntu and the correct package version is fetched.
Yeah, I already pushed the upgrade to 20.04 to the next branch.
Yeah, I already pushed the upgrade to 20.04 to the next branch.
Yup that's the one I am using now! Thanks
Have been slightly busy this week. @Honeybyte Another thing I was wondering was, how do you feed mails into patchwork? Did you use a downloaded mailbox. I am assuming you alse made Patchwork parse incoming mails with an existing email address?
Have been slightly busy this week. @Honeybyte Another thing I was wondering was, how do you feed mails into patchwork? Did you use a downloaded mailbox. I am assuming you alse made Patchwork parse incoming mails with an existing email address?
Hi @rsarky, "Official way": https://patchwork.readthedocs.io/en/latest/development/installation/#import-mailing-list-archives. However this only works for mailing lists archives that use pipermail (e.g those on https://lists.ozlabs.org/listinfo/). https://www.kernel.org/lore.html uses public inboxes (pubins). You can write a bash script in case you want to import those (don't have mine anymore 😞). Either clone your desired pubin and let the script iterate trough the git repository. Every commit in pubin relates to one mail. So you could possibly create your own mbox file and import it with https://patchwork.readthedocs.io/en/latest/deployment/management/#parsearchive. Or you could try to use pubin's export to mbox option. Unfortunately only single threads are downloadable.
The next question will then be how we can synchronise PaStA's patch database with Patchwork. Back then, we already had some discussions on that.
My standpoint is that we can do the analysis detached from patchwork. The translate Message-IDs to patchwork IDs and establish the connection. My reasons: Public Inboxes are a great exchange format, and they're already present. We don't need invasive changes of PaStA.
@bulwahn 's standpoint is that we should use official data from patchwork. This requires us to regularly update our mail database from patchwork, but has the advantage that we can directly work on patchwork ids and won't run out of sync.
Whatever we choose, we always have the problem that clusters may change over time... I don't know into which problems we will run on patchwork side.
Agree, with all said above. Here my summary:
There are two use cases needed and described:
First Use Case (Development Use Case for Testing the Integration) To actually test the patchwork and pasta integration at scale, the developers need to set up a patchwork instance that simulates the actual use wrt. a large database that the users' patchwork instance would run with. Hence, we need quickly fill the patchwork database with emails. Mete explained how we use the existing archives to turn them into mbox format and then import them into our development&testing patchwork instance.
Second Use Case (The Patchwork Instance Administrator's Use Case) We actually need to assume that the patchwork instance administrator has already set up their patchwork instance just as they wanted it to work, e.g., it could be following multiple public mailing lists (which might be archived or not), it could be just importing from a private mailbox etc. So, they are relunctant to change their patchwork instance import just to make use of pasta. Hence, I suggested to ensure that all emails are exported from the patchwork instance and handed over to pasta. Mete implemented that by having a script for the initial export of the existing mails in the instance, and then having another cronjob-suitable script for continuous export of new emails.
Ralf envisioned that patchwork would take public inbox repositories as sole input. I share that vision, but it requires significant changes in patchwork itself, and it requires much broader acceptance and adoption of public inbox (so it remains a long-term goal we should work towards), but it cannot be part of this first short-term solution.
WIP: Add setup instructions
closes #9