lfd / PaStA

The Patch Stack Analysis
GNU General Public License v2.0
33 stars 20 forks source link

WIP: Add a Patchwork interface to PaStA #31

Closed metp closed 4 years ago

metp commented 5 years ago

WIP: Add setup instructions

closes #9

bulwahn commented 5 years ago

@rralf We are waiting for your review. I hope we can get this merged in the pasta master in the four weeks and then reach out on the patchwork mailing list to have some alpha users trying to use this feature. @Honeybyte Do you think we are ready in four weeks for the first alpha users?

bulwahn commented 5 years ago

I am wondering if @Honeybyte should provide an email address under which he is reachable more long-term than the current one provided. I think I would prefer that. Copyright is already clarified by the header statement.

metp commented 5 years ago

Looks good. We might actually consider to have this Dockerfile located and referenced somewhere more general than docker/patchwork/, as this is not only useful for the patchwork setup but for other setups as well.

You are right. The main difference between existing Dockerfiles and the one in the patchwork subdirectory is that the latter one is more suitable for a development environment as the source code lives on the host machine rather than in the container itself. I propose naming the directory 'development' instead.

metp commented 5 years ago

I am wondering if @Honeybyte should provide an email address under which he is reachable more long-term than the current one provided. I think I would prefer that. Copyright is already clarified by the header statement.

Yes sure, sounds reasonable. I will change the signed-off-by address.

metp commented 5 years ago

@Honeybyte Do you think we are ready in four weeks for the first alpha users?

We are ready now. The code on the Patchwork side is also finished (just some non source code things like edit docs, etc. is missing). Same on pasta side. Only setup instructions are missing.

metp commented 5 years ago

The great magic happening in "Add push cmd for pushing relations to patchwork" looks good to me. I would love to see that in action in a small productive patchwork instance :)

https://drive.google.com/file/d/1nkto3kIk1xK65MCgmt_rcMcYc3_Mp0Gi/view?usp=sharing

bulwahn commented 5 years ago

Nice video... @rralf is going to review this pull request next week. Once the patch set is provided on the mailing list and installation instructions are written, could we set up an own "shadow" lkml patchwork instance with the pasta integration running? E.g., obtain a "lkml mbox dump" from lore.kernel.org and set up patchwork with that data, then subscribe to lkml and let the patchwork instance follow the mailing list, have that patchwork instance connected to pasta and show the continuous patchwork-pasta integration running. That would make a great demo for ELCE...

rralf commented 5 years ago

Sorry for the delay... I'm about to review the patches -- is this the latest version? Thanks.

metp commented 5 years ago

Sorry for the delay... I'm about to review the patches -- is this the latest version? Thanks.

Yes

rralf commented 5 years ago

Hi Mete,

thanks for the patches!

I will need to test this series. You already created a dockerfile, though I don't understand what it is currently used for.

Can you extend this dockerfile to install pasta + a working patchwork instance? Without having a working patchwork / pasta instance, it's hard for me to test the changes.

Thanks Ralf

rralf commented 5 years ago

Commit "Move remove_identical_eq_classes to Clustering" looks like a reasonable refactoring change. This could be already merged independently before.

Ack, I will test that patch and consider that.

metp commented 5 years ago

Please use pasta-{base,linux} if possible and don't introduce dockerfiles from scratch.

I am not entirely happy with using pasta-base. As mentioned above, in comparison to pasta-base, the newly introduced Dockerfile does not copy anything into the container. Instead the source code can be mounted into a container allowing to skip rebuilding the image whenever the code changes. What do think about:

  1. Move all Python dependencies into a requirements.txt
  2. Use the new Dockerfile as pasta-base and install all python dependencies from requirements.txt
  3. Use the current pasta-base as a production Dockerfile (and the new Dockerfile as its base)
  4. Base pasta-linux on production
rralf commented 5 years ago

Fair enough. A three-stage setup would be fine for me. Something like base/pasta-base/pasta-linux.

Hmm. I never thought of mounting PaStA into the countainer. I'm curious: what does your workflow look like? (Nevertheless, for integration testing, we would still need to clone pasta in the container)

metp commented 5 years ago

Can we really rely on the latest message date? What if patchwork receives a deferred message from yesterday for some reason? What if a Mail has broken Date headers? Do we loose mails in this case?

I haven't thought about that, we could loose some mails. Fortunately Patchwork also exposes an events API which we can use instead.

metp commented 5 years ago

Hmm. I never thought of mounting PaStA into the countainer. I'm curious: what does your workflow look like?

start patchwork (and create patchwork_default network): $ docker-compose up

build container without sending any context: $ docker build -t pasta:latest - < docker/pasta-skeleton.dockerfile

start pasta in the same network and connect to the container: $ docker run -it --rm --network patchwork_default --name pasta -v <host/path/to/PaStA>:/home/pasta pasta:latest

Mounting the source code (with -v) into the container allows using the host source code in the container without needing to rebuild when changing code.

rsarky commented 4 years ago

Hey! What are the pending work items here? I am considering picking Patchwork integration for PaStA as my GSoC project

rralf commented 4 years ago

Just to name some examples:

rsarky commented 4 years ago

Thanks @rralf . This helps. although the final goal: integrate PaStA with an instance of Patchwork is well defined, this is slightly tricky as this is already WIP. I wouldn't want to duplicate any work or go over things that have already been looked at. @Honeybyte Can you help figuring out some independent work items?

For eg. Submission of results to patchwork sounds like a good work item although as far as I can see Mete has already looked into this.

metp commented 4 years ago

Hi @rsarky, happy to hear you want to contribute! I am currently not working anymore on combining pasta with patchwork so this task can definitely picked up by someone else. Patchwork recently integrated the required infrastructure for patch relations. The REST relations API in patchwork changed a lot so parts of this PR (including the submission of results routine) have to be updated.

rsarky commented 4 years ago

Thanks for getting back Mete! :smile: I will go over this PR once to gauge what needs to be picked up and maybe make a rough list of items. Will look into this first thing tomorrow morning.

rsarky commented 4 years ago

Going over the PR and the list mentioned by @rralf, I can identify the following broad action points:

@Honeybyte , @bulwahn , @rralf : Can you think of anything that I have missed or gotten wrong?

metp commented 4 years ago

Hi @rsarky, also have a look into https://github.com/lfd/PaStA-resources/pull/1. Beside that I cannot think of anything else. @rralf Can @rsarky push to the PRs when you assign him for them? Otherwise it's probably time now to close both PRs so that you can open your own ones (perhaps already a draft PR)

rralf commented 4 years ago

I doubt that someone can "push" to a PR. The basis of this PR is your private fork, and I think that it won't change anything if I assign this issue to Rohit, but let me try. With those fancy web interfaces you never know ;-)

rsarky commented 4 years ago

Hmm, not sure if I can check this PR out locally from here. I will probable need to set @Honeybyte 's fork as one of my upstreams in local and pull the branch from there.

rralf commented 4 years ago

Yes, that's definitely the case.

rsarky commented 4 years ago

Hey! Facing some issues building the pasta dockerfile. Specifically, when running docker build -t pasta:latest - < docker/pasta-skeleton.dockerfile . Here are the logs:

Sending build context to Docker daemon  3.072kB
Step 1/11 : FROM ubuntu:19.04
19.04: Pulling from library/ubuntu
4dc9c2fff018: Pull complete
0a4ccbb24215: Pull complete
c0f243bc6706: Pull complete
5ff1eaecba77: Pull complete
Digest: sha256:2adeae829bf27a3399a0e7db8ae38d5adb89bcaf1bbef378240bc0e6724e8344
Status: Downloaded newer image for ubuntu:19.04
 ---> c88ac1f841b7
Step 2/11 : MAINTAINER Ralf Ramsauer "ralf.ramsauer@oth-regensburg.de"
 ---> Running in 7c14ced68c17
Removing intermediate container 7c14ced68c17
 ---> 17892686dbb1
Step 3/11 : ENV DEBIAN_FRONTEND noninteractive
 ---> Running in 7666e0747c76
Removing intermediate container 7666e0747c76
 ---> bcfeed8568c5
Step 4/11 : ENV LANG="C.UTF-8"
 ---> Running in 412f9aeb523d
Removing intermediate container 412f9aeb523d
 ---> 3a881e8078ac
Step 5/11 : ENV LC_ALL="C.UTF-8"
 ---> Running in 1b1f3ea8e653
Removing intermediate container 1b1f3ea8e653
 ---> cde20e8830a3
Step 6/11 : RUN apt update && apt -y dist-upgrade
 ---> Running in 7e8c07d51383

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Ign:1 http://archive.ubuntu.com/ubuntu disco InRelease
Ign:2 http://security.ubuntu.com/ubuntu disco-security InRelease
Ign:3 http://archive.ubuntu.com/ubuntu disco-updates InRelease
Ign:4 http://archive.ubuntu.com/ubuntu disco-backports InRelease
Err:5 http://security.ubuntu.com/ubuntu disco-security Release
  404  Not Found [IP: 91.189.91.39 80]
Err:6 http://archive.ubuntu.com/ubuntu disco Release
  404  Not Found [IP: 91.189.88.142 80]
Err:7 http://archive.ubuntu.com/ubuntu disco-updates Release
  404  Not Found [IP: 91.189.88.142 80]
Err:8 http://archive.ubuntu.com/ubuntu disco-backports Release
  404  Not Found [IP: 91.189.88.142 80]
Reading package lists...
E: The repository 'http://security.ubuntu.com/ubuntu disco-security Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu disco Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu disco-updates Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu disco-backports Release' does not have a Release file.
The command '/bin/sh -c apt update && apt -y dist-upgrade' returned a non-zero code: 100

Wonder if you have seen this error before?

UPDATE: Fixed! see below

rsarky commented 4 years ago

UPDATE: Changing ubuntu verison from 19.04 to 18.04 solved the issue!

rsarky commented 4 years ago

Hmm. I never thought of mounting PaStA into the countainer. I'm curious: what does your workflow look like?

start patchwork (and create patchwork_default network): $ docker-compose up

build container without sending any context: $ docker build -t pasta:latest - < docker/pasta-skeleton.dockerfile

start pasta in the same network and connect to the container: $ docker run -it --rm --network patchwork_default --name pasta -v <host/path/to/PaStA>:/home/pasta pasta:latest

Mounting the source code (with -v) into the container allows using the host source code in the container without needing to rebuild when changing code.

Hey @Honeybyte , Any steps that you follow after this for testing? I am getting some weird issue where PaStA cannot parse the default.cfg

rralf commented 4 years ago

UPDATE: Changing ubuntu verison from 19.04 to 18.04 solved the issue!

... You had to downgrade your local ubuntu version to get it working?

rralf commented 4 years ago

I am getting some weird issue where PaStA cannot parse the default.cfg

I guess you will have to rebase this series to master/next. E.g., Rohit, you just changed the format of mbox-result -> patch-groups. This might explain the break...

rsarky commented 4 years ago

... You had to downgrade your local ubuntu version to get it working?

No no, just the one in the pasta-skeleton dockerfile

rsarky commented 4 years ago

I am getting some weird issue where PaStA cannot parse the default.cfg

I guess you will have to rebase this series to master/next. E.g., Rohit, you just changed the format of mbox-result -> patch-groups. This might explain the break...

Ah, that's possible, will try that

rralf commented 4 years ago

... You had to downgrade your local ubuntu version to get it working?

No no, just the one in the pasta-skeleton dockerfile

Huh, the current skeleton isn't working? Let me check that...

rralf commented 4 years ago

Yikes, you're right, let me check that.

rralf commented 4 years ago

Please find the fix here https://github.com/lfd/PaStA/commit/869f7eac58c8c94bd74b534b5066b7fce958d715

rsarky commented 4 years ago

I am getting some weird issue where PaStA cannot parse the default.cfg

I guess you will have to rebase this series to master/next. E.g., Rohit, you just changed the format of mbox-result -> patch-groups. This might explain the break...

Hmm, the weird thing is the same error is coming even when I checkout the master branch. Will try to debug this.

Apart from this another issue that I will face is since I have symlinked the linux repository in resources/linux/repo to somewhere else on my system it wont be available in the docker container. Wondering if there's any way other than just cloning the entire repository again there

rralf commented 4 years ago

Currently, I use docker only for integration testing. Other than that, you can modify / extend the docker configuration to mount your local directories.

rsarky commented 4 years ago

I am getting some weird issue where PaStA cannot parse the default.cfg

This was an error with the toml version. I was using version 0.9.x incorrectly, 0.10.x works as required

rsarky commented 4 years ago

Currently, I use docker only for integration testing. Other than that, you can modify / extend the docker configuration to mount your local directories.

Thanks! Successfully mounted Pasta and Patchwork to docker containers and am able to run Pasta's commands now in the container after mounting my linux source code directory too. Will work on their interaction now

rralf commented 4 years ago

I am getting some weird issue where PaStA cannot parse the default.cfg

This was an error with the toml version. I was using version 0.9.x incorrectly, 0.10.x works as required

Aah, good to know!

rralf commented 4 years ago

Currently, I use docker only for integration testing. Other than that, you can modify / extend the docker configuration to mount your local directories.

Thanks! Successfully mounted Pasta and Patchwork to docker containers and am able to run Pasta's commands now in the container after mounting my linux source code directory too. Will work on their interaction now

Did you write your own docker file for that? I'm willing to accept it, as this is clearly a better way for developing on PaStA.

rsarky commented 4 years ago

Currently, I use docker only for integration testing. Other than that, you can modify / extend the docker configuration to mount your local directories.

Thanks! Successfully mounted Pasta and Patchwork to docker containers and am able to run Pasta's commands now in the container after mounting my linux source code directory too. Will work on their interaction now

Did you write your own docker file for that? I'm willing to accept it, as this is clearly a better way for developing on PaStA.

Didnt need to write my own docker file. I use docker run -it --rm --network patchwork_default --name pasta -v <host/path/to/PaStA>:/home/pasta -v <host/path/to/linux-repo>:/home/linux pasta:latest

And then symlink the linux repo in resources/linux/repo to /home/linux/ or depending on where you have cloned the repository you can change the paths.

Wonder why the wrong toml version gets downloaded though. Whenever I bring the docker container up I have to upgrade the pip toml package to make PaStA work.

metp commented 4 years ago

Wonder why the wrong toml version gets downloaded though. Whenever I bring the docker container up I have to upgrade the pip toml package to make PaStA work.

Does it work with RUN apt update && apt install -y --no-install-recommends instead of just RUN apt install -y --no-install-recommends in the skeleton Dockerfile (without using docker build cache --no-cache)? The local package index is possibly outdated in the ubuntu image. Fetching an updated version of the ubuntu base image could work as well docker build --pull .

rsarky commented 4 years ago

Wonder why the wrong toml version gets downloaded though. Whenever I bring the docker container up I have to upgrade the pip toml package to make PaStA work.

Does it work with RUN apt update && apt install -y --no-install-recommends instead of just RUN apt install -y --no-install-recommends in the skeleton Dockerfile (without using docker build cache --no-cache)? The local package index is possibly outdated in the ubuntu image. Fetching an updated version of the ubuntu base image could work as well docker build --pull .

Ended up using 20.04 instead of 18.04 version for ubntu and the correct package version is fetched.

rralf commented 4 years ago

Yeah, I already pushed the upgrade to 20.04 to the next branch.

rsarky commented 4 years ago

Yeah, I already pushed the upgrade to 20.04 to the next branch.

Yup that's the one I am using now! Thanks

rsarky commented 4 years ago

Have been slightly busy this week. @Honeybyte Another thing I was wondering was, how do you feed mails into patchwork? Did you use a downloaded mailbox. I am assuming you alse made Patchwork parse incoming mails with an existing email address?

metp commented 4 years ago

Have been slightly busy this week. @Honeybyte Another thing I was wondering was, how do you feed mails into patchwork? Did you use a downloaded mailbox. I am assuming you alse made Patchwork parse incoming mails with an existing email address?

Hi @rsarky, "Official way": https://patchwork.readthedocs.io/en/latest/development/installation/#import-mailing-list-archives. However this only works for mailing lists archives that use pipermail (e.g those on https://lists.ozlabs.org/listinfo/). https://www.kernel.org/lore.html uses public inboxes (pubins). You can write a bash script in case you want to import those (don't have mine anymore 😞). Either clone your desired pubin and let the script iterate trough the git repository. Every commit in pubin relates to one mail. So you could possibly create your own mbox file and import it with https://patchwork.readthedocs.io/en/latest/deployment/management/#parsearchive. Or you could try to use pubin's export to mbox option. Unfortunately only single threads are downloadable.

rralf commented 4 years ago

The next question will then be how we can synchronise PaStA's patch database with Patchwork. Back then, we already had some discussions on that.

My standpoint is that we can do the analysis detached from patchwork. The translate Message-IDs to patchwork IDs and establish the connection. My reasons: Public Inboxes are a great exchange format, and they're already present. We don't need invasive changes of PaStA.

@bulwahn 's standpoint is that we should use official data from patchwork. This requires us to regularly update our mail database from patchwork, but has the advantage that we can directly work on patchwork ids and won't run out of sync.

Whatever we choose, we always have the problem that clusters may change over time... I don't know into which problems we will run on patchwork side.

bulwahn commented 4 years ago

Agree, with all said above. Here my summary:

There are two use cases needed and described:

  1. First Use Case (Development Use Case for Testing the Integration) To actually test the patchwork and pasta integration at scale, the developers need to set up a patchwork instance that simulates the actual use wrt. a large database that the users' patchwork instance would run with. Hence, we need quickly fill the patchwork database with emails. Mete explained how we use the existing archives to turn them into mbox format and then import them into our development&testing patchwork instance.

  2. Second Use Case (The Patchwork Instance Administrator's Use Case) We actually need to assume that the patchwork instance administrator has already set up their patchwork instance just as they wanted it to work, e.g., it could be following multiple public mailing lists (which might be archived or not), it could be just importing from a private mailbox etc. So, they are relunctant to change their patchwork instance import just to make use of pasta. Hence, I suggested to ensure that all emails are exported from the patchwork instance and handed over to pasta. Mete implemented that by having a script for the initial export of the existing mails in the instance, and then having another cronjob-suitable script for continuous export of new emails.

Ralf envisioned that patchwork would take public inbox repositories as sole input. I share that vision, but it requires significant changes in patchwork itself, and it requires much broader acceptance and adoption of public inbox (so it remains a long-term goal we should work towards), but it cannot be part of this first short-term solution.