jenkinsci / generic-webhook-trigger-plugin

Can receive any HTTP request, extract any values from JSON or XML and trigger a job with those values available as variables. Works with GitHub, GitLab, Bitbucket, Jira and many more.
https://plugins.jenkins.io/generic-webhook-trigger

Remove unwanted json nodes from contributed variables #211

Closed ferpizza closed 1 year ago

ferpizza commented 3 years ago

Feature Request

Be able to remove some nodes from the payload before parsing, to keep them from getting into the Contributed Variables.

Environment:

GitLab for source code + Jenkins for CI

Problem

After merging a Merge Request (MR) with several commits in it, GitLab sends a PUSH webhook with a payload containing details on the first 20 commits of the MR.

We usually capture the whole payload (with a JsonPath expression of $), as we use many of the variables it contains during pipeline execution. As each commit in the payload has at least 9 fields, 20 commits can contribute 180 or more variables to our environment.
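To illustrate the arithmetic, here is a minimal Python sketch of how a nested payload could expand into one variable per leaf value. The `flatten` helper, the `payload` prefix, the underscore naming scheme, and the sample commit are all assumptions made for this example; they are not taken from the plugin's code or from a real GitLab payload.

```python
from typing import Any, Dict


def flatten(name: str, node: Any) -> Dict[str, str]:
    """Flatten nested JSON into one name -> string pair per leaf value."""
    out: Dict[str, str] = {}
    if isinstance(node, dict):
        for key, value in node.items():
            out.update(flatten(f"{name}_{key}", value))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            out.update(flatten(f"{name}_{index}", value))
    else:
        out[name] = str(node)
    return out


# Hypothetical commit with 9 leaf values (illustrative, not a real payload).
commit = {
    "id": "abc123",
    "message": "Fix typo",
    "title": "Fix typo",
    "timestamp": "2021-01-01T00:00:00Z",
    "url": "https://gitlab.example.com/commit/abc123",
    "author": {"name": "Jane", "email": "jane@example.com"},
    "added": ["a.txt"],
    "modified": ["b.txt"],
}

payload = {"ref": "refs/heads/main", "commits": [commit] * 20}
variables = flatten("payload", payload)
commit_vars = [k for k in variables if k.startswith("payload_commits_")]

print(len(commit_vars))  # 20 commits x 9 leaves each
```

With this sample data, the commits alone account for 180 of the 181 resulting variables.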

After the payload is processed and the contributed variables are defined, the pipeline fails as soon as it tries to execute any command. With so many environment variables, passing them all to a shell session exceeds the limit on the combined size of arguments and environment for a new process, defined at the Linux kernel level (ARG_MAX).
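A rough way to see how close an environment is to that limit is sketched below in Python. It assumes a Unix-like system where `os.sysconf("SC_ARG_MAX")` is available, and the byte count is only an approximation (the kernel also accounts for the command-line arguments and some per-entry overhead). The helper names are hypothetical.

```python
import os
from typing import Mapping


def environment_bytes(env: Mapping[str, str]) -> int:
    """Approximate size of an environment as passed to a new process.

    Each entry is stored as 'NAME=value' followed by a terminating NUL byte.
    """
    return sum(
        len(os.fsencode(name)) + 1 + len(os.fsencode(value)) + 1
        for name, value in env.items()
    )


def arg_max() -> int:
    """Return the system limit for arguments plus environment, in bytes."""
    return os.sysconf("SC_ARG_MAX")


used = environment_bytes(os.environ)
limit = arg_max()
print(f"environment: {used} bytes, ARG_MAX: {limit} bytes")
```

Running this inside a job step, after the variables have been contributed, would show how much of the budget the environment already consumes.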

We couldn't find a way to remove nodes from the payload with JsonPath, and after reading its documentation we believe no such method is available.

As payloads contain a long list of nodes, being explicit about the ones we need is a complex task and prone to failures if any node name changes on a GitLab update (it has happened in the past). We would prefer to be able to specify those nodes we don't need instead.

Solution

As tweaking the kernel to modify the value of ARG_MAX is not always possible, nor advisable, we would like to be able to define a JsonPath expression whose matches are removed from the payload before it is parsed and its nodes are turned into Contributed Variables.

It could work similarly to what Oracle describes in the documentation below: https://docs.oracle.com/cd/E39820_01/doc.11121/gateway_docs/content/conversion_remove_json_node.html

tomasbjerre commented 3 years ago

> As payloads contain a long list of nodes, being explicit about the ones we need is a complex task and prone to failures if any node name changes on a GitLab update

You can reduce it a bit by specifying something like $.object_attributes.

If any node name changes, your pipeline code will fail, right? I don't see why it is more fragile to specify nodes with JsonPath.

ferpizza commented 3 years ago

I'm not sure if our case is an edge case or a more general way of working with GWT, so I will illustrate it a bit more here.

We have a single Job in Jenkins, listening to different triggers from GitLab. So, the same Job will be triggered by webhooks from MR, TAG, PUSH, COMMENT-on-MR, ISSUE, and COMMENT-on-ISSUE events.

Each webhook type carries a different payload, and json node-names for the data we use in the pipeline are not consistent across them.

So, if we want to make sure we capture all the information we need from every type of webhook, with the exception of "commits", we need to keep a list of all top-level node names from all possible payloads.

The JsonPath expression that achieves this for us is $.["after","before","changes","checkout_sha","event_name","event_type","issue","labels","merge_request","message","object_attributes","object_kind","project","project_id","push_options","ref","repository","total_commits_count","user","user_avatar","user_email","user_id","user_name","user_username"]

This is the list we believe is prone to error and difficult to maintain, as we need to analyze all possible payloads in order to maintain it.

If we could express an exclusion to be applied to any incoming payload, instead of the above, we could use $.["commits"]

In both cases, JsonPath will only list/exclude the nodes in the expression IF they exist in the payload. If the nodes we want to include/exclude are not part of the payload, then JsonPath assumes there is nothing to do and continues (so, it does not fail and lists the nodes that it can, complying with the expression provided).
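The two approaches can be compared with a small Python sketch that works on top-level keys only. The `include_keys` and `exclude_keys` helpers and the sample payloads are hypothetical and exist only for this example; this is not how the plugin or JsonPath is implemented. Both helpers silently skip keys that are absent, matching the behaviour described above.

```python
from typing import Any, Dict, Iterable


def include_keys(payload: Dict[str, Any], keys: Iterable[str]) -> Dict[str, Any]:
    """Keep only the listed top-level keys that exist in the payload."""
    return {key: payload[key] for key in keys if key in payload}


def exclude_keys(payload: Dict[str, Any], keys: Iterable[str]) -> Dict[str, Any]:
    """Drop the listed top-level keys; keys that are absent are ignored."""
    unwanted = set(keys)
    return {key: value for key, value in payload.items() if key not in unwanted}


# Hypothetical, heavily trimmed payloads for two webhook types.
push = {"object_kind": "push", "ref": "refs/heads/main", "commits": [{"id": "abc123"}]}
issue = {"object_kind": "issue", "user": {"name": "Jane"}}

# One short exclusion list works for every webhook type.
print(exclude_keys(push, ["commits"]))
print(exclude_keys(issue, ["commits"]))

# The inclusion list must name every key used by any webhook type.
print(include_keys(issue, ["object_kind", "ref", "user"]))
```

The exclusion list stays the same when a new top-level node appears in a payload, while the inclusion list has to be extended before that node becomes available.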

So, with a clearer picture in mind, what do you think about including an exclusion as suggested? Do you think that other users would benefit from such a feature as well?

tomasbjerre commented 3 years ago

No, I don't see the value. I still think it would be just as fragile as specifying the exact nodes with JsonPath.

There is also the complexity it would add and the cost of maintenance.