spotify / luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Apache License 2.0

Add support for custom authentication headers #3224

Open soapergem opened 1 year ago

soapergem commented 1 year ago

I would like to propose adding support for custom HTTP headers to be sent to the Luigi scheduler. I am specifically thinking about the Authorization header, but I don't see any reason to limit this to that header only.

Zero Trust Networking is becoming a more common pattern and often involves proxying services through an Identity-Aware Proxy (IAP). For instance, someone might front their deployment of the Luigi scheduler with an IAP, meaning that when you try to access the scheduler web endpoint, you are first redirected to an Identity Provider's (IdP) login form to complete an OAuth flow before being served the Luigi scheduler web page. Practically speaking, this means a special auth header is used to validate that you are authorized to access the scheduler. This applies both to the visualizer (which already works under this model) and to programmatic access, e.g. submitting jobs (which does not currently work behind an IAP). When the scheduler sits behind an IAP as described, jobs submitted without the appropriate header are rejected.

Thus, I'm envisioning a new --add-header argument for submitting jobs with custom headers, like this:

$ luigi --module top_artists AggregateArtists --scheduler-host luigi.my-domain.com --scheduler-port 8082 --add-header "Authorization: Bearer my-custom-header-value"
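To make the idea concrete, here is a rough sketch of how repeated --add-header values could be collected into a header dict. It uses plain argparse rather than Luigi's own command-line handling, and the names parse_header, build_parser and collect_headers are hypothetical; none of this exists in Luigi today.

```python
import argparse


def parse_header(raw):
    """Split a 'Name: value' string into a (name, value) pair."""
    # Split on the first colon only, so values may contain colons themselves.
    name, sep, value = raw.partition(":")
    if not sep or not name.strip():
        raise argparse.ArgumentTypeError(
            "expected 'Name: value', got %r" % raw)
    return name.strip(), value.strip()


def build_parser():
    # Hypothetical standalone parser; Luigi's real CLI is built differently.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--add-header",
        dest="add_header",
        action="append",      # allow the flag to be given several times
        type=parse_header,
        default=[],
        metavar="HEADER",
    )
    return parser


def collect_headers(argv):
    """Return a dict of header name to value from the given argument list."""
    args = build_parser().parse_args(argv)
    # If the same header is given twice, the last occurrence wins.
    return dict(args.add_header)
```

The resulting dict could then be attached to every request the worker makes to the scheduler, whatever HTTP client is used for that.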

I suppose we would also want to come up with some contract for expressing multiple add_header options in a config file. I'm open to suggestions on how to name and format those.
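As one possible starting point for that discussion, the sketch below reads headers from a dedicated INI-style section, one option per header. The section name scheduler_headers, the function load_headers and the X-Request-Source header are all made up for illustration; this is not an existing Luigi setting.

```python
import configparser

# Hypothetical config fragment: one option per header.
SAMPLE = """
[scheduler_headers]
Authorization = Bearer my-custom-header-value
X-Request-Source = luigi-worker
"""


def load_headers(text, section="scheduler_headers"):
    """Return a dict of header name to value from an INI-style string."""
    # Disable interpolation so '%' characters in tokens are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    # Keep header names as written instead of lowercasing them.
    parser.optionxform = str
    parser.read_string(text)
    if not parser.has_section(section):
        return {}
    return dict(parser.items(section))
```

A separate section avoids having to invent a delimiter for packing several headers into a single option value, at the cost of adding a new section to the config file.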