microsoft / nutter

Testing framework for Databricks notebooks
MIT License
287 stars 42 forks source link

Adapt code so that it can also set access_list_permissions (which is something that is added in Jobs API 2.1) #77

Open YoshicoppensE61 opened 1 year ago

YoshicoppensE61 commented 1 year ago

Currently when you let the nutter run the tests, it starts jobs in databricks (using the 2.0 Jobs API). However, one of the big downsides of 2.0 vs 2.1 is that you cannot set permissions on your job in 2.0.

What will happen then often is that the nutter tests will run, they will show that an error has been made, but then only admins in databricks can actually see the results of the test in databricks (that is the default setting for jobs), others will see 'job not found'. Because we are generally running these tests in multiple environments, and not everyone can have admin access in all of those environments (we do not give admin access at all in production), this actually is a pretty big disadvantage.

If we could add access_list_permissions on the job as well, this would be mitigated. The underlying package databricks_api, which uses the underlying package databricks_cli, already allow for this.

nutter relevant code in common/apiclient.py

db = DatabricksAPI(host=config.host,
                           token=config.token
# NEW PROPOSED ADDITION -> add here jobs_api_version as an extra settings
)
        self.inner_dbclient = db

and

runid = self._retrier.execute(self.inner_dbclient.jobs.submit_run,
                                      run_name=name,
                                      existing_cluster_id=cluster_id,
                                      notebook_task=ntask,
# NEW PROPOSED ADDITION -> add access_list_permissions here
                                      )

databricks_api relevant code

class DatabricksAPI:
    def __init__(self, **kwargs):
        if "host" in kwargs:
            if not kwargs["host"].startswith("https://"):
                kwargs["host"] = "https://" + kwargs["host"]

        self.client = ApiClient(**kwargs)

databricks_cli sdk relevant code

class ApiClient(object):
    """
    A partial Python implementation of dbc rest api
    to be used by different versions of the client.
    """
    def __init__(self, user=None, password=None, host=None, token=None,
                 api_version=version.API_VERSION, default_headers={}, verify=True, command_name="", jobs_api_version=None):

And then nutter run needs to get an extra argument, these access_list_permissions, that are optional to be defined.

YoshicoppensE61 commented 1 year ago

Can I propose these code changes myself, and then someone can help a bit with the probable other code changes that need to happen because I probably do not grasp the full picture.

giventocode commented 1 year ago

Sorry for the delayed response, and thanks for opening the issue. Can you please create a PR with the proposed changes? I'd be happy to review and provide feedback. I'd expect this to be an option that bubbles up all the way to the CLI as an optional argument.