statmike / vertex-ai-mlops

Google Cloud Platform Vertex AI end-to-end workflows for machine learning operations
Apache License 2.0
450 stars 202 forks

02c: change of region causes problems. #54

Open hymanroth opened 9 months ago

hymanroth commented 9 months ago

Hi Mike, and thanks for this excellent series of tutorials.

I got as far as 02b when I started having resource availability issues in us-central1. After several days of not being able to open the notebook, I decided to start again in a new region (europe-west6). I began with 00-setup and got as far as 02b without any issues. However, several pipeline tasks in 02c failed because, if a region is not explicitly specified in a task, the location defaults to us-central1. Hence the executors were looking for artifacts in the wrong region.

To cut a long story short, I only managed to complete the pipeline successfully by explicitly specifying a location in each task:

    # dataset
    dataset = TabularDatasetCreateOp(
        location = REGION,
        project = project,
        ...
    )

    # training
    model = AutoMLTabularTrainingJobRunOp(
        location = REGION,
        project = project,
        ...
    )

    # Endpoint: Creation
    endpoint = EndpointCreateOp(
        location = REGION,
        project = project,
        ...
    )

At first I tried explicitly setting the location in the top-level pipeline definition, hoping that it would be inherited by the underlying tasks, but this didn't work. Perhaps there is another way of provoking this behavior.
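In the meantime, one way to avoid repeating `location=` on every call is to pre-bind it once with `functools.partial`. This is a plain-Python sketch: the component function below is a stand-in for the real `google_cloud_pipeline_components` operators (`TabularDatasetCreateOp`, etc.), and the project id is hypothetical.

```python
from functools import partial

# Stand-in for a real pipeline component such as TabularDatasetCreateOp;
# in the actual pipeline you would wrap the imported operator instead.
def tabular_dataset_create_op(project, location, display_name):
    return {"project": project, "location": location, "display_name": display_name}

PROJECT = "my-project"       # hypothetical project id
REGION = "europe-west6"

# Pre-bind project and location so every task receives them explicitly,
# instead of silently falling back to the us-central1 default.
def with_region(op):
    return partial(op, project=PROJECT, location=REGION)

dataset_op = with_region(tabular_dataset_create_op)
task = dataset_op(display_name="fraud-dataset")
# task["location"] == "europe-west6"
```

This keeps each task's location explicit without sprinkling `location = REGION` through every call site, though it is only a workaround until the components inherit the pipeline-level location.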

As an aside, the solution above involved running the pipeline several times, because I had to wait for each task to complete before I could verify that the next one was OK. This meant I trained the model (with identical data) three times, with a 2 hour wait each time. It was only later that I realized that Vertex Pipelines has a cache which can be used to skip repeated invocations of the same task. The reason the cache was not used is that you use a timestamp as part of the pipeline id. This acts as a "cache-buster" and bypasses the cache by default. I would suggest using a less volatile pipeline id (e.g. an explicit version number) and adding a note to explain how the timestamp can be used to force a complete recalculation of the pipeline if necessary.
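A tiny plain-Python illustration of the cache-buster effect (the naming scheme below is hypothetical; Vertex caching itself can also be toggled with the `enable_caching` flag on `PipelineJob`):

```python
from datetime import datetime

def pipeline_job_id(base: str, version: str, cache_bust: bool = False) -> str:
    """Build a pipeline job id.

    A stable version suffix leaves the id identical across runs, so Vertex
    can serve repeated task invocations from its cache; appending a
    timestamp makes every run unique and defeats the cache.
    """
    if cache_bust:
        # Timestamp suffix: a new id per run, so no cached task is reused.
        stamp = datetime.now().strftime("%Y%m%d%H%M%S")
        return f"{base}-{version}-{stamp}"
    return f"{base}-{version}"

stable = pipeline_job_id("automl-tabular", "v1")
# stable is "automl-tabular-v1" on every run, keeping the cache usable.
busted = pipeline_job_id("automl-tabular", "v1", cache_bust=True)
# busted changes each run, e.g. "automl-tabular-v1-20240101120000".
```

Keeping the timestamp behind an explicit flag would let readers opt in to a full re-run while getting cache hits by default.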

Thanks again for the work you have done here, it's super useful!

statmike commented 9 months ago

Thank you @hymanroth. This series definitely needs an update, and I have plans for a full update path and an expansion of the AutoML notebooks. I will incorporate a fix for the GCP-provided components needing the location explicitly specified, since they do not inherit it. In the meantime, I did update the readme for the AutoML folder to include notes about AutoML resource considerations by location. Hopefully this update will happen before November - a few things are ahead of it in the TensorFlow series, BQML and Applied GenAI.