openforcefield / openff-evaluator

A physical property evaluation toolkit from the Open Forcefield Consortium.
https://docs.openforcefield.org/projects/evaluator
MIT License

GPU usage with schema and merge orders #431

Open jaketanderson opened 2 years ago

jaketanderson commented 2 years ago

This is less of an issue and more of a question. When running an estimation with two schemas, SolvationFreeEnergy and HostGuestBindingAffinity, I know that HostGuestBindingAffinity can make full use of all available dask workers, while SolvationFreeEnergy doesn't even fully occupy a single worker. What surprised me was that when I ran my code with host_guest_data_set merged into freesolv_data_set, and with solvation_schema added to estimation_options before host_guest_schema, the binding simulation was restricted to a single GPU for the entire calculation.

I tried to fix this by swapping two things: the order of the schema additions and which data set is merged into the other. With both of my original orders swapped, the problem has gone away completely. The solvation and binding calculations run simultaneously until the solvation is complete, at which point the binding calculation is able to utilize all four of my available workers.
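
Concretely, the two swaps were (these just mirror the starred lines in the full listing below):

    # Original ordering (binding restricted to one GPU):
    freesolv_data_set.merge(host_guest_data_set)
    ...
    estimation_options.add_schema("SimulationLayer", "SolvationFreeEnergy", solvation_schema)
    estimation_options.add_schema("SimulationLayer", "HostGuestBindingAffinity", host_guest_schema)

    # Fixed ordering (binding eventually uses all four workers):
    host_guest_data_set.merge(freesolv_data_set)
    ...
    estimation_options.add_schema("SimulationLayer", "HostGuestBindingAffinity", host_guest_schema)
    estimation_options.add_schema("SimulationLayer", "SolvationFreeEnergy", solvation_schema)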

So what I'm wondering is: is this normal operation? Do I need to make sure I add certain schemas first, or merge certain data sets into their counterparts and not vice versa? I plan to rerun the job with only one of the two fixes applied to see which one was actually responsible for correcting the GPU usage. Below is my code, with the relevant lines marked with asterisks.

    # Imports for the snippet below (module paths per my installed
    # openff-evaluator version; they may differ slightly between releases).
    from openff.evaluator.backends import ComputeResources
    from openff.evaluator.backends.dask import DaskLocalCluster
    from openff.evaluator.client import RequestOptions
    from openff.evaluator.datasets import PhysicalPropertyDataSet
    from openff.evaluator.datasets.taproom import TaproomDataSet
    from openff.evaluator.properties import HostGuestBindingAffinity, SolvationFreeEnergy
    # (APRSimulationSteps comes from the paprika schema code I'm using.)

    # `molecule` is a pandas DataFrame of solvation properties built earlier in the script.
    freesolv_data_set = PhysicalPropertyDataSet.from_pandas(molecule)

    host_guest_data_set = TaproomDataSet(
        #####
    )

*** freesolv_data_set.merge(host_guest_data_set)
    # FIXED VERSION:
    # host_guest_data_set.merge(freesolv_data_set)

    solvation_schema = SolvationFreeEnergy.default_simulation_schema(use_implicit_solvent=True)

    APR_settings = APRSimulationSteps(
        #####
    )
    host_guest_schema = HostGuestBindingAffinity.default_paprika_schema(
        simulation_settings=APR_settings,
        use_implicit_solvent=True,
        enable_hmr=False,
    )

    estimation_options = RequestOptions()
    estimation_options.calculation_layers = ["SimulationLayer"]
*** estimation_options.add_schema(
        "SimulationLayer", "SolvationFreeEnergy", solvation_schema
    )
*** estimation_options.add_schema(
        "SimulationLayer", "HostGuestBindingAffinity", host_guest_schema
    )
    # FIXED VERSION:
    # Swapped the order of the two starred .add_schema calls so host_guest_schema goes first.

    print("All schemas were added to estimation_options")

    # Create Pool of Dask Workers
    calculation_backend = DaskLocalCluster(
        number_of_workers=4,
        resources_per_worker=ComputeResources(
            number_of_threads=1,
            number_of_gpus=1,
            preferred_gpu_toolkit=ComputeResources.GPUToolkit.CUDA,
        ),
    )
    calculation_backend.start()
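
For reference, the estimation itself is then submitted against this backend via an EvaluatorServer and EvaluatorClient, roughly as sketched below (the force field file name here is just a placeholder):

    from openff.evaluator.client import EvaluatorClient
    from openff.evaluator.forcefield import SmirnoffForceFieldSource
    from openff.evaluator.server import EvaluatorServer

    # Serve estimation requests using the Dask backend created above.
    evaluator_server = EvaluatorServer(calculation_backend=calculation_backend)
    evaluator_server.start(asynchronous=True)

    # Submit the merged data set (host_guest_data_set in the fixed version)
    # with the schema order set in estimation_options.
    force_field_source = SmirnoffForceFieldSource.from_path("force-field.offxml")

    client = EvaluatorClient()
    request, _ = client.request_estimate(
        property_set=freesolv_data_set,
        force_field_source=force_field_source,
        options=estimation_options,
    )
    results, _ = request.results(synchronous=True)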