iris-hep / idap-200gbps-atlas

benchmarking throughput with PHYSLITE

Sensible placement of the `codegen` argument #125

Open gordonwatts opened 3 months ago

gordonwatts commented 3 months ago

Nominally, the codegen and the content of the query go hand-in-hand.

For example, here build_query builds the query as a func_adl query and returns it together with the name of the codegen it needs. A different kind of query could be built here instead, and it could require a different backend.

def build_query(name: str) -> Tuple[FuncADLQuery, str]:
    if name == "xaod_all":
        return (query_xaod_all(), "atlasr22")
    elif name == "xaod_medium":
        return (query_xaod_medium(), "atlasr22")
    elif name == "xaod_small":
        return (query_xaod_small(), "atlasr22")
    else:
        raise ValueError(f"Unknown query type {name}")

When passing this information to the SX front end for the query:

    spec = sx.ServiceXSpec(
        General=sx.General(
            ServiceX="atlasr22",
            Codegen=query[1],
            OutputFormat=sx.ResultFormat.root,  # type: ignore
            Delivery=("LocalCache" if download else "SignedURLs"),  # type: ignore
        ),
        Sample=[
            # TODO: Need a way to have the DID finder re-fetch the file list.
            sx.Sample(
                Name=f"speed_test_{ds_name}"[0:128],
                RucioDID=ds_name,
                Codegen=query[1],
                Query=query[0],
                NFiles=num_files,
                IgnoreLocalCache=ignore_cache,
            )  # type: ignore
            for ds_name in ds_names
        ],
    )

the fact that you have to carry these two things along separately isn't the best UX. Perhaps if all queries implemented a common API, one could ask the query itself for its codegen. No worries if the codegen can also come in via other routes, but this should be one of them.
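A minimal sketch of what such a common API might look like. Everything here is hypothetical and for illustration only: `QueryWithCodegen`, `default_codegen`, `FuncADLQueryStub`, and `resolve_codegen` are invented names, not part of func_adl or the ServiceX front end. The idea is that each query type reports its preferred codegen, while an explicit value supplied by another route still takes precedence.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class QueryWithCodegen(Protocol):
    """Hypothetical common interface: a query reports its preferred codegen."""

    def default_codegen(self) -> Optional[str]: ...


@dataclass
class FuncADLQueryStub:
    """Stand-in for a func_adl query; this is NOT the real FuncADLQuery class."""

    name: str

    def default_codegen(self) -> Optional[str]:
        # Assumption: func_adl xAOD queries would default to the "atlasr22" codegen.
        return "atlasr22"


def resolve_codegen(query: QueryWithCodegen, override: Optional[str] = None) -> str:
    """Use an explicit override if one is given, otherwise ask the query."""
    if override is not None:
        return override
    codegen = query.default_codegen()
    if codegen is None:
        raise ValueError("No codegen given and the query does not declare one")
    return codegen
```

With something along these lines, build_query could return just the query, and the code that assembles the spec could call `resolve_codegen(query)` instead of carrying a `(query, codegen)` tuple around.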

ponyisi commented 1 month ago

Presumably fixed in the latest v3 frontend (where queries indicate their preferred codegens).