ssl-hep / ServiceX

ServiceX - a data delivery service pilot for IRIS-HEP DOMA
BSD 3-Clause "New" or "Revised" License

Allow Rucio DID finder to emit partial results #871

Open ponyisi opened 1 month ago

ponyisi commented 1 month ago

The Rucio DID finder currently does not return any files until the entire lookup, including the lookup of all file replicas, is complete. For the data18 PHYSLITE this takes something like 7 minutes. However, the DID finding infrastructure can handle partial results, and there's no reason the Rucio DID finder couldn't return file replicas for large dataset containers on a dataset-by-dataset basis instead of waiting until the entire container is looked up. If we sort the dataset names, this will even be reasonably reproducible between runs.

ivukotic commented 1 month ago

Rucio results are returned as a metadata file (there is a reason behind this). So, they are delivered in one big chunk.

ponyisi commented 1 month ago

Hi @ivukotic - we still query each individual dataset within a container separately, yes? So the metadata file is per-dataset, not per-container, correct? If so, is there a reason we cannot yield the result for each dataset in lookup_request:lookup_files instead of concatenating them all together and then returning the full file list?
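
To make the difference concrete, here is a minimal sketch of the two shapes. It is not the actual DID finder code: `lookup_files_all_at_once`, `lookup_files_incremental`, `fake_lookup` and `FAKE_CATALOG` are hypothetical names, and the per-dataset replica lookup is replaced by a plain dictionary so the example runs without Rucio.

```python
from typing import Callable, Dict, Iterable, Iterator, List

FileInfo = Dict[str, str]
# Hypothetical stand-in for "look up the file replicas of one dataset".
ReplicaLookup = Callable[[str], List[FileInfo]]


def lookup_files_all_at_once(datasets: Iterable[str],
                             lookup: ReplicaLookup) -> List[FileInfo]:
    """Concatenate everything: nothing is available until the last lookup ends."""
    files: List[FileInfo] = []
    for name in datasets:
        files.extend(lookup(name))
    return files


def lookup_files_incremental(datasets: Iterable[str],
                             lookup: ReplicaLookup) -> Iterator[List[FileInfo]]:
    """Yield one batch per dataset as soon as that dataset has been looked up.

    Sorting the dataset names keeps the batch order stable between runs.
    """
    for name in sorted(datasets):
        yield lookup(name)


# Fake catalog and lookup, for illustration only (assumed, not real Rucio data).
FAKE_CATALOG: Dict[str, List[FileInfo]] = {
    "ds.b": [{"name": "b1.root"}],
    "ds.a": [{"name": "a1.root"}, {"name": "a2.root"}],
}
calls: List[str] = []


def fake_lookup(name: str) -> List[FileInfo]:
    calls.append(name)  # record the order in which datasets are queried
    return FAKE_CATALOG[name]
```

With the generator version, a caller iterating over `lookup_files_incremental(FAKE_CATALOG, fake_lookup)` receives the files of `ds.a` after a single lookup, before `ds.b` has been queried at all.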

ivukotic commented 1 month ago

I don't really remember... It could be that I handle each dataset inside a data container separately. In that case it would make sense to yield them separately.