Open ponyisi opened 1 month ago
Rucio results are returned as a metadata file (there is a reason behind this). So, they are delivered in one big chunk.
Hi @ivukotic - we still query each individual dataset within a container separately, correct? So the metadata file is still per-dataset, not per-container? If so, is there a reason we cannot yield the result for each container in lookup_request:lookup_files instead of concatenating them all together and then returning the full file list?
I don't really remember ... It could be that I handle each dataset inside a data container separately. In that case it would make sense to yield them separately.
The Rucio DID finder currently does not return any files until the entire lookup (including all file replicas) is complete. For the data18 PHYSLITE this takes something like 7 minutes. However, the DID finding infrastructure is able to handle partial results, and there's no reason the Rucio DID finder couldn't return file replicas for large dataset containers on a dataset-by-dataset basis instead of waiting until the entire container is looked up. If we sort the dataset names, the order of results will even be reasonably reproducible between runs.
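
To make the proposal concrete, here is a minimal sketch of dataset-by-dataset yielding. It is not the actual DID finder code: `lookup_files_incrementally`, `lookup_dataset` and `fake_lookup` are hypothetical names, and the real Rucio replica query is replaced by a stand-in so the example runs on its own.

```python
from typing import Callable, Dict, Iterable, Iterator, List

FileRecord = Dict[str, str]


def lookup_files_incrementally(
    dataset_names: Iterable[str],
    lookup_dataset: Callable[[str], List[FileRecord]],
) -> Iterator[List[FileRecord]]:
    """Yield the file list of each dataset as soon as its lookup finishes.

    `lookup_dataset` is a hypothetical callable standing in for whatever
    performs the per-dataset replica lookup.
    """
    # Sorting the names keeps the order of partial results stable between runs.
    for name in sorted(dataset_names):
        files = lookup_dataset(name)
        if files:  # skip empty datasets instead of yielding empty batches
            yield files


def fake_lookup(name: str) -> List[FileRecord]:
    # Stand-in for the real replica lookup (assumption, for illustration only).
    return [{"dataset": name, "file": f"{name}.file{i}.root"} for i in range(2)]


if __name__ == "__main__":
    # Each batch is available to the consumer before the next lookup starts.
    for batch in lookup_files_incrementally(["ds.b", "ds.a"], fake_lookup):
        print([record["file"] for record in batch])
```

Because the function is a generator, the caller receives the first dataset's files before the second dataset is queried, rather than waiting for the whole container.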