vkuznet closed this issue 1 month ago.
These are actually two different methods, but there seems to be a problem with the regular expression of the REST endpoint in combination with the CMS SCOPE_NAME regular expression, so in both cases it calls list_replicas. https://github.com/rucio/rucio/blob/ec11eeb6618cfa6a63587d0bf78f195a4b77b632/lib/rucio/web/rest/webpy/v1/replica.py#L56
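/([^/]*)(?=/)(.*)/datasets$ resolves to the list_dataset_replicas method,
and /([^/]*)(?=/)(.*)/?$ resolves to the list_replicas method.
But the latter also matches anything ending in .../datasets, because the greedy (.*) followed by the optional /? absorbs the /datasets suffix. It should probably be changed to /([^/]*)(?=/)(.*)/$ (which means list_replicas URLs would then need a trailing slash).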
So, in conclusion: these should be two different methods, but in your case the same method is wrongly called for both.
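To make the overlap concrete, here is a minimal Python sketch that applies the three patterns quoted above to the /datasets path from the later query in this thread. It assumes first-match-wins routing (as with web.py URL lists, which use Python regular expressions); the route order used below is hypothetical, since the actual order in replica.py is not reproduced here, and `dispatch` is an illustrative helper, not Rucio code.

```python
import re

# Patterns quoted in the thread (the labels are just names for this sketch).
LIST_DATASET_REPLICAS = r"/([^/]*)(?=/)(.*)/datasets$"
LIST_REPLICAS = r"/([^/]*)(?=/)(.*)/?$"
LIST_REPLICAS_PROPOSED = r"/([^/]*)(?=/)(.*)/$"


def dispatch(path, routes):
    """Return (handler, scope, name) for the first matching route, or None.

    Assumes first-match-wins routing; the route order is a hypothetical input.
    """
    for pattern, handler in routes:
        m = re.match(pattern, path)
        if m:
            return handler, m.group(1), m.group(2)
    return None


path = "/cms/Charmonium/Run2017D-31Mar2018-v1/MINIAOD/datasets"

# Both original patterns match: the greedy (.*) plus the optional /? absorbs "/datasets".
print(bool(re.match(LIST_DATASET_REPLICAS, path)))   # True
print(bool(re.match(LIST_REPLICAS, path)))           # True
# The proposed pattern insists on a trailing slash, so it no longer matches.
print(bool(re.match(LIST_REPLICAS_PROPOSED, path)))  # False

# Hypothetical order with the catch-all first: /datasets is routed to list_replicas,
# and "/datasets" ends up inside the captured name.
routes = [(LIST_REPLICAS, "list_replicas"),
          (LIST_DATASET_REPLICAS, "list_dataset_replicas")]
print(dispatch(path, routes))

# Same order, but with the proposed pattern: each URL now reaches its own handler.
fixed_routes = [(LIST_REPLICAS_PROPOSED, "list_replicas"),
                (LIST_DATASET_REPLICAS, "list_dataset_replicas")]
print(dispatch(path, fixed_routes))
print(dispatch("/cms/a/b/c/", fixed_routes))
```

Listing the /datasets route before the catch-all would be another way to get the same separation under first-match-wins routing, without forcing a trailing slash on list_replicas URLs.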
Yet another misbehavior:
This call http://cms-rucio-test.cern.ch/replicas/cms/Charmonium/Run2017D-31Mar2018-v1/MINIAOD returns
{"states": {"T2_US_Nebraska": "AVAILABLE"}, "pfns": {"gsiftp://red-gridftp.unl.edu/user/uscms01/pnfs/unl.edu/data4/cms//store/data/Run2017D/Charmonium/MINIAOD/31Mar2018-v1/90000/FA607615-8E37-E811-B2DE-0CC47AC52C8E.root": {"domain": "wan", "rse": "T2_US_Nebraska", "priority": 1, "volatile": false, "client_extract": false, "type": "DISK"}}, "adler32": "881ec8cf", "name": "/store/data/Run2017D/Charmonium/MINIAOD/31Mar2018-v1/90000/FA607615-8E37-E811-B2DE-0CC47AC52C8E.root", "rses": {"T2_US_Nebraska": ["gsiftp://red-gridftp.unl.edu/user/uscms01/pnfs/unl.edu/data4/cms//store/data/Run2017D/Charmonium/MINIAOD/31Mar2018-v1/90000/FA607615-8E37-E811-B2DE-0CC47AC52C8E.root"]}, "scope": "cms", "bytes": 3890770828, "md5": null}
while this call http://cms-rucio-test.cern.ch/replicas/cms/Charmonium/Run2017D-31Mar2018-v1/MINIAOD/datasets returns nothing.
Is it the same regex issue? Please note that this time I used a CMS dataset name (pattern /a/b/c) rather than a CMS block name (pattern /a/b/c#123).
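One related detail may be worth checking for block names: in URL syntax (RFC 3986), an unescaped # starts the fragment, and HTTP clients do not send the fragment to the server. The sketch below uses Python's urllib.parse on the thread's placeholder name /a/b/c#123 to show what part of such a URL is actually a path. Whether the Rucio server decodes %23 before applying its regex is not confirmed here; treat this as an assumption to verify.

```python
from urllib.parse import quote, urlsplit

base = "http://cms-rucio-test.cern.ch/replicas/cms"

# Unescaped '#': everything after it is the fragment, which clients keep to themselves.
raw = urlsplit(base + "/a/b/c#123/datasets")
print(raw.path)      # /replicas/cms/a/b/c
print(raw.fragment)  # 123/datasets

# Percent-encoding the name keeps '#123' and '/datasets' in the path that is sent.
encoded = urlsplit(base + quote("/a/b/c#123") + "/datasets")
print(encoded.path)      # /replicas/cms/a/b/c%23123/datasets
print(repr(encoded.fragment))  # ''
```

If the fragment is dropped, the server only sees /replicas/cms/a/b/c, which would be handled as a list_replicas request regardless of the regex.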
I think in this case it is correct. The first query (without /datasets) returns the file replicas. The second query (/datasets) returns dataset replicas, which are essentially a more efficient way to get the replica status of all files of a dataset (in the database we create a dataset replica with synchronised counters). However, dataset replicas are not created in all workflows, so in this case everything seems correct. I think you are right that your previous query, the one including the #, seems to be problematic with the REST regex. I will take a closer look at this in the next few days.
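The explanation above comes down to a trade-off: keep a per-dataset summary up to date as file replicas change, or scan every file replica on demand. The toy sketch below illustrates that reading only; the class, its fields and the lookup dictionary are hypothetical and do not reflect Rucio's actual tables or code.

```python
from dataclasses import dataclass


@dataclass
class DatasetReplicaCounters:
    """Hypothetical summary row for one (dataset, RSE) pair; not Rucio's actual schema."""
    total_files: int
    available_files: int = 0

    def on_file_available(self) -> None:
        # Bump the counter when one file replica of the dataset becomes AVAILABLE.
        self.available_files += 1

    def on_file_lost(self) -> None:
        # Decrement it when such a file replica stops being available.
        self.available_files -= 1

    @property
    def complete(self) -> bool:
        # Constant-time check: no need to visit every file replica.
        return self.available_files == self.total_files


def complete_by_scan(file_states):
    """The slow path: inspect the state of every file replica, one by one."""
    return all(state == "AVAILABLE" for state in file_states)


counters = DatasetReplicaCounters(total_files=3)
for _ in range(3):
    counters.on_file_available()
print(counters.complete)  # True
print(complete_by_scan(["AVAILABLE", "AVAILABLE", "UNAVAILABLE"]))  # False

# If a workflow never created the summary row, a lookup finds nothing to report.
summaries = {}  # hypothetical index: (scope, name, rse) -> DatasetReplicaCounters
key = ("cms", "/Charmonium/Run2017D-31Mar2018-v1/MINIAOD", "T2_US_Nebraska")
print(summaries.get(key))  # None
```

The last lookup mirrors the empty /datasets response: the file replicas exist, but no dataset-level summary was ever created for them.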
Motivation
I'm struggling to understand the difference between these two API calls:
In both cases the output seems identical, although the API descriptions differ: the former lists all replicas for a data identifier, while the latter lists dataset replicas. What's the difference?
Here are my calls (to the CMS Rucio server):
In both cases, 283 documents are returned.
Modification
Also, I want to get replicas only for a given name and RSE pair, and so far I can't find the API I need. The only way I have found is to get the list of all replicas for a given name and filter for the RSE I'm interested in. Would it be more efficient to pass the RSE as a parameter to the replicas API, so that only the replicas on that RSE are fetched? For example:
/replicas/<scope>/<name>/<rse>
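For reference, the client-side workaround described above can be sketched as follows. It assumes the response body is newline-delimited JSON (one replica document per line, which is how the "283 returned documents" read) and uses only the "name" and "rses" fields visible in the document shown earlier in this thread; `replicas_on_rse` is a hypothetical helper, not part of any Rucio client.

```python
import json

# Values copied from the replica document shown earlier in this thread.
NAME = ("/store/data/Run2017D/Charmonium/MINIAOD/31Mar2018-v1/90000/"
        "FA607615-8E37-E811-B2DE-0CC47AC52C8E.root")
PFN = "gsiftp://red-gridftp.unl.edu/user/uscms01/pnfs/unl.edu/data4/cms/" + NAME


def replicas_on_rse(lines, rse):
    """Client-side filter: (name, pfns) for every document with at least one PFN on `rse`.

    Assumes `lines` holds one JSON document per line.
    """
    matches = []
    for line in lines:
        if not line.strip():
            continue  # skip blank separator lines
        doc = json.loads(line)
        pfns = doc.get("rses", {}).get(rse, [])
        if pfns:
            matches.append((doc["name"], pfns))
    return matches


# An abbreviated copy of that document (only the fields the filter reads).
body = json.dumps({"scope": "cms", "name": NAME, "rses": {"T2_US_Nebraska": [PFN]}}) + "\n"

print(replicas_on_rse(body.splitlines(), "T2_US_Nebraska"))  # one (name, [pfn]) pair
print(replicas_on_rse(body.splitlines(), "SOME_OTHER_RSE"))  # []
```

On the proposed /replicas/<scope>/<name>/<rse> route: because CMS names themselves contain slashes, a greedy name pattern like the list_replicas one would likely have the same trouble telling a trailing RSE apart from the last name component as it has with /datasets, so passing the RSE as a query parameter may be easier to route.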