rucio / rucio

Rucio - Scientific Data Management
http://rucio.cern.ch
Apache License 2.0

Replicas API ambiguity and improvements #1746

Open vkuznet opened 5 years ago

vkuznet commented 5 years ago

Motivation

I'm struggling to understand the difference between these two API calls:

In both cases the output seems identical, while the API descriptions differ. The former lists all replicas for a data identifier, while the latter lists dataset replicas. What's the difference?

Here are my calls (to the CMS Rucio server):

# this API /replicas/cms/Charmonium/Run2017D-31Mar2018-v1/MINIAOD#56f1f7ec-3789-11e8-a513-02163e018570 returns
{"states": {"T2_US_Nebraska": "AVAILABLE"}, "pfns": {}, "adler32": "881ec8cf", "name": "/store/data/Run2017D/Charmonium/MINIAOD/31Mar2018-v1/90000/FA607615-8E37-E811-B2DE-0CC47AC52C8E.root", "rses": {}, "scope": "cms", "bytes": 3890770828, "md5": null}

# this API /replicas/cms/Charmonium/Run2017D-31Mar2018-v1/MINIAOD#56f1f7ec-3789-11e8-a513-02163e018570/datasets returns
{"states": {"T2_US_Nebraska": "AVAILABLE"}, "pfns": {}, "adler32": "881ec8cf", "name": "/store/data/Run2017D/Charmonium/MINIAOD/31Mar2018-v1/90000/FA607615-8E37-E811-B2DE-0CC47AC52C8E.root", "rses": {}, "scope": "cms", "bytes": 3890770828, "md5": null}

In both cases, 283 documents are returned.

Modification

Also, I want to get replicas only for a given name and RSE pair, and so far I can't find an API for that. The only way to do it is to get the list of all replicas for a given name and filter on the RSE I'm interested in. Would it be more efficient to pass the RSE as a parameter to the replicas API, so that only the subset of replicas on that RSE is fetched? For example: /replicas/<scope>/<name>/<rse>
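The client-side workaround described above can be sketched as follows. This is a minimal illustration, not part of the Rucio client: the function name `replicas_on_rse` is hypothetical, and it assumes the server returns one JSON document per line, each with a `states` mapping keyed by RSE name, as in the output shown earlier.

```python
import json


def replicas_on_rse(lines, rse):
    """Yield replica documents that report a state on the given RSE.

    `lines` is any iterable of JSON strings, one document per line
    (an assumption about the response format, not confirmed here).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between documents
        doc = json.loads(line)
        # Keep the document only if the RSE appears in its "states" map.
        if rse in doc.get("states", {}):
            yield doc


# Sample document copied from the output above.
sample = [
    '{"states": {"T2_US_Nebraska": "AVAILABLE"}, "pfns": {}, '
    '"adler32": "881ec8cf", '
    '"name": "/store/data/Run2017D/Charmonium/MINIAOD/31Mar2018-v1/90000/'
    'FA607615-8E37-E811-B2DE-0CC47AC52C8E.root", '
    '"rses": {}, "scope": "cms", "bytes": 3890770828, "md5": null}',
]

on_nebraska = list(replicas_on_rse(sample, "T2_US_Nebraska"))
on_other = list(replicas_on_rse(sample, "T1_EXAMPLE_RSE"))  # made-up RSE name
```

This still transfers every replica document over the network before filtering, which is exactly the inefficiency a server-side RSE parameter would avoid.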

bari12 commented 5 years ago

These are actually two different methods, but there seems to be a problem with the regular expression of the REST endpoint in combination with the CMS SCOPE_NAME regular expression, so in both cases it calls list_replicas. https://github.com/rucio/rucio/blob/ec11eeb6618cfa6a63587d0bf78f195a4b77b632/lib/rucio/web/rest/webpy/v1/replica.py#L56

/([^/]*)(?=/)(.*)/datasets$ resolves to the list_dataset_replicas method and /([^/]*)(?=/)(.*)/?$ resolves to the list_replicas method. But the latter also seems to match anything ending in .../datasets, so it should probably be changed to: /([^/]*)(?=/)(.*)/$

In conclusion: these should be two different methods, but in your case both calls wrongly end up in the same method.
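The overlap between the two patterns can be checked with Python's `re` module. This is only a sketch: it assumes the patterns are applied to the path after the /replicas prefix, and the sample path is taken from a URL in this thread. Which handler actually runs depends on how the web framework orders and dispatches its routes, which is not shown here.

```python
import re

# Patterns quoted above.
DATASETS = re.compile(r'/([^/]*)(?=/)(.*)/datasets$')
REPLICAS = re.compile(r'/([^/]*)(?=/)(.*)/?$')
# Replacement pattern proposed above.
PROPOSED = re.compile(r'/([^/]*)(?=/)(.*)/$')

# Assumed request path with the /replicas prefix already stripped.
path = '/cms/Charmonium/Run2017D-31Mar2018-v1/MINIAOD/datasets'


def matches(pattern, candidate):
    """Return True if the pattern matches the candidate path from its start."""
    return pattern.match(candidate) is not None


# Both current patterns accept the /datasets path.
both_match = matches(DATASETS, path) and matches(REPLICAS, path)

# With the list_replicas pattern, "/datasets" is swallowed into the name group.
swallowed_name = REPLICAS.match(path).group(2)

# The proposed pattern rejects the /datasets path, but it also requires
# a trailing slash on plain replica requests.
proposed_on_datasets = matches(PROPOSED, path)
proposed_no_slash = matches(PROPOSED, '/cms/Charmonium/Run2017D-31Mar2018-v1/MINIAOD')
proposed_with_slash = matches(PROPOSED, '/cms/Charmonium/Run2017D-31Mar2018-v1/MINIAOD/')
```

Because both current patterns match the same path, the result depends on which route is tried first. The proposed pattern removes the overlap for this path, at the cost of matching only paths that end in a slash.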

vkuznet commented 5 years ago

Yet another misbehavior:

This call http://cms-rucio-test.cern.ch/replicas/cms/Charmonium/Run2017D-31Mar2018-v1/MINIAOD returns

{"states": {"T2_US_Nebraska": "AVAILABLE"}, "pfns": {"gsiftp://red-gridftp.unl.edu/user/uscms01/pnfs/unl.edu/data4/cms//store/data/Run2017D/Charmonium/MINIAOD/31Mar2018-v1/90000/FA607615-8E37-E811-B2DE-0CC47AC52C8E.root": {"domain": "wan", "rse": "T2_US_Nebraska", "priority": 1, "volatile": false, "client_extract": false, "type": "DISK"}}, "adler32": "881ec8cf", "name": "/store/data/Run2017D/Charmonium/MINIAOD/31Mar2018-v1/90000/FA607615-8E37-E811-B2DE-0CC47AC52C8E.root", "rses": {"T2_US_Nebraska": ["gsiftp://red-gridftp.unl.edu/user/uscms01/pnfs/unl.edu/data4/cms//store/data/Run2017D/Charmonium/MINIAOD/31Mar2018-v1/90000/FA607615-8E37-E811-B2DE-0CC47AC52C8E.root"]}, "scope": "cms", "bytes": 3890770828, "md5": null}

while this call http://cms-rucio-test.cern.ch/replicas/cms/Charmonium/Run2017D-31Mar2018-v1/MINIAOD/datasets returns nothing.

Is it the same regex issue? Please note that this time I used a CMS dataset name (pattern /a/b/c) rather than a CMS block name (pattern /a/b/c#123).

bari12 commented 5 years ago

I think in this case the behavior is correct. The first query (without /datasets) returns the file replicas. The second query (/datasets) returns dataset replicas, which are essentially just a more efficient way to get the replica status of all files of a dataset (in the database we create a dataset replica which has synchronised counters). However, dataset replicas are not created in all workflows, so an empty result here is expected. I think you are right that in your previous query the # in the name is somehow problematic for the REST regex. I will take a closer look at this in the next few days.