root-project / root

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
https://root.cern

Unable to use EOS tokens with RDataFrame since 6.32 #16475

Open chrisburr opened 1 month ago

chrisburr commented 1 month ago

Description

EOS tokens no longer work with RDataFrame in 6.32.04. In 6.30.08 everything is fine:

$ python3
Python 3.9.18 (main, Aug 23 2024, 00:00:00)
[GCC 11.4.1 20231218 (Red Hat 11.4.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ROOT
>>> url = 'root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root?xrd.wantprot=unix&authz=' + open("token.txt").read().strip()
>>> ROOT.TFile.Open(url).ls()
TNetXNGFile**       root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root   Demo ROOT file with histograms
 TNetXNGFile*       root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root   Demo ROOT file with histograms
  KEY: TH1F hpx;1   This is the px distribution
  KEY: TH2F hpxpy;1 py vs px
  KEY: TProfile hprof;1 Profile of pz versus px
  KEY: TNtuple  ntuple;1    Demo ntuple
>>> df = ROOT.RDataFrame("ntuple", url)
>>>

Reproducer

On lxplus:

$ source /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.32.04/x86_64-almalinux9.4-gcc114-opt/bin/thisroot.sh
$ cp /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.32.04/x86_64-almalinux9.4-gcc114-opt/tutorials/hsimple.root /eos/user/c/cburr/hsimple.root
$ EOS_MGM_URL=root://eoshome-c.cern.ch eos token --path /eos/user/c/cburr/hsimple.root --permission=rx --expires=$(date +%s -d "30 minutes") > token.txt
$ kdestroy
$ python3
Python 3.9.18 (main, Aug 23 2024, 00:00:00)
[GCC 11.4.1 20231218 (Red Hat 11.4.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ROOT
>>> url = 'root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root?xrd.wantprot=unix&authz=' + open("token.txt").read().strip()
>>> ROOT.TFile.Open(url).ls()
TNetXNGFile**       root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root   Demo ROOT file with histograms
 TNetXNGFile*       root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root   Demo ROOT file with histograms
  KEY: TH1F hpx;1   This is the px distribution
  KEY: TH2F hpxpy;1 py vs px
  KEY: TProfile hprof;1 Profile of pz versus px
  KEY: TNtuple  ntuple;1    Demo ntuple
>>> df = ROOT.RDataFrame("ntuple", url)
Error in <TNetXNGSystem::GetDirEntry>: Unable to give access - user access restricted - unauthorized identity used ; Permission denied
 *** Break *** segmentation violation

ROOT version

6.32.04

Installation method

sft.cern.ch

Operating system

Linux (lxplus)

Additional context

No response

vepadulano commented 1 month ago

Dear @chrisburr,

Thank you for reaching out and for the reproducer. I am on it. Meanwhile, I just wanted to point out that in the first case with 6.30, just calling ROOT.RDataFrame does not attempt to open the file, whereas 6.32 opens the file at construction time (to homogenise the way different data formats are processed). As a confirmation, could you try running any operation that needs to read data from the file in the first case with 6.30?
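For instance, a minimal sketch of such a check, assuming the same url and tree name as in the transcript above. The helper name count_entries is hypothetical; Count() books an action and GetValue() runs the event loop, which forces the file to actually be read:

```python
def count_entries(url, tree_name="ntuple"):
    """Build an RDataFrame and force it to read data from the file."""
    # Imported lazily so the helper can be defined without a ROOT installation.
    import ROOT

    df = ROOT.RDataFrame(tree_name, url)
    # Count() is lazy; GetValue() triggers the event loop and reads the file.
    return df.Count().GetValue()
```

If the token is honoured, calling count_entries(url) should return the number of entries in the ntuple instead of failing with a permission error.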

chrisburr commented 1 month ago

Thanks! This definitely used to be working (with 6.28 IIRC). If I find a minute I'll check with 6.30.

vepadulano commented 1 month ago

The problem is that RDF tries to open the file to check that it's valid. The logic for the file opening is at https://github.com/root-project/root/blob/962009b8c2057199c2229c3ef9938ac4d315d10a/tree/dataframe/src/RLoopManager.cxx#L1133 . In particular, because of the presence of the ? character, the string is parsed as a glob. In many cases that would be harmless, albeit with a tiny overhead (it would just return the same file name to open), but in this particular case it triggers faulty behaviour. The glob parsing attempts to traverse the remote xrootd directory (see here), but since the token grants permission only for the single file and not for the entire directory, it leads to the "user access restricted" error you posted above.

Now, I believe the most sane course of action would be to refine the logic that checks whether the input file name is a glob. I could simply add a check for the xrd.wantprot token, but probably we want to have a more authoritative list of all the tokens that should make the file name not be parsed as a glob. This probably includes not only xrootd tokens but also anything https-related. Or we could adopt a different strategy for glob detection altogether. Thoughts @dpiparo @pcanal ?
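The misdetection described above can be illustrated with a small sketch. It assumes that glob detection amounts to checking for wildcard characters anywhere in the string; looks_like_glob is a hypothetical stand-in, not the actual RLoopManager code, and TOKEN is a placeholder for a real EOS token:

```python
# Assumption: wildcard characters that mark a string as a glob.
GLOB_CHARS = "*?[]"


def looks_like_glob(name: str) -> bool:
    """Naive check: any wildcard character anywhere in the string."""
    return any(ch in name for ch in GLOB_CHARS)


plain = "root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root"
with_token = plain + "?xrd.wantprot=unix&authz=TOKEN"

print(looks_like_glob(plain))       # False
print(looks_like_glob(with_token))  # True: the '?' starting the query string
```

Under this assumption the URL with a query string is treated as a glob even though it names exactly one file.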

chrisburr commented 1 month ago

Ah, that makes sense. Extending the definition of strings to add metadata to paths (globbing, the # syntax in TFile::Open, ...) is always going to be error prone.

"but probably we want to have a more authoritative list of all the tokens that should make the file name not be parsed as a glob"

This feels like an impossible task to define.

Maybe a simpler solution would be to not support ? when globbing and only apply globbing to the text before the query string? Or maybe just have a dedicated method (or argument type) for creating an RDataFrame from a glob rather than relying on heuristics?
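A rough sketch of the first idea, in Python for brevity. All names here (split_query, is_glob, expand) are hypothetical, the list of candidate files stands in for a remote directory listing, and fnmatch is used only for illustration rather than being what ROOT does:

```python
from fnmatch import fnmatchcase


def split_query(name: str):
    """Split at the first '?': returns (path, query), query keeps its '?'."""
    path, sep, query = name.partition("?")
    return path, sep + query


def is_glob(name: str) -> bool:
    """Only '*', '[' and ']' in the path part count as wildcards."""
    path, _ = split_query(name)
    return any(ch in path for ch in "*[]")


def expand(name: str, candidates):
    """Expand the path part against candidates, re-attaching the query."""
    if not is_glob(name):
        return [name]
    path, query = split_query(name)
    return [c + query for c in candidates if fnmatchcase(c, path)]
```

With this approach a URL such as root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root?xrd.wantprot=unix&authz=TOKEN is passed through unchanged, while a pattern like *.root?authz=TOKEN is expanded and each match keeps the query string. The cost is that ? can no longer be used as a single-character wildcard.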