Open chrisburr opened 1 month ago
Dear @chrisburr,
Thank you for reaching out and for the reproducer. I am on it. Meanwhile, I just wanted to point out that for the first case in 6.30, just calling `ROOT.RDataFrame` will not attempt to open the file, whereas 6.32 opens the file at construction time (to homogenise the way different data formats are processed). Just as a confirmation, could you try running any operation that would need to read data from the file in the first case with 6.30?
Thanks! This definitely used to be working (with 6.28 IIRC). If I find a minute I'll check with 6.30.
The problem is that RDF tries to open the file to check that it's valid. The logic for the file opening is at https://github.com/root-project/root/blob/962009b8c2057199c2229c3ef9938ac4d315d10a/tree/dataframe/src/RLoopManager.cxx#L1133 . In particular, because of the presence of the `?` token, the string is parsed as a glob. In many cases that would be harmless apart from a tiny overhead (it would just return the same file name to open), but in this particular case it triggers faulty behaviour. The glob parsing attempts to traverse the remote xrootd directory (see here), but since the permission covers only the single file with the token and not the entire directory, it leads to the `user access restricted` error you posted above.
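To make the failure mode concrete, here is a minimal sketch of a wildcard-based glob check. It is an illustration only and not the code in `RLoopManager.cxx`: the function name `looks_like_glob`, the exact wildcard set, and the example URLs (host, path and token value) are all assumptions.

```python
import re

# Hypothetical stand-in for a naive glob check; this is NOT ROOT's actual code.
# Assumption: any of the usual wildcard characters marks the name as a glob.
WILDCARDS = re.compile(r"[*?\[\]]")


def looks_like_glob(name: str) -> bool:
    """Return True if the name contains any wildcard character."""
    return WILDCARDS.search(name) is not None


# Hypothetical URLs: host, path and token value are placeholders.
token_url = "root://eos.example.org//eos/user/x/file.root?authz=TOKEN&xrd.wantprot=unix"
plain_url = "root://eos.example.org//eos/user/x/file.root"

# The '?' that starts the query string is indistinguishable from a
# single-character wildcard, so the tokenised URL is classified as a glob.
print(looks_like_glob(token_url))
print(looks_like_glob(plain_url))
```

With a check of this shape, the tokenised URL goes down the glob-expansion path even though it names exactly one file.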
Now, I believe the most sane course of action would be to refine the logic that checks whether the input file name is a glob. I could simply add a check for the `xrd.wantprot` token, but probably we want to have a more authoritative list of all the tokens that should make the file name not be parsed as a glob. This probably includes not only xrootd tokens but also anything https-related. Or we could adopt a different strategy for glob detection altogether. Thoughts @dpiparo @pcanal?
Ah that makes sense. Extending the definition of strings to add metadata to paths (globbing, the `#` syntax in `TFile::Open`, ...) is always going to be error prone.
> but probably we want to have a more authoritative list of all the tokens that should make the file name not be parsed as a glob
This feels like an impossible task to define.
Maybe a simpler solution would be to not support `?` when globbing and only apply globbing to the text before the query string? Or maybe just have a dedicated method (or argument type) for creating an RDataFrame from a glob rather than relying on heuristics?
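A rough sketch of the first suggestion, assuming the query string is everything from the first `?` onwards and that only `*`, `[` and `]` remain as wildcards. The helper names `split_query` and `is_glob_ignoring_query` and the example URLs are hypothetical and not part of ROOT.

```python
def split_query(name: str) -> tuple[str, str]:
    """Split a file name into (path, query); the query keeps its leading '?'."""
    path, sep, query = name.partition("?")
    return path, sep + query


def is_glob_ignoring_query(name: str) -> bool:
    """Look for wildcards only in the part before the query string.

    Assumption: '?' is no longer supported as a wildcard, so only
    '*', '[' and ']' are considered.
    """
    path, _ = split_query(name)
    return any(char in path for char in "*[]")


# Hypothetical URLs: host, path and token value are placeholders.
token_url = "root://eos.example.org//eos/user/x/file.root?authz=TOKEN&xrd.wantprot=unix"
glob_url = "root://eos.example.org//eos/user/x/data_*.root?authz=TOKEN"

print(split_query(token_url))
print(is_glob_ignoring_query(token_url))
print(is_glob_ignoring_query(glob_url))
```

The trade-off is that single-character `?` matching is lost. In exchange, the query string passes through untouched and can be re-attached to each expanded path.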
Description
EOS tokens no longer work with RDataFrame in 6.32.04. In 6.30.08 everything is fine:
Reproducer
On lxplus:
ROOT version
6.32.04
Installation method
sft.cern.ch
Operating system
Linux (lxplus)
Additional context
No response