root-project / root

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
https://root.cern

Improve documentation/user experience when using globbing with remote paths #13258

Open vepadulano opened 1 year ago

vepadulano commented 1 year ago

Explain what you would like to see improved and how.

I found myself trying to use a glob in TChain::Add with a remote path. I discovered that in XRootD, this is supported:

TChain c{"Events"};
c.Add("root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run*");
c.GetListOfFiles()->GetEntries()
(int) 4

Whereas with https (through davix) this is not supported:

TChain c{"Events"};
c.Add("https://root.cern/files/HiggsTauTauReduced/*.root");
Error in <TDavixSystem::DavixOpendir>: failed to opendir the directory: HTTP 405 : Method Not Allowed, Permission refused  (17)

First off, we should clarify with the developers of the different libraries whether this use case is (1) knowingly supported (xrootd) or knowingly unsupported (davix), and (2) a valid use case or a corner case.

Based on that, we should decide whether to actually support it in ROOT, which would mean support it with all the different remote protocols, or remove support with xrootd and decide to always throw an exception in case remote globbing is attempted.
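
The "always throw" option could look roughly like the sketch below. It is plain Python, not ROOT code: `is_remote` and `reject_remote_glob` are hypothetical helpers. Two things are assumptions made for illustration: any path of the form `scheme://host/...` counts as remote, and only `*` and `[` count as glob characters (`?` is left out because in a URL it starts the query string).

```python
from urllib.parse import urlsplit

# Assumption: '?' is excluded because in a URL it introduces the query string.
GLOB_CHARS = ("*", "[")


def is_remote(path: str) -> bool:
    """Assumption: a path is remote if it has both a URL scheme and a host."""
    parts = urlsplit(path)
    return bool(parts.scheme) and bool(parts.netloc)


def reject_remote_glob(path: str) -> str:
    """Return the path unchanged, or raise if a remote path contains a glob."""
    if is_remote(path) and any(ch in urlsplit(path).path for ch in GLOB_CHARS):
        raise ValueError(f"globbing is not supported for remote paths: {path!r}")
    return path
```

With this policy, both the `root://` and the `https://` examples above would be rejected, while a local glob would pass through untouched.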

ROOT version

Any

Installation method

Any

Operating system

Any

Additional context

No response

eguiraud commented 1 year ago

we should decide whether to actually support it in ROOT, which would mean support it with all the different remote protocols, or remove support with xrootd and decide to always throw an exception in case remote globbing is attempted.

keeping the status quo is also an option

vepadulano commented 1 year ago

keeping the status quo is also an option

I was initially under the impression that this would lead to crashes when using RDF + remote globbing with HTTPS, but it seems like there is no crash.

>>> import ROOT
>>> df = ROOT.RDataFrame("Events", "https://root.cern/files/HiggsTauTauReduced/*.root")
Error in <TDavixSystem::DavixOpendir>: failed to opendir the directory: HTTP 405 : Method Not Allowed, Permission refused  (17)
>>> df.Sum("run").GetValue()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
cppyy.gbl.std.runtime_error: Template method resolution failed:
  ROOT::RDF::RResultPtr<double> ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Sum(basic_string_view<char,char_traits<char> > columnName = "", double& initValue = RDFDetail::SumReturnType_t<RInferredType>{}) =>
    runtime_error: GetBranchNames: error in opening the tree Events
  ROOT::RDF::RResultPtr<double> ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Sum(basic_string_view<char,char_traits<char> > columnName = "", double& initValue = RDFDetail::SumReturnType_t<RInferredType>{}) =>
    runtime_error: GetBranchNames: error in opening the tree Events

So yes, we could just leave things as they are. I think it's still worth asking the developers of the respective projects for their opinion about this kind of feature. If it turns out that this is not wanted for xrootd, we can easily disable it there too.

eguiraud commented 1 year ago

this would lead to crashes when using RDF + remote globbing with HTTPS

with "status quo" I meant just leaving support for xrootd in (and not adding HTTPS+globbing support if that's not possible). if using HTTPS+globbing does not spit out an intelligible error, that should be fixed.
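
A minimal sketch of that "status quo plus an intelligible error" behaviour, in plain Python rather than ROOT code. `GLOB_CAPABLE_SCHEMES` and `check_glob_support` are hypothetical names. The only facts taken from this thread are that globbing works over `root://` and fails over `https://`. Treating every other scheme as unable to glob is an assumption.

```python
from urllib.parse import urlsplit

# From this thread: globbing works with root:// (xrootd), not with https:// (davix).
# Assumption: every scheme not listed here cannot expand globs.
GLOB_CAPABLE_SCHEMES = {"root"}


def check_glob_support(path: str) -> None:
    """Raise a readable error if a glob is used with a protocol that cannot expand it."""
    parts = urlsplit(path)
    if not parts.netloc:
        return  # no host part: treated as a local path, nothing to check
    has_glob = "*" in parts.path or "[" in parts.path
    if has_glob and parts.scheme not in GLOB_CAPABLE_SCHEMES:
        raise ValueError(
            f"cannot expand glob in {path!r}: globbing is not supported over "
            f"'{parts.scheme}://', please list the files explicitly"
        )
```

Called before the path is handed to the I/O layer, a check like this would replace the `HTTP 405` message with one that names the actual problem.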

If it turns out that this is not wanted for xrootd, we can easily disable it there too

that's a breaking change though

eguiraud commented 1 year ago

I think it's still worth asking the developers of the respective projects for their opinion about this kind of feature

I totally agree with this