Closed matthewfeickert closed 4 years ago
Btw. a minimal working example is also just
>>> import uproot4
>>> uproot4.open("foo:bar.root")
The problem is this line https://github.com/scikit-hep/uproot4/blob/b8828069b9ae52c742fb704f07e1a4e179fe7b30/uproot4/_util.py#L157
This simply assumes that whenever there is a colon, the first part is the path and the second part is the object path, which of course fails when there is a colon in the filename.
EDIT:
I played around with the test suite and found for example the pattern file:/path/to/file : object path
as seen here https://github.com/scikit-hep/uproot4/blob/b8828069b9ae52c742fb704f07e1a4e179fe7b30/tests/test_0016-interpretations.py#L84
This would be another case to cover inside file_object_path_split()
. I am not sure if this pattern is used in the wild, but I think restricting it to file://
would probably be the better choice. But that's a minor issue.
Anyways, the file_object_path_split()
needs to be tweaked to cover all the cases when accepting :
in filenames and respect the optional object path.
If the object path is always specified with leading and trailing spaces (" : "
), then I think it's fairly easy.
Funnily this tiny change let's quite a few tests pass but it obviously fails when the object path is appended without leading and trailing spaces around the :
try:
index = path.rindex(" : ")
except ValueError:
return path, None
Unfortunately there is an ambiguity here, unless we assume that every root file ends with .root
.
I created a pull request (https://github.com/scikit-hep/uproot4/pull/80) which however comes with a grain of salt, so let's discuss further.
I was going to say that there's an option that simply turns off the colon-splitting, but I look now and I can't find it. I'm sure it was there. Anyway, that was the intention: that a boolean parameter, by default true, would specify whether the filename gets split into filesystem/URL path and in-ROOT party or not.
Also, the splitting character used to be a vertical bar |
because this is a very implausible character for a filename, given the amount of trouble it would cause in the shell. The reason I changed it was because it's not a natural separator—nothing about it suggests that this should be the dividing line between paths. By contrast, a colon is used this way (in shell variable lists, such as PATH
).
I switched to a colon because it's an illegal character for filenames in MacOS and (if I remember right) URLs. Unix filenames can have almost any character in them, but considering the use of colons in file lists like PATH
, that seemed implausible. It also seemed unlikely in Windows due to drive letter confusion.
So the colon had the right guessable meaning and didn't seem like it could be common in real file names. But ATLAS file names have colons in them? That's an important case. Maybe we should switch back to vertical bars?
At the very least, there should be a boolean parameter that just turns off the splitting. The uproot4.open
function is already documented, so in principle it's discoverable, and anyone who doesn't discover it can be told pretty quickly, "set split_path_with_bar=False
." I'm surprised it's not there because I coulda sworn I added that. It's a two line change (adding the parameter default and the if
statement).
On assuming ROOT files always end with ".root", that's a stricter retirement than not including colons. I'd rather not make that assumption.
Ah OK, with that additional option it's a bit better but still may cause confusion. I think switching to |
is better since sticking to :
and the current implementation means that at least in case of ATLAS files one could not load an object path via open()
. Or do you consider adding an additional object_path=
parameter?
Btw. :
in filenames work on macOS (it was an illegal character on class Mac OS and on the old carbon layer of Mac OS).
Another possibility is to require paths-in-file to start with a slash (because they are absolute, after all), and then the real separator would be ":/
", which works for ATLAS.
Or to only split with a colon (a convenience) if the filename ends with ".root
", while not requiring filenames to always end with ".root
". Sorry if I misunderstood that that was your actual suggestion.
It drop it entirely. This is an elaboration on the idea of deep getitem (in which you can write a single path that selects through the TDirectory and TBranch hierarchy), but maybe it's an overextension? Maybe it's not bad to require those extra quotes and braces. They were intended for quick navigation.
I've been thinking about it all morning. An object_path
parameter would defeat the purpose: it's easier to type
uproot4.open("filename.root")["pathname"]
than it is to type
uproot4.open("filename.root", object_path="pathname")
While the file_object["dir1/dir2/branch1/branch2"]
feature was requested, the uproot4.open("filename.root:dir1/dir2/branch1/branch2")
feature was not requested—it was an extrapolation of that request.
Since there isn't a compelling reason to do otherwise, we should do the simplest thing of not having any colon/vertical bar syntax at all, since that would be the most guessable and wouldn't require reference to the documentation before getting started. Uproot 4 is still in a fluid enough state (no documentation yet) that I can remove something like that.
I fully agree ;) I first thought that the column-syntax has also some special shortcuts to get to the data source even more efficiently but looking at the code it seems to me that it's just an alternative to open()[]
... but correct me if I am wrong!
That's right: it's only syntactic sugar. The same is true of file_object["dir1/dir2/branch1/branch2"]
, but that syntactic sugar was helpful because people were stumbling over file_object["dir1"]["dir2"]["branch1"]["branch2"]
. I guess it's okay if the number of square brackets is reduce to one, rather than being reduced to zero.
Now that I'm going in and deleting it, I find that it really is important: interfaces that take TTrees from multiple files need a way to express both succinctly.
lazyarray = uproot4.lazy(["batch1/*.root : treename1", "batch2/*.root : treename2"])
for arrays in uproot4.iterate(["batch1/*.root : treename1", "batch2/*.root : treename2"]):
...
Without this, the object path could be a separate parameter (as it was in Uproot3, which was a stumbling block because uproot.iterate("batch1/*.root", "treename1")
led people to forget the tree name) but then it can't be different for different files, as above. Allowing it to be different for different files would require 2-tuples or something, if there isn't a syntactic delimiter.
So it is more important than quick and easy typing in the uproot4.open
case... I have to think more about it... I'm open to suggestions.
Maybe it's staring me in the face: the colon could be outside of quotation marks if the file name specifier is allowed to be a dict:
lazyarray = uproot4.lazy([{"batch1/*.root": "treename1"}, {"batch2/*.root": "treename2"])
but for the common case of all files having the same name, requiring people to know this weird syntax should be unnecessary:
lazyarray = uproot4.lazy(["batch1/*.root", "batch2/*.root"], "treename")
or maybe pick the only TTree if the file has only one TTree (not counting cycle number).
Also allowing
lazyarray = uproot4.lazy([{"batch1/*.root": "treename1", "batch2/*.root": "treename2"])
The fact that uproot4.open
would accept dicts only if they have length 1 would be because uproot4.open
is a special case that only opens one file. I'm trying it out and will post a few unit tests here as a suggestion.
I really like the dict approach, I think that's quite transparent.
However, providing more logic is probably too much (like picking the only TTree if there is only one), I think that Python provides enough to make such things easily expressible...?
With a dict
approach one could also write a generator style
lazyarray = uproot4.lazy({fname: "treename"} for fname in ["batch1/*.root", "batch2/*.root])
or even simplifying it to a single dict
:
lazyarray = uproot4.lazy({fname: "treename" for fname in ["batch1/*.root", "batch2/*.root]})
but notice that this is just one dict
in contrast to your list
of dict
s example, so ordering is obviously not guaranteed for Python 3.5 and less, which might confuse one or the other. Anyways, that's just to throw in another point, the general gain is very low and the number of symbols to type is almost the same as above.
I don't think using the filename as a key in a dict is so great. What about instead a list of dicts with two specific keys: [{"file_path": fname, "object_path": "tree;1"}, ...]
I don't think using the filename as a key in a dict is so great. What about instead a list of dicts with two specific keys:
[{"file_path": fname, "object_path": "tree;1"}, ...]
But then you'd have to remember the spellings of those keys: "file_path"
and "object_path"
. They become part of the API that has to be remembered or looked up. If I'm in a LISPy mood, I'd be inclined to expect hyphens, rather than underscores, and maybe the whole second word is unnecessary. What you suggest would be great for a REST API, but it's pretty verbose for Python interaction.
Also, it prevents multiple files in one dict, since there can only be one "file_path"
key.
These are tests of the dict method: https://github.com/scikit-hep/uproot4/blob/96e99bc09ac45b1c306b54855e1b36060f23e096/tests/test_0081-dont-parse-colons.py
Examples look like this:
one_file = uproot4.open("filename.root")
one_tree = uproot4.open("filename.root")["treename"]
one_tree = uproot4.open({"filename.root": "treename"})
many_trees = uproot4.lazy("filenames*.root") # if all files have exactly one TTree
many_trees = uproot4.lazy({"filenames*.root": "treename"}) # if they don't
many_trees = uproot4.lazy(["dir1/*.root", "dir2/*.root"]) # if all files have exactly one TTree
many_trees = uproot4.lazy({"dir1/*.root: "treename", "dir2/*.root": treename}) # if they don't
many_trees = uproot4.lazy([{"dir1/*.root: "treename"}, {"dir2/*.root": treename}]) # careful about order
many_trees = uproot4.lazy([already_open_tree1, already_open_tree2, ...]) # objects like one_tree above
I don't like magical things like guessing which TTree you mean, but it has been requested a user. (Don't ask me to find the reference: it's hard to search for these things.) However, it only guesses which TTree when there is exactly one in the file (in all nested TDirectories), and if two kinds of files are accidentally mixed, it would only be successful if the requested TBranch names also overlap, which would be an even bigger coincidence than having different TTrees with the same name. ("Events
" is a very popular name.)
This reminds me that I was also asked (in Uproot 3, in the mists of time) to make iteration skip over any files that are missing the requested TTree. That's a bit more dangerous, but the motivation is to not have to weed out badly written files. (Not what I'd call a careful analysis, but maybe a first look at the data.) Rather than porting that feature into Uproot 4, I've made it optional: allow_missing=False
is the default, but if anyone has that problem again, I can point them to this option so they can opt into a more relaxed view.
Since there have been a few differing suggestions about what to do here, I'm not going to merge PR #81 immediately. I'll wait a day or two to let others voice their opinions, to see if there is a majority view. (@henryiii, do you have an idea?)
We don't have a formal voting system, but I'll take your suggestions into account (as I did with a lot of the names in Awkward1, where I personally preferred fewer underscores).
On a quick overview, I like the dict method that's just been merged. If one wanted to provide a colon based CLI interface, it would be easy to write:
# arg might be filename or filename:treename or filename:treename:
if len(x := arg.split(":",1)) > 1:
arg = {x[0]: x[1]}
(which obviously wouldn't work on filenames with colons, but that would not be inside uproot, either)
But ATLAS file names have colons in them? That's an important case.
Well, I guess I should be more careful with my words so that I don't get anyone blamed for anything. It isn't that ATLAS software explicitly has colons in filepaths. However, one of the most widely used analysis frameworks that people in ATLAS have developed on their own is xAODAnaHelpers
, and the following xAH_run.py
example command
xAH_run.py \
--config config/config.py \
--files data16_13TeV:data16_13TeV.periodA.physics_Main.PhysCont.DAOD_JETM1.grp16_v01_p4061 \
--inputRucio \
--nevents 10000 \
--force \
direct
results in an output file directory structure that includes (given the options in the config/config.py
) a file at the path of
sumbitDir/data-tree/data16_13TeV:data16_13TeV.periodA.physics_Main.PhysCont.DAOD_JETM1.grp16_v01_p4061.root
So it can/does happen (@kratsg please jump in and correct anything I've misrepresented).
I'll wait a day or two to let others voice their opinions, to see if there is a majority view.
Thanks. I don't have very strong API views at the moment (but I'll try to actually think on it now). cc the rest of pyhf
for this (@lukasheinrich, @kratsg) and also @alexander-held and @masonproffitt.
fwiw, ROOT
s TFile::Open
seems to be able to handle colon in the filename and in the paths after the fact pretty well. You could split this up into trying to be smarter about things automatically:
then loudly complain you can't find the file. At least you handle both cases this way... but I'm still not really sure how many people use the :/path/to/obj
functionality (it does exist in ROOT, but it doesn't seem like a common feature...)
I'm still not really sure how many people use the
:/path/to/obj
functionality (it does exist in ROOT, but it doesn't seem like a common feature...)
Really? I had no idea! I guess that strengthens the case that colon is a natural character to use—more easily guessable—if it was independently chosen. (Do they require the leading slash? That effectively makes the delimiter ":/
", which is less likely to conflict in real paths—you'd need a directory to end with a colon.)
About checking both with and without object path to see if the file exists, the remote protocols (HTTP and XRootD) would require an extra round trip for that. Also, what if (rare case, admittedly) both exist?
Earlier today, I wanted to remove the colon thing altogether because simple is better than clever. However, there are those functions that take many files, or more specifically, many file/tree combinations. The driving consideration is finding a syntax that simplifies the specification of many files/trees, and the single file/tree should be a special case of that. The dict thing works, but maybe it's strange looking or not guessable? If there are better ways to express many files/trees, let me know.
I find the existing colon functionality very convenient. One use case I have in mind are frameworks for histogram production. Users need to specify where all the inputs to their histograms are found. The existing syntax allows to do that in just one string, users could e.g. write
mc_samples/{SAMPLE_TYPE}.root:{SYSTEMATIC_VARIATION}
and specify a list of sample types and systematic variations to fill in the placeholders. If the paths inside and outside the ROOT file need to be specified separately, two such strings would be needed. Having it all in one string also reinforces the idea that for the structure of the data it does not matter much whether something is in a directory or a TDirectory
.
If one takes the above example but puts everything into a big root file, the string would look like
mc_samples.root:{SAMPLE_TYPE}/{SYSTEMATIC_VARIATION}
and it is quickly evident that the structure of content is the same. This is harder to parse when split up.
Since it came up above: I've had to deal with file structures before where folder names ended with .root
(and those folders contained ROOT files), but due to the name the folders mistakenly were interpreted as a ROOT file (this was with ROOT, not uproot
). Relying on this to parse might not always work.
Ok, it seems I've mispoken and what I'm remembering is a feature that hadd
supports but that TFile
and TFile::Open
does not:
hadd target_path.root:/pathtomerge in1.root in2.root
will only hadd-merge everything under that path. I tried googling to find an actual example, but alas, cannot... In any case, the TFile
documentation indicates that they still strongly recommend suffixing with .root
at the end of the file path... not sure why, but they rely on TUrl
to parse all file paths given.
Anyway, if it were me, I would want pathlib.Path(...)
to be treated as a file path without parsing for the colon to get around this, rather than treating it as a string (which the current behavior seems to be).
There have been quite a few comments about the colon syntax being useful; one on Slack. Maybe it needs to be reinstated. (The syntax issues with URLs and Windows drive letters have been resolved; what remains is @matthewfeickert's issue about it being an unpleasant surprise when you don't know about it.)
/
, and hence the delimiter is effectively :/
, rather than just :
?parse_colons=False
parameter.I can also add a detailed explanation of the syntax in all "file not found" error messages (local files, HTTP, etc.). If the colon parsing fails, the very likely error will be file not found, so that's when and how users can be informed. (Again, we don't assume everybody's read the documentation!)
By the way, unrelated, but I finally understand why vi my_broken_file.py:123
doesn't automatically try to open the file and go to line 123 (+123
does, FYI); it's a valid filename on some systems.
I like option 2, I think.
I like option 2, I think.
Which one is option 2?
Colon parsing by default, but not in the dict syntax. Assumes that colons in filenames are somewhat rare, which I expect they are?
Colon parsing by default, but not in the dict syntax. Assumes that colons in filenames are somewhat rare, which I expect they are?
rucio relies on scopes for filenames, and scopes are separated with colons: mc15_13TeV:{filename}
and when downloading, will include the scope as well. Dataset discovery relies on these scopes.
Repeating comment from Slack: I do use :/path/to/tree
fairly often (when the CMSSW TFileService is used to create a TTree, that TTree resides in a TDirectory).
Assumes that colons in filenames are somewhat rare, which I expect they are?
I made the same assumption, based on some web-research. I saw somewhere that colons aren't allowed at all in MacOS filenames, but @kratsg said that this restriction only applied to MacOS 9 and below, and possibly OSX in a compatibility mode.
(I remember MacOS 2‒9 used :
instead of /
as a delimiter between directories, so the character was not allowed for the same reason that /
is not allowed in filenames of modern filesystems. MacOS 1 didn't have directories.)
Well, I guess I should be more careful with my words so that I don't get anyone blamed for anything. It isn't that ATLAS software explicitly has colons in filepaths. However, one of the most widely used analysis frameworks that people in ATLAS have developed on their own is
xAODAnaHelpers
...
...which leads naturally to colons in filenames, and that's how this all came up. @kratsg just mentioned above that Rucio can put colons in filenames, so we'll probably start seeing it in CMS, too.
I'm leaning toward this "option 2" as well. I think the main open question for me is whether to require a /
after the :
, such that :/
is the true delimiter.
Probably the most important thing is that the precise rules are printed to the screen in every "file not found" error message, so that people can adjust after a first attempt.
Colon parsing by default, but not in the dict syntax. Assumes that colons in filenames are somewhat rare, which I expect they are?
rucio relies on scopes for filenames, and scopes are separated with colons:
mc15_13TeV:{filename}
and when downloading, will include the scope as well. Dataset discovery relies on these scopes.
Right, this is exactly what I'm familiar with in dealing with my own analysis files. I think it's important for a filename like foo:bar.root
to work in the standard string syntax. This is a very common case (for ATLAS at least). I like dropping the colon parsing entirely myself, but I think requiring :/
is probably still good enough.
Since the colon parsing is a convenience and gaps in many-file functions can be filled in with the dict syntax, how about if it only works in limited circumstances, such as after .root
? (@tamasgal originally made this suggestion.) So,
/path/to/filename.root:path/to/ttree
picks out an object at path/to/ttree
(requiring .root
makes requiring a leading /
in the object path superfluous, since it's already quite unambiguous, and it's a potential "gotcha")/path/to/file:name.root
would be interpreted as a file, not a "name.root
" object path within a file, since "file
" doesn't end in ".root
"/path/to/file:name.root:path/to/ttree
also picks out an object at path/to/ttree
from a file named file:name.root
; earlier colons are part of the filename{"/path/to/file.root:haha-this-is-the-filename.root": None}
won't parse the colon and won't pick out an object within the file because of the None
value.root
, or you're using wildcards that don't end in .root
(a likely example: /path/to/directory/*
). This is for functions that need to find many files, such as uproot4.iterate
, uproot4.concatenate
, and uproot4.lazy
.The colon parsing will be much simpler, based on the regex \.root\s*:\s*
, because I'll never need to worry about Windows drive letters or URL schemas with the \.root
disambiguator.
Thumbs-up this comment if you're in favor, write another comment below if you're not; I'm going to start coding up this solution and if we're in general agreement, I'll merge it.
There's still a problem with requiring .root:
. Another thing that I often see (again from rucio) is filenames ending with a number like .root.1
. For a random public example, see https://twiki.cern.ch/twiki/bin/view/Sandbox/GridNotes.
@jpivarski , I think you shouldn't change the parsing too much to try and support so much. What I think should happen is the following:
str
or bytes
passed in: be "smart" and parse out colons and figure out what the user wantspathlib.Path
-- assume this is openable via path.open()
instead of trying to convert to string and parsingfile pointer
(an already opened file) -- just load that up, no need to call open
etc...Is this not at all possible to do? Then if users really start having more complicated files that, for whatever reason, is borking the magic of uproot
, then can be more specific step by step, going from str
to Path
to a filepointer
object. The nice thing with that last option, I hope, is that one can use BytesIO
as input as well...
For files ending in .root.1
, etc., I would have thought you're out of the easy case and have to start using dicts. The error message would say that. Would that be a big imposition?
I hadn't considered having different behavior for str
/bytes
than for pathlib.Path
. Would that be more expected?
Also, on accepting an already open Python file-handle (loosely interpreted as anything with read
and seek
, like io.BytesIO
), Uproot has never had this capability, but it would be possible. It would exclude parallel reading of baskets, but you get what you pay for. That could become a new Source subclass, but it's beyond the scope of this issue.
Since there are no thumbs up on the previous proposal and two objections thereafter, how about the following:
.root
substring is found.str
or bytes
is passed, do the colon parsing as before (many people have said that they want it, though it can be a stumbling block, as in @matthewfeickert's case that started this thread).pathlib.Path
is passed, do not do colon parsing, which is a good way of enforcing what you want in uproot4.open
, but would be limiting in functions that need to open many file-tree combinations, such as uproot4.iterate
, uproot4.concatenate
, and uproot4.lazy
.uproot4.open
is a length-1 dict.What does everybody think of this? Thumbs up or comment: thanks!
Is there something wrong with using :/
as the delimiter? That seems like the easiest way to (I think) satisfy everyone.
In any case, I'm confused as to what the dict
syntax is when you just want to open a file with a colon in the filename. Is it:
uproot.open({'foo:bar.root': None})
?uproot.open({'foo:bar.root': ''})
?uproot.open({'foo:bar.root': '/'})
?or some/all of the above? Yikes. Actually, I guess it can't even be ''
, since that's a valid key. My problem here is that this suddenly makes something as simple as opening a file more confusing and results in inconsistent behavior between different filenames. It's very not beginner-friendly, as you now need to know about dictionaries and their syntax and a bit about TDirectory
before you can even look at anything in the ROOT file.
(I've actually been distracted by other work, so just getting back to this now.)
One difficulty with optimizing usability is that "what is natural" or "what a novice might try" is rather speculative. I would expect novices (including me, approaching a system I don't know) to get tripped up by the requirement that the object path has to be expressed as absolute, i.e. that it must start with :/blah
and not just :blah
, even though it is always going to be evaluated relative to /
. (This was confusing in the file URI scheme, for example.) I'm not of a strong opinion about this point, though, because I think the error message is where the wide array of users' guesses as to how it works will get narrowed down to how it actually does work—assuming users read error messages. (Sometimes a problem, but not a deep one if they copy-paste the error message into the bug report for me to read.)
The other difficulty with optimizing usability is that expectations among advanced users become necessities. I hadn't known (I guessed right) that the colon syntax was an expected feature, and so losing it would be a stumbling block for users who expect it from ROOT.
But the thing that's driving the dict syntax is neither of the above. It's driven by the functions that access many file-object (actually, file-TTree) pairs. They need a way—not a convenient way but any way—to express combined filesystem + ROOT object paths for hundreds or thousands of files. If we could have used the colon syntax for that, then that would have been great. But some filesystem paths have colons in them. In principle, a directory name could end with a colon, which would get incorrectly identified as :/
. So the many file-object functions need to have a backup syntax for when the string syntax isn't expressive enough, and that's what the dicts are. Only one other suggestion has been made that would be as unambiguous, but it's pretty verbose and introduces more things to remember.
So novice users do not need to know about the dict syntax; it only comes up if they can't express a group of filenames + object paths because the filenames have colons in them, and then it comes up as an error message because when they tried to do the naive thing, it couldn't find the file.
I'm going to implement the last proposal I made. I considered not porting the dict syntax from the functions that need it (uproot4.iterate
, uproot4.concatenate
, uproot4.lazy
) to the function that doesn't really need it (uproot4.open
, for which only a length-1 dict would be allowed) because the "file not found" error messages don't know whether they're from an attempt to open a single file or many files, and tracking that provenance through everything so that they can give context-dependent error messages doesn't seem worthwhile. Making the same syntax apply to all functions equally, even if it's not really needed for one of them, is simpler.
In any case, I'm confused as to what the
dict
syntax is when you just want to open a file with a colon in the filename. Is it:
uproot4.open({'foo:bar.root': None})
?uproot4.open({'foo:bar.root': ''})
?uproot4.open({'foo:bar.root': '/'})
?or some/all of the above?
I think the first and third would work; the second wouldn't for the reason you mentioned. But the preferred way to do it, following @kratsg's advice about pathlib.Path
, is
uproot4.open(pathlib.Path("foo:bar.root"))
No colon parsing would be performed on a pathlib.Path
, and since this is uproot4.open
, there's no reason to bring in any fancy dicts because you don't have to specify a TTree path. For uproot4.iterate
, uproot4.concatenate
, uproot4.lazy
, you do have to specify TTrees because you're iterating over, concatenating, or building lazy arrays from TTrees, not files. There wouldn't be a good reason to make the value in a dict None
, though it would work, for consistency's sake.
The error message should not only say what's possible, but should say what's preferred. Now I'm finally going to go and implement that.
In the end, the error messages look like this:
FileNotFoundError: file not found
'/really/long/path/that:in:the:end/is/not-a-file.root'
Files may be specified as:
* str/bytes: relative or absolute filesystem path or URL, without any colons
other than Windows drive letter or URL schema.
Examples: "rel/file.root", "C:\abs\file.root", "http://where/what.root"
* str/bytes: same with an object-within-ROOT path, separated by a colon.
Example: "rel/file.root:tdirectory/ttree"
* pathlib.Path: always interpreted as a filesystem path or URL only (no
object-within-ROOT path), regardless of whether there are any colons.
Examples: Path("rel:/file.root"), Path("/abs/path:stuff.root")
Functions that accept many files (uproot4.iterate, etc.) also allow:
* glob syntax in str/bytes and pathlib.Path.
Examples: Path("rel/*.root"), "/abs/*.root:tdirectory/ttree"
* dict: keys are filesystem paths, values are objects-within-ROOT paths.
Example: {"/data_v1/*.root": "ttree_v1", "/data_v2/*.root": "ttree_v2"}
* already-open TTree objects.
* iterables of the above.
Hi. I've come across an issue where if a ROOT file has a colon in the name of it
uproot3
can open the file butuproot4
fails. I can't give you the file, but I can show you a minimal failing example and then a reproducible example with public files.Minimal Failing Example
Failing Reproducible Example
Comments
I realize that this is probably because
uproot4
'sopen
'spath
ishttps://github.com/scikit-hep/uproot4/blob/b8828069b9ae52c742fb704f07e1a4e179fe7b30/uproot4/reading.py#L43-L46
and it isn't a good idea to have a file with a colon in it in general. However, there are ATLAS files that do, and it would be nice if
uproot3
behavior toward filenames could still be supported. If support for this is firmly out of scope it would be great if there could be some huge warning about this in the docs.