Open JohanWulff opened 2 years ago
This is either a further elaboration of the original problem or a different but similar one. It follows up on a Gitter conversation (@johanwulff is the same author).
The Uproot version is 4.3.7, so it should include the latest bugfix, which also explains why the error message is different now.
Since this issue is about something wrong in the way Uproot writes files, we'd need the file-writing step to be part of the reproducer. I just tried it in the latest Uproot (5.3.2 == main) and there's no error:
>>> import uproot
>>> import numpy as np
>>> f = uproot.recreate("one.root")
>>> f["tree"] = {"branch": np.array([], dtype=np.float64)}
>>> g = uproot.recreate("two.root")
>>> g["tree"] = {"branch": np.array([32.5, 37.4, 34.1], dtype=np.float64)}
% hadd test.root one.root two.root
hadd Target file: test.root
hadd compression setting for all output: 1
hadd Source file 1: one.root
hadd Source file 2: two.root
hadd Target path: test.root:/
>>> import uproot
>>> uproot.open("test.root")["tree"].arrays().show(type=True)
type: 3 * {
branch: float64
}
[{branch: 32.5},
{branch: 37.4},
{branch: 34.1}]
Or... not? The file looks odd in ROOT 6.30/04:
>>> import ROOT
>>> f = ROOT.TFile("test.root")
>>> t = f.Get("tree")
>>> t.Scan()
************************
* Row * branch.br *
************************
* 0 * 0 *
* 1 * 37.4 *
* 2 * 34.1 *
************************
3
whereas the two files, individually, look okay:
>>> import ROOT
>>> f = ROOT.TFile("one.root")
>>> t = f.Get("tree")
>>> t.Scan()
************************
* Row * branch.br *
************************
************************
0
>>> import ROOT
>>> f = ROOT.TFile("two.root")
>>> t = f.Get("tree")
>>> t.Scan()
************************
* Row * branch.br *
************************
* 0 * 32.5 *
* 1 * 37.4 *
* 2 * 34.1 *
************************
3
This is not the original issue (a lot has changed since then; maybe something came along that fixed it), but it's a new one or a related one.
Digging a little deeper, we see the expected low-level data:
>>> import uproot
>>> branch = uproot.open("test.root")["tree"]["branch"]
>>> branch.num_baskets
2
>>> branch.basket(0).data
array([], dtype=uint8)
>>> branch.basket(1).data
array([ 64, 64, 64, 0, 0, 0, 0, 0, 64, 66, 179, 51, 51,
51, 51, 51, 64, 65, 12, 204, 204, 204, 204, 205], dtype=uint8)
>>> branch.basket(1).data.view(">f8")
array([32.5, 37.4, 34.1], dtype='>f8')
For this simple data type (double per entry), the TBasket consists entirely of numeric data (after the TKey and basket header), with no offsets or anything like that, so it should be a raw dump of the numbers, as we have here.
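As a cross-check that needs neither ROOT nor Uproot, the 24 bytes printed above can be decoded with Python's standard struct module. This is only a sketch: it assumes, as the ">f8" view does, that the payload is three consecutive big-endian 8-byte floats, and the names raw, count, and values are made up for illustration.

```python
import struct

# Bytes copied from branch.basket(1).data in the session above.
raw = bytes([
    64, 64, 64, 0, 0, 0, 0, 0,
    64, 66, 179, 51, 51, 51, 51, 51,
    64, 65, 12, 204, 204, 204, 204, 205,
])

# ">" selects big-endian byte order; "d" is an 8-byte IEEE 754 double.
count = len(raw) // 8
values = struct.unpack(f">{count}d", raw)
print(values)
```

If the payload contained offsets or any other header bytes, the length would not be a multiple of 8 or the decoded numbers would not match the ones written to two.root.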
It seems to be something about empty TBaskets, since a similar example in which the first file is non-empty merges correctly:
Python 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 15:36:39) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import uproot
>>> import numpy as np
>>> f = uproot.recreate("uno.root")
>>> f["tree"] = {"branch": np.array([3.14], dtype=np.float64)}
>>> g = uproot.recreate("dos.root")
>>> g["tree"] = {"branch": np.array([32.5, 37.4, 34.1], dtype=np.float64)}
>>>
% hadd test2.root uno.root dos.root
hadd Target file: test2.root
hadd compression setting for all output: 1
hadd Source file 1: uno.root
hadd Source file 2: dos.root
hadd Target path: test2.root:/
% python
Python 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 15:36:39) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ROOT
>>> f = ROOT.TFile("test2.root")
>>> t = f.Get("tree")
>>> t.Scan()
************************
* Row * branch.br *
************************
* 0 * 3.14 *
* 1 * 32.5 *
* 2 * 37.4 *
* 3 * 34.1 *
************************
4
So:
Should empty TBaskets be allowed? Maybe they aren't, and both hadd and ROOT (on read-back) assume that no input file contains an empty TBasket, whereas Uproot treats an empty TBasket as just an empty array to concatenate. That would be a mismatch in assumptions.
Is that the problem here? Should we write nothing at all instead of writing an empty TBasket?
@JohanWulff, did you produce these files in a way that is different from how I made one.root and two.root?
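To check whether a given file contains empty TBaskets at all, something like the following might help. It is a rough sketch, not part of Uproot: find_empty_baskets is a hypothetical helper, and it relies only on the num_baskets attribute and the basket(i).data array used in the session above.

```python
def find_empty_baskets(branch):
    """Return the indices of baskets whose payload holds zero bytes.

    `branch` is assumed to expose `num_baskets` and `basket(i).data`,
    as the Uproot TBranch in the session above does.
    """
    return [
        i
        for i in range(branch.num_baskets)
        if len(branch.basket(i).data) == 0
    ]
```

Given the basket contents shown earlier, calling it on uproot.open("test.root")["tree"]["branch"] would be expected to report basket 0. Running it on the original input files could show whether they contain empty TBaskets too.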
Each of the two provided .root files can be opened perfectly fine on its own:
after
which generates the error
Error in <TBranch::AddBasket>: An out-of-order basket matches the entry number of an existing basket.
, the TTree 'tout' of the resulting file is faulty:
FD1F1FC5-0A2F-6445-B49F-BE0DE70B41B9_MA.root.txt FDF4838A-7644-014B-B2CD-1B2747CC43C3_MA.root.txt