scikit-hep / uproot5

ROOT I/O in pure Python and NumPy.
https://uproot.readthedocs.io
BSD 3-Clause "New" or "Revised" License
233 stars 73 forks

Recovering or signalling improperly closed files #531

Open tamasgal opened 2 years ago

tamasgal commented 2 years ago

I was wondering whether there is a way to detect that a ROOT file was not closed properly, or even to recover the data inside. This has been discussed before, e.g. here: https://github.com/scikit-hep/uproot3/issues/472 (funnily, that issue was raised by a member of our collaboration; apparently we have some issues with improperly closed files ;)). Data recovery might of course be very painful (regex search for TKey etc.) and is probably not worth the effort to implement in uproot, but I thought I'd revive this quickly with an example file, which is actually readable with our ROOT-based frameworks.

I think it would be helpful if at least some kind of an error was shown that the file is in a dangerous (non-closed) state, instead of silently opening it and not listing any keys.

http://131.188.161.12:30002/not_properly_closed_file.root

$ root.exe not_properly_closed_file.root
   ------------------------------------------------------------------
  | Welcome to ROOT 6.22/06                        https://root.cern |
  | (c) 1995-2020, The ROOT Team; conception: R. Brun, F. Rademakers |
  | Built for linuxx8664gcc on Mar 01 2021, 11:20:55                 |
  | From tags/v6-22-06@v6-22-06                                      |
  | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q'       |
   ------------------------------------------------------------------

root [0]
Attaching file not_properly_closed_file.root as _file0...
Warning in <TFile::Init>: file not_properly_closed_file.root probably not closed, trying to recover
Info in <TFile::Recover>: not_properly_closed_file.root, recovered key TTree:E at address 59282385
Warning in <TFile::Init>: successfully recovered 1 keys

root [1] _file0->TestBit(TFile::kRecovered)
(bool) true

As seen above, the ...->TestBit(TFile::kRecovered) method can be used to check if the file was recovered (the bit kRecovered is defined here:

https://github.com/root-project/root/blob/47f66c57ca0a657942a9075030f89d9da885b83a/io/io/inc/TFile.h#L183)

and the corresponding Recover method here:

https://github.com/root-project/root/blob/47f66c57ca0a657942a9075030f89d9da885b83a/io/io/src/TFile.cxx#L1939

uproot v4.1.9 is not crashing but also does not seem to get any data out of it, nor does it report the erroneous state:

>>> import uproot

>>> uproot.__version__
'4.1.9'

>>> f = uproot.open("/sps/km3net/users/heijboer/issue_6/v6.3/mcv6.3.gsg_numu-CCHEDIS_1e2-1e8GeV.1.root")

>>> f.keys()
[]

>>> f.classnames()
{}

>>> f._fSeekKeys
0

>>> f._keys
[]

>>> f._fSeekParent
0

Btw. with UnROOT.jl we get a runaway cursor ;)

julia> using UnROOT

julia> f = UnROOT.ROOTFile("not_properly_closed_file.root")
ERROR: EOFError: read end of file

jpivarski commented 2 years ago

I can't see the file (IP address isn't public—not refused; it's probably an internal network).

But anyway, TestBit checks the fBits of a TObject. In ROOT, a TFile is a TObject, but I don't know how the bytes on disk become a TObject (it's not the way contained objects become TObjects, such as TH1F, for instance). You've probably seen that it's a ReadOnlyFile, one of the three classes that isn't generically mappable to the corresponding ROOT type (along with ReadOnlyKey and ReadOnlyDirectory, because they're too fundamental).

But anyway, I think that the kRecovered bit is only set in ROOT's Recover function, so Recover has to be invoked before you know that a file needs recovering. The general strategy that Uproot takes is to be very hands-off: there are a lot of internal consistency conditions in the ROOT format that it does not check. Under normal circumstances, it doesn't even read the streamers (to reduce file-opening time).

Doing a Recover, or even just verifying that one would be fruitful, isn't as bad as a "grep" for TKeys, though I can't think of a byte string that could be grepped for, anyway. It would be a walk over the data that isn't listed in the TFree list of unused data. Regions of the file that aren't labeled by TFree as being garbage need to be back-to-back TKey-object pairs, where each TKey specifies the size of the object (compressed), and therefore where to jump to next. Regardless of language (Python, C++, Julia), that would be an expensive thing to check on remote files: it's a long series of data round-trips—you only know where to seek next based on the values you see in the current TKey. A big file would have a lot of TKeys.

This function would definitely have to be opt-in, and you only know that something's wrong if it has run.

ROOT maintains a list of TKeys in a file independent of the objects that need it (such as TTrees pointing to TBaskets, or TDirectories pointing to the data they contain). I don't know what that second, unstructured list is good for. This recovery function can tell us that something's wrong—a TKey-object pair exists with no path to it—but how does that help if there's no path to it? How does a user access that data?

Nevertheless, checking consistency is one positive benefit. The function would be named something like, "Is this file okay?"
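A much cheaper (and much weaker) version of "Is this file okay?" could look only at what the sessions in this thread already show: fSeekKeys is 0 and the key list is empty. The sketch below is a hypothetical helper, not an Uproot API. It reads the private attributes `_fSeekKeys` and `_keys` exactly as they appear in the sessions above, and it assumes that a key list can never legitimately start at or before fBEGIN (taken to be 100 by default, which is an assumption).

```python
import warnings
from types import SimpleNamespace


def warn_if_unclosed(directory, begin=100):
    """Hypothetical check: warn when a root directory has no key list on disk.

    `directory` is any object exposing `_fSeekKeys` and `_keys`, the private
    attributes printed in the sessions above. `begin` stands in for fBEGIN.
    """
    seek_keys = getattr(directory, "_fSeekKeys", None)
    keys = getattr(directory, "_keys", None)
    suspicious = seek_keys is not None and seek_keys <= begin and not keys
    if suspicious:
        warnings.warn(
            "no key list found (fSeekKeys=%d); file was probably not closed properly"
            % seek_keys
        )
    return bool(suspicious)


# Stand-in objects mimicking the two situations; no real file is opened here.
broken = SimpleNamespace(_fSeekKeys=0, _keys=[])
healthy = SimpleNamespace(_fSeekKeys=4096, _keys=["E;1"])
```

This only flags the specific state reported in this issue. It says nothing about whether the data is recoverable, and it does not replace the full walk over TKey-object pairs.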

When UnROOT's cursor is running away, do you know what it's attempting to do? What it's trying to read?


On a completely different topic, I'd like to ask you about another potential project. What's the best way to reach you for back-and-forth conversation, like Gitter/Slack/Mattermost/Zoom?

tamasgal commented 2 years ago

Thanks for the detailed answer, as always :)

Ah, my bad, I forgot to start the web server on my machine; the file is now available if you want to play around.

My naive hope that there is a simple global flag somewhere which indicates that a file was properly closed is obviously destroyed ;) Yes, I found ReadOnlyFile and thought I could simply dig there to figure out how to get to that magical bit, but I guess you are right about kRecovered being some runtime product.

The idea to check the TKey list and pair those up to find missing pieces sounds like an easy way to indicate a problem. For our use case at least, the recovery is not so important. I’ll have a look tomorrow to see if I find some hints.

Regarding UnROOT.jl, I have not debugged further yet, but it might be related to the streamer part, which is always parsed completely. Let’s see ;)


Curious about that project! You can find me e.g. on the Julia Slack but I also still have PyHEP2021 in my Slack workspace. Zoom is of course fine too.

ioanaif commented 6 months ago

Hi @tamasgal

If this still happens in uproot = 5.3.2 (main), could you please make the file available again so I can debug this?

tamasgal commented 6 months ago

I just checked and I don't see any difference:

>>> import uproot

>>> f = uproot.open("not_properly_closed_file.root")

>>> uproot.__version__
'5.3.2.dev3+gaccc1ca'

>>> f.keys()
[]

>>> f.classnames()
{}

>>> f._fSeekKeys
0

>>> f._keys
[]

>>> f._fSeekParent
0

I reactivated the webserver, so the file can be downloaded with the original link above (here again: http://131.188.161.12:30002/not_properly_closed_file.root)