scikit-hep / uproot3

ROOT I/O in pure Python and NumPy.
BSD 3-Clause "New" or "Revised" License

How to create a root tree? #543

Open fgarcia01 opened 2 years ago

fgarcia01 commented 2 years ago

Hi, I'm trying to create a ROOT tree and I'm stuck at the beginning with: AttributeError: module 'uproot' has no attribute 'newtree'. I need to create two branches: voltage (int32) and current (float64). An example would be great. I'm using Python 3.9 and uproot3. Thanks in advance!

fgarcia01 commented 2 years ago

I need a hand here:

def execute(self):
    voltages = np.arange(self.min_voltage, self.max_voltage+self.voltage_step, self.voltage_step)
    steps = len(voltages)
    with uproot3.recreate("IVcurve.root") as f: 
        f["t"] = uproot3.newtree({"voltage": "float64", "current": "float64"})
    log.info("Starting to sweep through voltage")
    for i, voltage in enumerate(voltages):
        log.debug("Sweep voltage: %g V" % voltage)

        self.source.source_voltage = voltage
        # Or use self.source.ramp_to_current(current, delay=0.1)
        sleep(self.delay)

        current = self.meter.current
        print(voltage)
        print(current)
        print()
        f["t"].extend({"voltage": voltage, "current": current})
        if abs(current) <= 1e-10:
            resistance = np.nan
        else:
            resistance = voltage/current
        data = {
            'Voltage (V)': voltage,
            'Current (A)': current,
            'Resistance (Ohm)': resistance
        }
        self.emit('results', data)
        self.emit('progress', 100.*i/steps)
        if self.should_stop():
            log.warning("Catch stop command in procedure")
            break
    uproot3.numentries("IVcurve.root", "t")    # at the end of the for loop    

The problem is here: f["t"].extend({"voltage": voltage, "current": current})

with this error:

  File "D:\My Projects\PYTHON\iv_keithley_forum_2.py", line 81, in execute
    f["t"].extend({"voltage": voltage, "current": current})
  File "C:\Users\fgarcia\AppData\Roaming\Python\Python39\site-packages\uproot3\write\objects\TTree.py", line 114, in extend
    if all(len(first) == len(value) for value in values) == False:
  File "C:\Users\fgarcia\AppData\Roaming\Python\Python39\site-packages\uproot3\write\objects\TTree.py", line 114, in <genexpr>
    if all(len(first) == len(value) for value in values) == False:
TypeError: object of type 'numpy.float64' has no len()

I tried using append instead of extend, and that also gives errors.

The values need to be written as they are being read.

jpivarski commented 2 years ago

> Trying to create a root tree and stuck at the beginning with: AttributeError: module 'uproot' has no attribute 'newtree'

You're mixing Uproot 3 and 4 (which I think you eventually fixed, in favor of Uproot 3). Since you seem to be developing a new script, rather than keeping an old one running, you should start with the latest version. (Uproot 3 is only being kept around for analysis code that was developed using it and now needs to keep working so that somebody can graduate or otherwise finish up a one-time project.) Uproot 4 became the default "uproot" (as in pip install uproot) on Dec 5, 2020, so it's been more than a year.

Here's where the documentation on Uproot 4 writing starts: https://uproot.readthedocs.io/en/latest/basic.html#opening-a-file-for-writing

The following line isn't responsible for the error message, but:

    with uproot3.recreate("IVcurve.root") as f: 
        f["t"] = uproot3.newtree({"voltage": "float64", "current": "float64"})
    log.info("Starting to sweep through voltage")
    for i, voltage in enumerate(voltages):

outside of this with block, the file f will be closed. Attempts to write to it will fail (on purpose). You need to have the code that fills the TTree be inside the with block (one indentation level to the right).
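An ordinary Python file object behaves the same way with respect to the with block, so the effect can be seen without Uproot at all. The following is only an analogy, written as a minimal sketch: the file name is arbitrary, and the exact exception that Uproot raises on a closed file may differ from the ValueError shown here.

```python
import os
import tempfile

# Arbitrary scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as f:
    f.write("inside the with block\n")   # fine: the file is still open

# The with block has ended, so f is closed now.
try:
    f.write("outside the with block\n")
    write_failed = False
except ValueError:                        # "I/O operation on closed file"
    write_failed = True

print(write_failed)
```

Anything that writes to the file therefore has to sit inside the with block, indented one level further than the with statement itself.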

You're getting the error because extend takes a dict mapping branch names to arrays of numbers, not a dict mapping branch names to individual numbers. So you want

f["t"].extend({"voltage": voltages, "current": currents})

where voltages and currents are pluralized because they refer to all the data. The voltages NumPy array is what you want to have going here—you do not want to have a for loop over NumPy arrays. (That's not just an Uproot thing; that's generally true of using Python, NumPy, and Pandas. You arrange your work to be one array at a time, rather than one value at a time.)
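As a small illustration of working one array at a time, the per-point resistance calculation from the original loop can be done on whole arrays. This is a sketch with made-up sample values; the 1e-10 threshold is taken from the code in the question, and the variable names are just for this example.

```python
import numpy as np

# Made-up sample data standing in for a finished voltage sweep.
voltages = np.array([0.0, 1.0, 2.0, 3.0])
currents = np.array([0.0, 0.5, 1.0, 1.5])

# Start with NaN everywhere, then fill in only the points where the
# current is large enough to divide by (same threshold as in the question).
resistance = np.full(len(voltages), np.nan)
measurable = np.abs(currents) > 1e-10
resistance[measurable] = voltages[measurable] / currents[measurable]

print(resistance)
```

The boolean mask replaces the if/else inside the loop: every element is handled in a single array expression.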


Looking at your code some more, I see now that that's how self works. The class instance referred to by self seems to be representing some circuit that you can apply voltages to (with self.source.source_voltage = voltage) and observe the current response (with self.meter.current). You actually wait for some real time (time.sleep(self.delay)) after applying the voltage and before measuring the current, so presumably this Python code is hooked up to some real system somewhere. I'll just imagine that you have an RLC circuit connected to some Python code through a Raspberry Pi or something, though in actuality, your circuit may be much more complicated—that doesn't change the fact that it may need some time to adjust to a new voltage and let transients die away or something.

Since the function

def current_response(self, voltage):
    self.source.source_voltage = voltage   # apply the voltage
    time.sleep(self.delay)                 # wait for the circuit to settle
    return self.meter.current              # then read the current

goes through Python code (it's not purely NumPy), you'll have to write a Python for loop calling it. And since each sample point may take 0.1 seconds, the usual argument for NumPy arrays over Python loops, namely that Python is slow (~microseconds per loop iteration), isn't relevant either. Ignoring any discussion of ROOT or Uproot, it would be entirely reasonable to write this as a plain Python loop, rather than involving array-at-a-time NumPy.

However, Uproot just doesn't work that way. The extend function takes an array, and if you pass in arrays of length 1, you'll get a ROOT file with TBaskets of length 1, which would be very inefficient for the next step in the process to read. (TBaskets should have at least tens of thousands of entries for numerical data like this; if you have fewer data points, then it should be a single-TBasket file, i.e. extend should only be called once.) In Uproot 3, we attempted to add an interface called append that would collect these values and combine them into reasonably sized TBaskets, but that ended up in a mess and we didn't even try to add such an interface to Uproot 4.

So what you'll have to do is iterate over the data in a Python for loop, collect the results, and dump those results in the file in one batch. For instance,

import time

import numpy as np
import uproot

def execute(self):
    voltages = np.arange(
        self.min_voltage,
        self.max_voltage + self.voltage_step,
        self.voltage_step,
        dtype=np.float64,
    )

    # collect one current reading per voltage point
    currents = np.zeros(len(voltages), dtype=np.float64)
    for i, voltage in enumerate(voltages):
        self.source.source_voltage = voltage
        time.sleep(self.delay)
        currents[i] = self.meter.current

    # write everything to the file in one batch
    with uproot.recreate("IVcurve.root") as f:
        f["t"] = {"voltage": voltages, "current": currents}

A single extend is implicit in the assignment of a dict of arrays to a new TTree. Alternatively, you could do

    with uproot.recreate("IVcurve.root") as f:
        t = f.mktree("t", {"voltage": np.float64, "current": np.float64})
        t.extend({"voltage": voltages, "current": currents})

Also, it looks to me like you're just making this ROOT file in order to plot the data. There are much easier ways to do that, like Matplotlib's plt.plot:

import matplotlib.pyplot as plt
plt.plot(voltages, currents)

And if you need to save the NumPy arrays in some form, you could just pickle them:

import pickle

# write
with open("IVcurve.pkl", "wb") as file:
    pickle.dump({"voltage": voltages, "current": currents}, file)

# read
with open("IVcurve.pkl", "rb") as file:
    voltages_and_currents = pickle.load(file)
voltages = voltages_and_currents["voltage"]
currents = voltages_and_currents["current"]

or use NumPy's own file format:

# write (np.savez stores several named arrays in one .npz archive)
np.savez("IVcurve.npz", voltage=voltages, current=currents)

# read
voltages_and_currents = np.load("IVcurve.npz")
voltages = voltages_and_currents["voltage"]
currents = voltages_and_currents["current"]

Or HDF5, etc. If the point is to get it into ROOT to use ROOT's plotting features, then yes, write a ROOT file. Otherwise, if you're going to be using Matplotlib (because you're in Python already), data with simple structure like this—just two columns of numbers—can be saved in a variety of ways, most of which are simpler and recognized by more data analysis tools.

If, on the other hand, you ever need to write jagged arrays or more complex data structures, then you'll want to seriously consider ROOT (or Parquet).

fgarcia01 commented 2 years ago

@jpivarski many thanks indeed. Many things are clearer now, and I have made all these changes. However, I can only test them tomorrow, when I get access to the device. Yes, the idea is to produce a ROOT file that can later be opened and analyzed in ROOT.

fgarcia01 commented 2 years ago

Could you add an example of how I can add new baskets to this tree? I mean, I would like to add more data to the same branches, but I have tried different ways and none of them works. Thanks in advance!

jpivarski commented 2 years ago

If you have created a new TTree like

with uproot.recreate("IVcurve.root") as f:
    f["t"] = {"voltage": voltages, "current": currents}   # voltages and currents are NumPy arrays

and have not yet closed the file (i.e. you're still in the "with" block above), you can add a second TBasket to it by

    f["t"].extend({"voltage": voltages, "current": currents})   # voltages, currents may be new arrays

(Note the indentation: we're still in the "with" block; the file has not closed.)

You can't, however, open a file and change an existing TTree. That's not in principle impossible, but it's code that has not been written; it's not a feature of Uproot yet.

The reason uproot.WritableTree has an extend method is to allow you to write TTrees that are larger than the arrays you can store in memory by calling extend with as much data as you can in each call. If it's possible to write the TTree in one function call—i.e. you have enough memory to do so—you would make better ROOT files by minimizing the number of TBaskets. Files with the same number of entries spread among fewer TBaskets use less disk space and are faster to read in both Uproot and ROOT.
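One way to arrange that, sketched here without any file I/O, is to collect measurements in Python lists and hand them over as arrays only when a batch is full. The names batched_measurements and fake_meter are hypothetical helpers made up for this illustration, and the batch size of 4 is only there to keep the example small; real batches should hold tens of thousands of entries.

```python
import numpy as np

def batched_measurements(voltages, measure, batch_size):
    """Yield dicts of NumPy arrays, each with up to batch_size points."""
    v_buf, i_buf = [], []
    for voltage in voltages:
        v_buf.append(voltage)
        i_buf.append(measure(voltage))
        if len(v_buf) == batch_size:
            yield {"voltage": np.array(v_buf, dtype=np.float64),
                   "current": np.array(i_buf, dtype=np.float64)}
            v_buf, i_buf = [], []
    if v_buf:  # leftover points that did not fill a whole batch
        yield {"voltage": np.array(v_buf, dtype=np.float64),
               "current": np.array(i_buf, dtype=np.float64)}

def fake_meter(voltage):
    # Hypothetical stand-in for the real instrument: a 2 ohm resistor.
    return voltage / 2.0

batches = list(batched_measurements(np.arange(0.0, 10.0, 1.0), fake_meter, 4))
batch_lengths = [len(batch["voltage"]) for batch in batches]
print(batch_lengths)
```

Each yielded dict has the shape that extend expects, so inside the with block every batch could be passed to a single extend call, giving one TBasket per batch rather than one per measurement.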

It's possible (though inadvisable) to write each entry as a separate TBasket. (Seriously inadvisable! I'm not going to write out the code!)