key4hep / EDM4hep

Generic event data model for HEP collider experiments
https://cern.ch/edm4hep
Apache License 2.0
25 stars 35 forks source link

add loading pythonizations from podio #290

Closed m-fila closed 2 weeks ago

m-fila commented 6 months ago

BEGINRELEASENOTES

ENDRELEASENOTES

Load podio pythonizations (AIDASoft/podio#570) on import edm4hep

tmadlener commented 1 month ago

Since this now would also include the "freezing" of the objects (see AIDASoft/podio#663), I would be in favor of adding these to EDM4hep. At least freezing the objects would be good to have to avoid having things like #357 (and a few small fixes in #353)

tmadlener commented 1 month ago

There seems to be some interference with the python (test) modules from EDM4hep? https://github.com/key4hep/EDM4hep/actions/runs/10737553638/job/29779263141?pr=290#step:3:831

jmcarcell commented 1 month ago

But that is the nightlies without https://github.com/AIDASoft/podio/pull/663 included, edit: ah shouldn't change anything I guess :thinking:

tmadlener commented 1 month ago

Yes, exactly. The freezing will only be picked up tomorrow, but the other pythonizations should already work today.

m-fila commented 1 month ago

Yeah, I think I messed up something with walking through submodules and the local modules are included :unamused: Let me fix that

tmadlener commented 3 weeks ago

Looks like AIDASoft/podio#667 fixed one issue in the pythonizations and we can properly import now, but we now have a different issue:

https://github.com/key4hep/EDM4hep/blob/b99ba9f9359f9194ec15ab92f5f9c1c422a6d9b3/scripts/createEDM4hepFile.py#L88-L89

Seems to assume a const MCParticleCollection, hence calling operator[](unsigned) const which returns a MCParticle instead of the operator[](unsigend) which would return a MutableMCParticle. I am not sure whether any of the pythonizations triggers this, in principle this should work.

m-fila commented 3 weeks ago

If you mean this error in the CI https://github.com/key4hep/EDM4hep/actions/runs/10773294123/job/29872769550?pr=290#step:3:840 then it should also disappear once the fix is propagated to the nightlies

tmadlener commented 3 weeks ago

These workflows already pick up the latest version of podio as they build it on the fly. The problem in this case really seems to be that the wrong overload for operator[](unsigned) is chosen. Looking at #358 the writing seems to work as expected, at least the information can be read back.

m-fila commented 3 weeks ago

Sorry for the confusion, you are absolutely right. The culprit seems to be the "collection subscript" pythonization that replaces the __getitem__ (by default bound by cppyy to operator[]) with call to at. For some reason cppyy chooses to use non-cost overload of operator[] and const overload of at

As a clarification, that difference in constness for [] and at is present without any pythonizations too (and is also present in older releases):

>>> import edm4hep
>>> col = edm4hep.MCParticleCollection()
>>> p = col.create()
>>> type(p)
<class cppyy.gbl.edm4hep.MutableMCParticle at 0xea24410>
>>> type(col[0])
<class cppyy.gbl.edm4hep.MutableMCParticle at 0xea24410>
>>> type(col.at(0))
<class cppyy.gbl.edm4hep.MCParticle at 0xf5006a0>
tmadlener commented 3 weeks ago

@m-fila, I will make changes in #361, such that this should start working after it has been merged.

tmadlener commented 3 weeks ago

At least for our script the changes are not too bad, I think. See:

https://github.com/key4hep/EDM4hep/pull/361/commits/bec3a96390b3b4c4dd7c4f1e45d0683d9e275239

m-fila commented 3 weeks ago

I timed some usage with and without pythonizations. For commands I list the first time it was called during a session as some things are cached. edm4hep local, podio from nightlies

Command REPL Without pythonizations (s) With pythonization (s)
import edm4hep python 0.6 2.7
import edm4hep ipython 0.5 2.9
edm4hep.<TAB><TAB> python ~4 ~5
edm4hep.<TAB><TAB> ipython ~1 ~1
edm4hep.MC<TAB><TAB> python ~2 ~2
edm4hep.MC<TAB><TAB> ipython ~1 ~1
edm4hep.MCParticleCollection() python 3.2 1.5
edm4hep.MCParticleCollection() ipython 3.2 1.5
createEDM4hepFile.py python 11.6 11.6

The increase in import time is due to the from podio.pythonizations import load_pythonizations in the __init___ which takes more than 2 s on its own (same for import podio). The decrease in the object creation time is due to this import rather than load_pytonizations()

I don't know how to programmaticaly time autocompletion in python so the estimates for these are very rough

Otherwise I don't see big differences with and without the pythonizations. This might change in the future if there will be more of them

For the sake of benchmark changed createEDM4hepFile.py:

-    for i in range(3):
-        particle = particles.create()
+    mutable_particles = [particles.create() for _ in range(3)]
+    for particle in mutable_particles:
tmadlener commented 2 weeks ago

Thanks for the extensive checking. This looks about as expected, at least for me. Curious that creating a collection becomes quicker with pythonizations. Could that be because we are measuring different things at that point, i.e. is it possible that part of the things that are done at edm4hep.MCParticleCollection() without pythonizations is done during import edm4hep with pythonizations?

m-fila commented 2 weeks ago

Yes, no pythonizations just adding import podio to __init__.py also changes the first creation of collection to that ~1.5 s

tmadlener commented 2 weeks ago

Let's see if #361 indeed makes this pass as expected :)

tmadlener commented 2 weeks ago

Everything looks to be working now. Merging this later today, unless there is something more to do.

m-fila commented 2 weeks ago

Ufff. Should be good to go

jmcarcell commented 2 weeks ago

I wonder if at some point is better to implement the python bindings ourselves...