next-exp / IC

6 stars 70 forks source link

Rethinking post-penthesilea production #749

Open mmkekic opened 3 years ago

mmkekic commented 3 years ago

Since Beersheba hits are being used for track reconstructions, we need official code that runs paolina algorithm on Beersheba output. The code is already in Esmeralda and this is a matter of organizing different pipes. As a reminder current algorithm is:

The options that appeared at the meeting:

  1. Add tracking part to Beersheba; leave Esmeralda as is. This gives us both naive reconstruction and tracks as well as deconvoluted hits and tracks

  2. Add energy correction to Beersheba, remove it from Esmeralda and run Esmeralda after Beersheba. Beersheba may or may not output naive hits reconstruction. This is more modular since the tracking is separated from hits reconstruction.

  3. Have three independent cities : hits reconstruction (Beersheba), energy correction (??), tracking (Esmeralda)

@pnovella @msorel @paolafer @ausonandres @Aretno @jahernando @jjgomezcadenas @gonzaponte

mmkekic commented 3 years ago

I removed the poll since it was confusing, so please comment

msorel commented 3 years ago

Hi, as I said in the meeting, 1 probably sufficient in short term, as we are still applying xyz energy corrections using pre-deconvolution hits. Up to upcoming bb2nu first paper I guess we will do this.

For long term, beyond this paper, I think we should compute xyz corrections to energy after hit deconvolution. I think it can actually improve our energy resolution, as we correct the energy response using a much more faithful image of the event. This is just a feeling, and will have to be demonstrated with MC / data studies. In this scheme we will want to do options 2 or 3 in the long term, I think. If one does not want deconvolution (probably non-baseline case), option 2 could perhaps have a flag to do energy correction on pre-Beersheba hits, too, and not just on post-Beersheba ones.

So, to me it boils down if we want to formalise a solution for short term or for long term.

I have a slight preference for 2 (or 3, maybe), as option 1 may have a kind of short livetime, and we will want to change it in a couple of months...

paolafer commented 3 years ago

Given the experience with Esmeralda, I'm a bit reluctant to formalise code while still testing it; I'm afraid that we can end up doing a lot of work and then stopping using it shortly after. Maybe the best option would be to make the code as modular as possible, for instance, extracting from Esmeralda the hit correction and tracking functions, in such a way that they can be used anywhere else, and leave the city almost as a pipeline only. Beersheba could then add to its pipeline these functions and both cities would be usable. Does this approach make any sense in the city structure?

Aretno commented 3 years ago

Hi,

first, I think Pau's point about being able to have a fast reconstruction pipeline is really strong. As he pointed out, deconvolution is going to be considerably slower than the current scheme and, if we limit the energy corrections to that point, it will prevent us to check runs as instantaneously as before. This will be important not only for day-to-day operation but also in NEXT100 commissioning.

I'm also reluctant to just forget about the hits correction at all; deconvolution performance could change in different detectors configurations and I think having the corrected hits info could always serve as a guidance in case we see weird things at some point. But this could be solved by just doing both corrections, penthesilea hits and beersheba hits, and storing both tables.

In any case, I strongly support Paola in not formalizing code until we are completely sure about it thus, in any case, a throughout study of the energy resolution post-beersheba should be done before changing the corrections part.

Finally, I also dislike the idea of linking beersheba and paolina by default. As you know, I have a hard time believing that paolina approach is the best way to take advantage of the fine-grained tracks from deconvolution and would rather have them separate. Moreover, the lower we go in voxel size in paolina, the more time it takes to run and, whenever we need to optimize parameters of one or the other, we would be obligated to run both of them which, depending on the case, can be a waste of time.

So, in summary, I would go with Paola's suggestion, and have three different cities (corrections, beersheba, esmeralda). Corrections city should be able to read from either penthesilea or beersheba, beersheba should be able to read from penthesilea and corrections city, esmeralda should be able to read from corrections and beersheba.

pnovella commented 3 years ago

Hi all,

On 28/10/20 10:03, Aretno wrote:

So, in summary, I would go with Paola's suggestion, and have three different cities (corrections, beersheba, esmeralda). Corrections city should be able to read from either penthesilea or beersheba, beersheba should be able to read from penthesilea and corrections city, esmeralda should be able to read from corrections and beersheba.

I also agree with Paola and Ander.

In particular:

1) I think for the rest of the operation/analysis of NEXT-White we should keep as reference the "standard reconstruction" (penthesilea+esmeralda, current versions). This will allow for comparisons with deconvolution-based studies. The current scheme also provides a "fast" reconstruction and energy calibration useful for detector monitoring and "quick" analyses.

2) Independently of 1), we can start working in the development of three independent cities (corrections, beersheba and esmeralda), as described by Ander. Once this is implemented, the sw would allows us to build data processing chains delivering either the "standard reconstruction" or the "deconvolution-based reconstruction" (ie, maximum flexibility).

3) As 2) might take time, I still think we need some sort of minimal formalization of the deconvolution-based reconstruction for the short time scale (ie, next round of bb analyses). As Michel points out, this would imply implementing Marija's option 1) (ie, corrections applied before deconvolution). However, it is not clear to me how to do this in the most simplest way...

Cheers, Pau

mmkekic commented 3 years ago

Ok, so the decision is that for now we leave Esmerala input and output as is, and make tracking city (urgent) and energy corrections pipe (not urgent).

As a first (and the simplest) step we only need tracking city since Beersheba is using Esmeralda output for now. I believe all readers already exist, and the part of tracking can simply be reused from Esmeralda. Hence, a volunteer (@ausonandres ?) should extract part of the pipe from Esmeralda that will be reused (together with appropriate sinks) and place it in components.py, and have a very simple tracking city flow. This part should be very easy and fast, is just placing the already existing code to a different position.

The hits energy corrections are more annoying since the current Esmeralda functions use event_model type (as do paolina functions), but Beersheba uses dataframes directly. In case we want to do energy correction -> deconvolution -> paolina, with the current functions, it would mean casting from pytables (DataFrame basically) to HitCollection, back to DataFrame to be used in deconvolution, and back to HitCollection to be used in paolina. The type casting is extremely time and memory consuming and this flow makes no sense. I would suggest to make hits energy correction (used in Esmeralda) accept both DataFrame or our custom HitCollection type (adding appropriate tests to make sure it is all consistent). Again, this should not be difficult, but it does require some code to be written.

In any case I think we need 3 independent pipes : deconvolution, energy correction and tracking, making sure that the data format passed in between them is consistent. I am not convinced that we need 3 independent cities, probably having a very-easy-to-combine pipes (really, is just calling 1-2 functions inside a city flow) is good enough till we decide on definite reconstruction scheme and what intermediate data format we want to store.

jacg commented 3 years ago

Hence, a volunteer (@ausonandres ?) should extract part of the pipe from Esmeralda that will be reused (together with appropriate sinks) and place it in components.py, and have a very simple tracking city flow. This part should be very easy and fast, is just placing the already existing code to a different position.

Music to my ears.

Carry on.

jjgomezcadenas commented 3 years ago

Music to my ears too.

I propose Isaura for the tracking city.

On 28 Oct 2020, at 14:13, Jacek Generowicz notifications@github.com wrote:

Hence, a volunteer (@ausonandres https://github.com/ausonandres ?) should extract part of the pipe from Esmeralda that will be reused (together with appropriate sinks) and place it in components.py, and have a very simple tracking city flow. This part should be very easy and fast, is just placing the already existing code to a different position.

Music to my ears.

Carry on.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/next-exp/IC/issues/749#issuecomment-717923490, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB5SID3KMEMHKYISRR65SSDSNAKHFANCNFSM4TBCYHZQ.

ausonandres commented 3 years ago

Sorry I didn't say anything, but I already started doing this. There is one issue that I'd like to comment: in my opinion this tracking city after Beersheba should output kdst information as well, since it's interesting to have access to some variables, particularly nS2 (it's used for the analysis), however Beersheba's output doesn't contain this table. For the time being, should we just leave it as it is and in the final part of the analysis -after running all cities-, joining the kdst information to the tracks table manually? And then in the future change Beersheba to also save the table?

Aretno commented 3 years ago

I actually agree, my bad for not putting it since the beginning. We should probably add it now as it would just be adding one line or two to the code (basically import the writer from esmeralda and add it to the pipeline). If agreed by everyone I'll proceed and do this on monday.

In case this is not done; don't you have that kind (or at least equivalent) info on the Summary table? That is currently being stored.

ausonandres commented 3 years ago

Not now, in summary table it is basically contained the position of the different events and a few more magnitudes event-related (energy, number of tracks, number of hits and charge), but not number of S2 signals.