Open gipert opened 1 year ago
are you sure it's not just a problem related to the config files not being generated correctly? we have other parameters in the hit tier that are there for ged and not spm and did not have this problem in testing, I think, right @gracesong312 ?
Yes I am sure. It's just because in rectangular data structures you obviously need some placeholder, when the hit corresponding to a certain index does not define a column value:
hit_table name is_valid_0vbb
1052802 V07302A True
1052802 S060 ?????
In case of floats one could use NaN
, but no "missing" placeholder exists for booleans.
I think I overlooked this in the original testing, here's a basic test I just ran on the legend-testdata
files:
energies
and is_valid_hit
(sipm only) and trapEmax
and is_valid_0vbb
(ge only) for both a sipm channel and a ge channel at the same time.trapEmax
is 0 or a very small number (e-44) is_valid_0vbb
is Falseenergies
is a list, which occasionally has large numbers in itis_valid_hit
is a list, mix of True and False
So I'm not getting the issue with germanium parameters in sipm hits, but given that it's happening the other way around and the numbers change if I rerun the script, I assume it's just because I'm not running on enough files to see it. Is it possible to just return None
?
In my test I randomly get true or false in is_valid_0vbb
for SiPM data. I can work on a MWE tomorrow.
@gracesong312 do you pre-allocate empty columns for the output table? This could explain why the values are unpredictable (because no actual value is ever written to the pre-allocated memory).
Pandas uses NumPy internally, so a boolean column cannot contain non-booleans. I would not force that column to be float in order to be able to use NaN
s, I think we should use a different data structure. Discussion for next week!
Yes, I allocate memory with np.empty
which explains the random values. https://github.com/legend-exp/pygama/blob/663d352ef9d2d5b73b0c5927dad88692a5f80442/src/pygama/flow/utils.py#L137-L144
shouldn't the ged hits and spm hits be in two separate lists? otherwise we are going to have nans everywhere, right?
hit_table name energy npe 1052802 V07302A 2039 nan 1052802 S060 nan 3
Jason Detwiler Department of Physics, Box 351560 University of Washington Seattle, WA 98195 (206) 543-4054
On Thu, Oct 12, 2023 at 2:32 PM Grace Song @.***> wrote:
Yes, I allocate memory with np.empty which explains the random values. https://github.com/legend-exp/pygama/blob/663d352ef9d2d5b73b0c5927dad88692a5f80442/src/pygama/flow/utils.py#L137-L144 https://urldefense.com/v3/__https://github.com/legend-exp/pygama/blob/663d352ef9d2d5b73b0c5927dad88692a5f80442/src/pygama/flow/utils.py*L137-L144__;Iw!!K-Hz7m0Vt54!gCX4qyzdSLr9lc6oeLCOU0jsUpBByOXxbbzKqJJzWnJvhvppOTBtQ0-Igj0EzICYxMc438egOj73chertl9qJYE$
— Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/legend-exp/pygama/issues/518*issuecomment-1760392434__;Iw!!K-Hz7m0Vt54!gCX4qyzdSLr9lc6oeLCOU0jsUpBByOXxbbzKqJJzWnJvhvppOTBtQ0-Igj0EzICYxMc438egOj73chertEPmOLM$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AARCTXUQNNEWVH7WJLBRB7TX7BOWRANCNFSM6AAAAAA554USNE__;!!K-Hz7m0Vt54!gCX4qyzdSLr9lc6oeLCOU0jsUpBByOXxbbzKqJJzWnJvhvppOTBtQ0-Igj0EzICYxMc438egOj73cherfIjYTXM$ . You are receiving this because you commented.Message ID: @.***>
Example:
is_valid_0vbb
is available in tierhit
for thegeds
subsystem only. If loading data fromspms
andgeds
at the same time,is_valid_0vbb
will randomly provideTrue
orFalse
(I guess because of uninitialized memory).This is obviously very dangerous and must be fixed ASAP.
We cannot simply fix it by using default values unfortunately (
NaN
, for example, works only with floats), so we need to return a different data structure.