next-exp / IC

5 stars 74 forks source link

Voxels sometimes have no associated hits #602

Closed paolafer closed 5 years ago

paolafer commented 5 years ago

I've found that for some events the voxelization algorithm creates voxels with non-zero energy, but assigns no hits to them. I'm puzzled because we have a specific test to catch this mistake (test_hits_energy_in_voxel_is_equal_to_voxel_energy), but maybe it is unable to find very particular cases. I wasn't aware that we could have this kind of problems (meaning specific cases that escape a general test, where random events are generated), and I don't know how to approach it in the future, beyond making hypothesis hard (which I already did). I can provide the file with the incriminated event and a small code to test it, in case anyone wants to have a look at it, since I cannot find out what is happening. I thought I could attach them here, but apparently h5 format is not accepted...

jacg commented 5 years ago

Create a branch, commit the h5 file and the test that uses the file to the branch and push it to your fork.

(Not that I'm likely to have time to look at it, but at least it will make the example available to others.)

paolafer commented 5 years ago

Thanks, @jacg, I've created the branch voxels-with-no-hits in my fork, with a new test test_voxels_with_no_hits, which fails because of this behaviour.

jacg commented 5 years ago

Just a quick note; compare the following three spellings:

sum([h.E for h in v.hits])
sum((h.E for h in v.hits))
sum( h.E for h in v.hits )

The difference between the first and second is that the first uses a list comprehension, while the second uses a generator expression. This means that the former needs to have all the elements of the list in memory at the same time; the generator expression is a lazy version of the list comprehension and keeps only one element in memory at a time. For small sequences this is unlikely to make much difference, but as the sequence size grows, the time and space advantages of the latter become more and more significant. So it is a good to develop the habit of spelling these sorts of things with the genexpr rather than the listcomp.

The third spelling is semantically identical to the second: if a generator expression is the only argument in a function call, then you can drop one pair of parentheses.

mmkekic commented 5 years ago

This seems to be an issue with floating points and numpy histograms and not a bug in the code of hits assignment... Converting hit_positions from float32 to float64 as hit_positions = np.array([h.pos for h in hits]).astype('float64') solves this particular problem and gives the correct histogram but I wonder how many of similar issues have we overlooked

jjgomezcadenas commented 5 years ago

Well, good catch. As for similar issues, we will just keep digging!