Closed sztaylor89 closed 7 years ago
@sztaylor89
Can you provide us with more details on this error? What version are you running? What LDF are you analyzing? What does your configuration look like? What computer were you using?
The more information you can provide the better.
I'm on my ANL1471 branch from my forked repo. I last diverged from the dev branch at commit 976d769e904003234708ce6f40088f0ba76b05ee. The file I was using is on kqxhc under /scratch2/anl2015/FEB2015/135SB/a135feb_12.ldf
Have you rebased this branch onto dev recently?
Here's my config file (as txt): Config_135_121616.txt
I haven't rebased since the channel data refactoring was added.
I would update the branch and see if that fixes your issue. I cleaned up a number of pointer issues that could have caused this behavior. This may also be related to Issue #199, but I don't have any specifics for that at the moment.
The file also causes scope to crash at 14%.
@sztaylor89 Have you managed to update?
I have tested this with 1f5c39ead06011a9275887ba87696f3869817c7f. The issue exists, and the program starts to have issues at 14% complete. It used up all of the RAM and swap space on my laptop and things became sluggish, but the program didn't crash. This suggests that there is an issue with the memory allocation, as mentioned in #199.
I will close this issue report since it's now confirmed to be a duplicate and move the discussion to the other issue.
I have tested this with 1f5c39e using a135feb_12.ldf. The program starts to have issues at 14% complete. It used up all of the RAM and swap space on my laptop and things became sluggish, but the program didn't crash. This suggests that there is an issue with the memory allocation.
Confirmed that this issue also occurs with utkscanor. This rules out the histogramming classes as the cause, because utkscan and utkscanor use different histogramming.
Memory usage of the program:
I have tried scanning IS599Oct_A052_02.ldf and we get much farther than the previous ldf with no obvious memory related issues. There may be some sort of corruption in a135feb_12.ldf that we are not handling properly.
From the damm histogram that was produced it looks like the file craps out about 400 seconds into the run.
(copied over from #199)
We also discovered that scope starts eating memory at ~14% as well. So I think wherever the code's issue with a135feb_12.ldf is, it's common to utkscan and scope. There is obviously something fishy about this file. Is there a way the unpacker (guessing) could catch this, stop, and move on? It's also curious that pixie_ldf_c didn't see this issue (correct me if I'm wrong, @sztaylor89).
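One way to read "catch this, stop, and move on" is a sanity check on any length field before it is trusted. Below is a minimal sketch, assuming (hypothetically) that each event begins with a word giving its own length in words. This is not the actual PAASS unpacker or the real Pixie-16 header layout, and `split_events` and `max_event_words` are made-up names for illustration only:

```python
def split_events(buffer_words, max_event_words=8192):
    """Split one buffer (a list of ints) into events.

    Hypothetical layout: word 0 of every event is the event length in words,
    counting the length word itself. Returns (events, bad_position), where
    bad_position is None if the whole buffer parsed cleanly.
    """
    events = []
    pos = 0
    while pos < len(buffer_words):
        length = buffer_words[pos]
        remaining = len(buffer_words) - pos
        # A zero, oversized, or overrunning length cannot be valid, so stop
        # here instead of looping forever or allocating for a bogus size.
        if length == 0 or length > remaining or length > max_event_words:
            return events, pos
        events.append(buffer_words[pos:pos + length])
        pos += length
    return events, None
```

The caller would then decide what to do with the rest of the buffer (for example, log `bad_position` and continue with the next buffer) rather than letting a corrupt length drive the allocation.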
There's an option to fast-forward if you know how many words you want to skip. You can also use rejection regions if you know how much time to skip.
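To illustrate the rejection-region idea only (this is not the PAASS implementation or its configuration syntax), here is a rough sketch that drops events whose timestamp falls inside any rejected time window. The event representation `(time_in_seconds, payload)` and the function names are assumptions for the example:

```python
def in_rejection_region(t, regions):
    """Return True if time t lies inside any (start, stop) window, inclusive."""
    return any(start <= t <= stop for start, stop in regions)


def filter_events(events, regions):
    """Keep only events whose timestamp (element 0) is outside every window."""
    return [ev for ev in events if not in_rejection_region(ev[0], regions)]
```

For example, with the illustrative window `[(400.0, 450.0)]`, an event at 400.5 s would be dropped while events at 10.0 s and 460.0 s would be kept. The window bounds here are made up; the real ones would have to come from inspecting the run.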
I was thinking preemptive rather than reactive, but I guess without knowing exactly why the file is causing issues, that may not be possible.
We first have to identify exactly why it failed, then we can figure out how to recover from it.
I would suggest trying to unpack the buffer headers with something like evtDump from evt2root v2 first and make sure it's not at the buffer level.
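If no existing tool is handy, a buffer-level check could be sketched roughly as follows. This is not evtDump and not PAASS code. It assumes, and these are assumptions to verify against the real format, that the file is a sequence of fixed-size buffers of 8194 little-endian 32-bit words, each starting with a 4-character type tag (`HEAD`, `DIR `, `DATA`, or `EOF `) followed by a word count:

```python
import struct

BUFFER_WORDS = 8194                 # assumed fixed buffer size, in 32-bit words
BUFFER_BYTES = BUFFER_WORDS * 4
KNOWN_TYPES = {b"HEAD", b"DIR ", b"DATA", b"EOF "}  # assumed type tags


def scan_buffer_headers(stream):
    """Yield (index, tag, declared_words, ok) for each buffer in a binary stream.

    ok is False when the tag is unknown, the buffer is short, or the declared
    word count cannot fit in the payload of one buffer.
    """
    index = 0
    while True:
        chunk = stream.read(BUFFER_BYTES)
        if not chunk:
            break
        if len(chunk) < 8:
            # Too short to even hold a tag and a word count.
            yield index, chunk[:4], None, False
            break
        tag = chunk[:4]
        (declared,) = struct.unpack_from("<I", chunk, 4)
        ok = (
            tag in KNOWN_TYPES
            and len(chunk) == BUFFER_BYTES
            and declared <= BUFFER_WORDS - 2
        )
        yield index, tag, declared, ok
        index += 1
```

Opening the file with `open(path, "rb")` and printing the first index where `ok` is False would show whether the trouble starts at the buffer level or inside an otherwise well-formed buffer.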
Do we have any codes internal to PAASS that can handle something like this?
Not that I know of. @cthornsb may have something, but I'm not sure it is built into PAASS.
@rin-yokoyama : Nice catch. I have tested this, and you are correct. That error boggled my mind for quite a while.
When trying to run the a135feb_12.ldf file, utkscan runs until 14%, then consumes all of the available computer memory until it ends with the message "killed". GDB says,