Closed: jimbraun closed this issue 2 years ago
From Andy:
I suspect that you are correct. Per your suggestion, I removed the compression, and the problem of producing unreadable files goes away. The bigger problem is that large events take a zillion hours to simulate and contain more hits than fit in memory, so in the end, the arrays that hold the data in XCDF are not the culprit, just the place where the memory is actually allocated.
I’m good with you closing this out.
Hi Andy,
I've continued working on this problem. I've found that if I fill the missing data at the end of the event with either zeroes or ones, I can rewrite the file without errors. Interestingly, the rewritten file is slightly smaller, which suggests some difference between my environment (on OSX) and the one used by HAWCSim.
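For reference, this is roughly what the zero-fill repair looks like. It is only a minimal sketch: `padAndRewrite`, `expectedCount`, and the write step are placeholders for the actual XCDF calls, not the real API.

```cpp
// Hypothetical sketch of the repair: pad the truncated final field with
// zeros so the entry count matches what the event expects, then rewrite
// the event through the normal XCDF write path.
#include <cstdint>
#include <vector>

void padAndRewrite(std::vector<uint64_t>& peEnergy, size_t expectedCount) {
  // HAWCSim.PE.Energy arrives roughly 1M entries short; zero-fill the tail.
  if (peEnergy.size() < expectedCount) {
    peEnergy.resize(expectedCount, 0);
  }
  // ... write the repaired event back out with the usual XCDF write call.
}
```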
Can you please point me to the HAWCSim source file where the XCDF serialization takes place?
To summarize where we are:
The problem manifests as an XCDF data block with a correct checksum but fewer bytes in the block than expected. The final field written to the block (HAWCSim.PE.Energy) cannot be fully filled because the data runs out about 1M entries short.
The above implies that the latest point at which the corruption could occur is the zlib deflate stage within XCDF (see the sketch after this list).
The problem is constrained to one of the following:
An error in XCDF that manifests on the HAWC cluster, but not on my machine
An XCDF user error in HAWCSim that is not caught by XCDF
Memory corruption or something else very bad happening in HAWCSim.
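To make the deflate-stage hypothesis concrete, here is a minimal sketch (not the XCDF code itself) of the kind of check that would catch it: round-trip a block through zlib and verify that the inflated size matches the original block size. A short result or a `Z_BUF_ERROR` here would point at the compression stage rather than the field buffers upstream.

```cpp
// Sketch: verify that a data block survives a zlib round trip at full size.
#include <cstring>
#include <vector>
#include <zlib.h>

bool roundTripOK(const std::vector<unsigned char>& block) {
  // Compress the block into a buffer sized by zlib's worst-case bound.
  uLongf compSize = compressBound(block.size());
  std::vector<unsigned char> comp(compSize);
  if (compress2(comp.data(), &compSize, block.data(), block.size(),
                Z_DEFAULT_COMPRESSION) != Z_OK) {
    return false;  // deflate failed outright
  }

  // Inflate it back and compare sizes and contents.
  uLongf outSize = block.size();
  std::vector<unsigned char> out(outSize);
  int status = uncompress(out.data(), &outSize, comp.data(), compSize);

  // A short outSize or Z_BUF_ERROR is exactly the symptom seen here:
  // the checksum can still be correct while bytes are missing.
  return status == Z_OK && outSize == block.size() &&
         std::memcmp(out.data(), block.data(), block.size()) == 0;
}
```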
Issues to fix in the XCDF source code are at least: