lbl-anp / becquerel

Becquerel is a Python package for analyzing nuclear spectroscopic measurements.

DRAFT: N42 file class and IO integration #323

Open jccurtis opened 2 years ago

jccurtis commented 2 years ago

Replaces #206

Closes #24

jvavrek commented 1 year ago

With this branch I'm running into an issue parsing a long dwell measurement where at least one channel has >1e6 counts. E.g., a snippet of the ChannelData field looks like:

991326 1.05562e+06

and the error is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [59], line 4
      2 for fn in fns:
      3     try:
----> 4         n42 = bq.parsers.n42.N42File(fn)
      5     except ValueError as exc:
      6         print(f"\n!!! could not process {fn.stem}\n")

File ~/becquerel/becquerel/parsers/n42.py:167, in N42File.__init__(self, path)
    165     raise BecquerelParserError(f"Invalid N42 root tag: {self.root.tag}")
    166 # Read
--> 167 self._parse()

File ~/becquerel/becquerel/parsers/n42.py:217, in N42File._parse(self)
    215 compression = meas["Spectrum"]["ChannelData"]["compressionCode"]
    216 # TODO: can you have multiple spectra per measurement?
--> 217 counts = _parse_channel_data(
    218     meas["Spectrum"]["ChannelData"]["value"], compression
    219 )
    220 n42_meas = N42RadMeasurement(starttime, realtime, livetime, counts, calib)
    221 self.measurements[meas["id"]] = n42_meas

File ~/becquerel/becquerel/parsers/n42.py:123, in _parse_channel_data(text, compression)
    121 text = text.strip().replace("\n", " ")
    122 tokens = text.split()
--> 123 data = [int(token) for token in tokens]
    124 if compression == "CountedZeroes":
    125     new_data = []

File ~/becquerel/becquerel/parsers/n42.py:123, in <listcomp>(.0)
    121 text = text.strip().replace("\n", " ")
    122 tokens = text.split()
--> 123 data = [int(token) for token in tokens]
    124 if compression == "CountedZeroes":
    125     new_data = []

ValueError: invalid literal for int() with base 10: '1.05562e+06'

jvavrek commented 1 year ago

We probably just need to try/except int/float: https://stackoverflow.com/a/5609191

markbandstra commented 1 year ago

Why not just parse them all into float?

micahfolsom commented 1 year ago

You'd want to do int(float()) since these numbers should fundamentally still be integers, right? They presumably get converted when put into the array anyway but this way it'd be explicit.

jvavrek commented 1 year ago

Yeah, int(float(x)) seems cleanest here. Thanks @micahfolsom !
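
For reference, a minimal sketch of the `int(float(x))` idea discussed above. The helper names `parse_count` and `parse_counts` are hypothetical and are not the functions in `becquerel/parsers/n42.py`. Trying `int()` first and the `is_integer()` guard are extra assumptions, not something agreed on in this thread; plain `int(float(token))` would silently truncate a value like `1.5`.

```python
def parse_count(token):
    """Convert one ChannelData token to an int, accepting float notation."""
    try:
        # Plain integer tokens such as "991326" take this path.
        return int(token)
    except ValueError:
        # Tokens such as "1.05562e+06" need to go through float first.
        value = float(token)
        if not value.is_integer():
            # Extra guard (an assumption): counts should be whole numbers.
            raise ValueError(f"non-integer count: {token!r}")
        return int(value)


def parse_counts(text):
    """Split whitespace-separated ChannelData text into a list of ints."""
    return [parse_count(token) for token in text.split()]
```

With the snippet from the failing file, `parse_counts("991326 1.05562e+06")` gives two integers instead of raising. Note that the digits dropped by the instrument when it wrote `1.05562e+06` cannot be recovered.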

jvavrek commented 11 months ago

All Python 3.9+ tests fail, apparently due to an unexpected number of warnings in materials_test.py.

jvavrek commented 11 months ago

Did we ever figure out if we can include the annex file(s) from NIST? Is that just the n42.xsd file?

markbandstra commented 11 months ago

That was a reference to some example N42 files that NIST provides with the standards that I thought could be used for unit tests. I don't think we can (or would want to) distribute them with this repo, but rather I think automatically downloading them when testing would be an option.

The same goes for n42.xsd: I doubt that we can have it in our repo, but we can probably download it when installing.
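
One way the download-on-demand idea could look, as a rough sketch only. The function name `fetch_cached` and its parameters are hypothetical, and the URL of the NIST files is deliberately left as an argument because it is not given in this thread. It uses only the standard library and skips the download when the file is already cached.

```python
import urllib.request
from pathlib import Path


def fetch_cached(url, cache_dir, filename):
    """Download url into cache_dir/filename unless it is already there."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename
    if not target.exists():
        # Only hit the network (or other URL source) on a cache miss.
        with urllib.request.urlopen(url) as response:
            target.write_bytes(response.read())
    return target
```

A test fixture could call this once per session and skip the N42 tests if the download fails, so the example files and n42.xsd never need to live in the repo.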

jvavrek commented 1 month ago

This is going to get more complicated, as each RadMeasurement can contain multiple Spectrum keys. H3D detectors, for instance, do this because they offer three spectrum modes for each measurement, depending on how the user wants to handle multi-site interactions.
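
A possible starting point for handling this, sketched under assumptions: that each parsed measurement is a nested dict like the `meas["Spectrum"]["ChannelData"]["value"]` access in the traceback above, that a repeated Spectrum element shows up as a list of such dicts, and that each Spectrum carries an `id`. The helpers `as_list` and `spectra_by_id` are hypothetical names, not part of this branch.

```python
def as_list(node):
    """Return node as a list: [] for None, itself if a list, else [node]."""
    if node is None:
        return []
    if isinstance(node, list):
        return node
    return [node]


def spectra_by_id(meas):
    """Map each Spectrum id to its raw ChannelData text for one measurement."""
    return {
        spec["id"]: spec["ChannelData"]["value"]
        for spec in as_list(meas.get("Spectrum"))
    }
```

Normalizing to a list first means the single-spectrum and multi-spectrum cases share one code path, and a measurement without any Spectrum yields an empty mapping rather than a KeyError.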