DRAFT: N42 file class and IO integration

jccurtis commented 2 years ago

- [ ] Add a class to wrap an N42 file and provide conversion utils.
  - [x] Read calibration
  - [x] Read spectra
  - [ ] Apply calibration automatically
  - [ ] Read comment metadata
- [x] Add dataclasses to contain spectra and other data elements from an N42 file, since the parsers and io modules cannot import from spectrum.
- [x] Update to Python 3.7 (for dataclasses)
- [ ] Add tests
- [x] Add notebook with example(s)
- [ ] Figure out annex files from NIST (do we want to include these? Is there a copyright issue?) Attn @markbandstra
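Because the parsers and io modules can't import from spectrum, the per-measurement container has to be a plain dataclass. Here is a minimal sketch that uses the argument order from the `N42RadMeasurement(starttime, realtime, livetime, counts, calib)` call in the traceback further down this thread. The type annotations and the `num_channels` helper are assumptions for illustration, not the branch's actual definitions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class N42RadMeasurement:
    """Container for one N42 RadMeasurement (field types are assumed)."""

    starttime: datetime  # measurement start time
    realtime: float  # real (clock) time in seconds
    livetime: float  # live time in seconds
    counts: List[int]  # counts per channel
    calib: Optional[List[float]] = None  # assumed: energy calibration coefficients, if present

    @property
    def num_channels(self) -> int:
        # Hypothetical convenience helper, not part of the original design.
        return len(self.counts)
```

Using `typing.List` rather than `list[int]` keeps the sketch compatible with Python 3.7, the minimum version targeted for dataclasses.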

Replaces #206

Closes #24

jvavrek commented 1 year ago

With this branch I'm running into an issue parsing a long-dwell measurement where at least one channel has >1e6 counts. E.g., a snippet of the ChannelData field looks like:

991326 1.05562e+06

and the error is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [59], line 4
      2 for fn in fns:
      3     try:
----> 4         n42 = bq.parsers.n42.N42File(fn)
      5     except ValueError as exc:
      6         print(f"\n!!! could not process {fn.stem}\n")

File ~/becquerel/becquerel/parsers/n42.py:167, in N42File.__init__(self, path)
    165     raise BecquerelParserError(f"Invalid N42 root tag: {self.root.tag}")
    166 # Read
--> 167 self._parse()

File ~/becquerel/becquerel/parsers/n42.py:217, in N42File._parse(self)
    215 compression = meas["Spectrum"]["ChannelData"]["compressionCode"]
    216 # TODO: can you have multiple spectra per measurement?
--> 217 counts = _parse_channel_data(
    218     meas["Spectrum"]["ChannelData"]["value"], compression
    219 )
    220 n42_meas = N42RadMeasurement(starttime, realtime, livetime, counts, calib)
    221 self.measurements[meas["id"]] = n42_meas

File ~/becquerel/becquerel/parsers/n42.py:123, in _parse_channel_data(text, compression)
    121 text = text.strip().replace("\n", " ")
    122 tokens = text.split()
--> 123 data = [int(token) for token in tokens]
    124 if compression == "CountedZeroes":
    125     new_data = []

File ~/becquerel/becquerel/parsers/n42.py:123, in <listcomp>(.0)
    121 text = text.strip().replace("\n", " ")
    122 tokens = text.split()
--> 123 data = [int(token) for token in tokens]
    124 if compression == "CountedZeroes":
    125     new_data = []

ValueError: invalid literal for int() with base 10: '1.05562e+06'

jvavrek commented 1 year ago

We probably just need to try/except int/float: https://stackoverflow.com/a/5609191
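Here is a minimal sketch of that try/except fallback. The helper name `parse_count` is hypothetical:

```python
def parse_count(token: str):
    """Parse one ChannelData token, falling back to float when int() fails."""
    try:
        return int(token)
    except ValueError:
        # e.g. '1.05562e+06', which some writers emit for large counts
        return float(token)
```

One side effect is that a single spectrum can end up holding a mix of `int` and `float` values.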

markbandstra commented 1 year ago

Why not just parse them all into float?

micahfolsom commented 1 year ago

You'd want to do int(float()) since these numbers should fundamentally still be integers, right? They presumably get converted when put into the array anyway, but this way it'd be explicit.

jvavrek commented 1 year ago

Yeah, int(float(x)) seems cleanest here. Thanks @micahfolsom !
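Below is a sketch of the channel-data parser with the `int(float(token))` change. It mirrors `_parse_channel_data(text, compression)` from the traceback but is not the branch's code. The `CountedZeroes` branch assumes the N42 convention that each 0 is followed by the number of consecutive zero-valued channels it represents.

```python
from typing import List


def parse_channel_data(text: str, compression: str = "None") -> List[int]:
    """Parse N42 ChannelData text into integer counts per channel.

    Each token goes through int(float(token)), so values written in
    scientific notation (e.g. '1.05562e+06') are accepted.
    """
    # str.split() with no arguments already splits on newlines and runs of spaces.
    data = [int(float(token)) for token in text.split()]
    if compression == "CountedZeroes":
        expanded: List[int] = []
        i = 0
        while i < len(data):
            if data[i] == 0:
                # A 0 is followed by how many zero channels it stands for.
                expanded.extend([0] * data[i + 1])
                i += 2
            else:
                expanded.append(data[i])
                i += 1
        data = expanded
    return data
```

`int(float())` cannot recover digits the writer already dropped. `1.05562e+06` carries six significant figures, so it parses to 1055620 even if the true count differed in the last digit.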

jvavrek commented 11 months ago

All Python 3.9+ tests fail, apparently due to an unexpected number of warnings in materials_test.py.

jvavrek commented 11 months ago

Did we ever figure out if we can include the annex file(s) from NIST? Is that just the n42.xsd file?

markbandstra commented 11 months ago

That was a reference to some example N42 files that NIST provides with the standard, which I thought could be used for unit tests. I don't think we can (or would want to) distribute them with this repo; instead, downloading them automatically during testing could be an option.

The same goes for n42.xsd: I doubt that we can have it in our repo, but we can probably download it at install time.
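One way to fetch the NIST example files (or n42.xsd) on demand instead of committing them is a small download-once helper. This is a sketch only: the helper name `fetch_if_missing` is hypothetical, and the actual NIST URLs are left for the caller to supply.

```python
import os
import urllib.request


def fetch_if_missing(url: str, dest: str) -> str:
    """Download url to dest unless dest already exists; return dest."""
    if not os.path.exists(dest):
        os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
        urllib.request.urlretrieve(url, dest)
    return dest
```

In the test suite this could be called from a session-scoped pytest fixture that calls `pytest.skip` when the download fails, so offline runs skip the N42 example tests instead of erroring.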

jvavrek commented 1 month ago

This is going to get more complicated, since each RadMeasurement can contain multiple Spectrum keys. H3D detectors, for instance, do this because they offer three spectrum modes for each measurement, depending on how the user wants to handle multi-site interactions.
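A sketch of how `_parse` might normalize one-or-many Spectrum entries. It assumes the dict layout seen in the traceback (`meas["Spectrum"]["ChannelData"]["value"]`) and also assumes that a repeated element is parsed into a list of dicts. The helper names `as_list` and `spectra_by_id` are hypothetical, as is the fallback key used when a Spectrum has no `id`.

```python
from typing import Any, Dict, List, Union

Node = Dict[str, Any]


def as_list(node: Union[Node, List[Node]]) -> List[Node]:
    """Normalize an element that may appear once (dict) or many times (list)."""
    return node if isinstance(node, list) else [node]


def spectra_by_id(meas: Node) -> Dict[str, Node]:
    """Map each Spectrum in one RadMeasurement to its ChannelData dict."""
    out: Dict[str, Node] = {}
    for i, spec in enumerate(as_list(meas["Spectrum"])):
        # Assumed fallback: derive a key from the measurement id and position.
        key = spec.get("id", f"{meas['id']}-{i}")
        out[key] = spec["ChannelData"]
    return out
```

Keying by Spectrum id keeps the three H3D modes distinct, so the caller can choose which one to turn into counts.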

lbl-anp/becquerel #323