Bruker Raw Converter Enhancement

bredema1 commented 3 weeks ago

Thanks for the great tool! I have a suggestion for an enhancement: add a new input JSON parameter, cps (counts per second), that controls whether the returned data contains the bare y-values (counts) or y-values normalized to counts per second by dividing each one by the time per step. It would default to false for RAW files that contain a single data block, and to true for files that contain multiple blocks. This would ensure that the returned data is directly plottable without any additional data manipulation. At the same time, allowing the default to be overridden lets more advanced consumers get at the raw counts data.

I attach here an example multiblock RAW file, along with what I would expect to be returned when cps is false (_cps_false.csv) and when cps is true (_cps_true.csv). I think it also helps, in the latter case, to print all block headers at the beginning, before any of the data, whereas in the former case it makes more sense to put each block header just before the corresponding data (as it currently does).

The normalization to cps means dividing each y-value in the block by the value of that block's header "TIME_PER_STEP".
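To make the proposal concrete, here is a minimal sketch of the default rule and the per-block normalization. The function names `resolve_cps` and `to_cps` are hypothetical and are not taken from brukerrawconverter.py; the sketch also assumes the caller passes `None` when the cps parameter is absent from the input JSON.

```python
from typing import List, Optional


def resolve_cps(cps: Optional[bool], n_blocks: int) -> bool:
    """Return the caller's cps setting, or the proposed default if none was given.

    Hypothetical helper: the default is False for a single-block file
    and True for a multiblock file.
    """
    if cps is not None:
        return cps
    return n_blocks > 1


def to_cps(counts: List[float], time_per_step: float) -> List[float]:
    """Divide each y-value by the block's TIME_PER_STEP header value."""
    if time_per_step <= 0:
        raise ValueError("TIME_PER_STEP must be positive to normalize to cps")
    return [c / time_per_step for c in counts]
```

For example, a block with counts `[10, 20]` and a TIME_PER_STEP of `2.0` would be returned as `[5.0, 10.0]` when cps is true, and unchanged when cps is false.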

Thank you!

ML_Bench_20241031_1_LJM2BBWYR.EXT_CP_049465-001.-0p818mm_24hr_4degSoller_EmptyCapillary_cps_true.csv
ML_Bench_20241031_1_LJM2BBWYR(EXT_CP_049465-001)-0p818mm_24hr_4degSoller_EmptyCapillary_cps_false.csv

tmcqueen-materials commented 3 weeks ago

@bredema1 Thanks Ben! @pcauchy1: So we don't forget about it, a couple of other things are needed to completely fix this issue:

The missing f.close() call after line https://github.com/paradimdata/project_chameleon/blob/64b8cf46707b39ca2f926811a6e2b03c281b455d/project_chameleon/brukerrawconverter.py#L72 needs to be added, to make sure all buffered output is actually written to the file and to avoid a file handle leak. (You should also check the other handlers and fix any other missing close calls, if there are any.)
These lines: https://github.com/paradimdata/project_chameleon/blob/64b8cf46707b39ca2f926811a6e2b03c281b455d/project_chameleon/brukerrawconverter.py#L58-L61 should be adjusted so that export_metadata is called for every block when it is a multiblock file (the current code does not output the metadata for each block at all). And of course the metadata needs to go either "just before each block" or "all at once at the beginning of the file", depending on the proposed cps setting above.
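Both points could be handled together along the lines of the sketch below. It is only an illustration under stated assumptions: the block layout (a dict with "header" and "data" keys) and the helper `write_header` are hypothetical stand-ins, and `write_header` is not the real export_metadata, whose signature is not shown in this thread. Using a `with` block closes the file even if a write fails, which covers the missing f.close() case.

```python
def write_header(f, index, header):
    """Hypothetical stand-in for export_metadata: one comment line per header field."""
    for key, value in header.items():
        f.write(f"# block {index} {key}={value}\n")


def write_blocks(path, blocks, cps):
    """Write every block's header and data to path.

    Assumed layout: each block is {"header": {...}, "data": [(x, y), ...]}.
    With cps true, all headers go first and y is divided by TIME_PER_STEP;
    with cps false, each header goes just before its own data.
    """
    # The with statement flushes and closes the file on exit,
    # so no explicit f.close() call is needed.
    with open(path, "w") as f:
        if cps:
            for i, block in enumerate(blocks):
                write_header(f, i, block["header"])
        for i, block in enumerate(blocks):
            if not cps:
                write_header(f, i, block["header"])
            step_time = block["header"]["TIME_PER_STEP"]
            for x, y in block["data"]:
                value = y / step_time if cps else y
                f.write(f"{x},{value}\n")
```

The same `with open(...)` pattern would apply to any other handler that currently opens a file without closing it.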

pcauchy1 commented 3 weeks ago

Added cps flag. Headers are adjusted based on cps flag. f.close() was added in appropriate places in a couple of files.

paradimdata / project_chameleon

Bruker Raw Converter Enhancement #17