some internal notes for myself... to keep track of what's going on here:
`nd2._pysdk._pysdk.ND2Reader` is the new pure-python class that takes the place of the cython SDK wrapper `_sdk.latest.ND2Reader`
as a first pass, I'm re-implementing in python all of the metadata-reading methods from the SDK wrapper (those that returned structured objects). These include:

They each generally begin with parsing a specific chunk using `ND2Reader._decode_chunk`. `_decode_chunk` loads the raw bytes for the key from the chunk map and then dispatches to one of two parsers based on the nd2 version, each of which is supposed to return a `dict[str, Any]`:
- nd2 version < 3.0 uses `nd2._xml.parse_variant_xml`
- nd2 version >= 3.0 uses `nd2._pysdk._decode.decode_CLxLiteVariant_json`
there are then a number of functions in the `nd2._pysdk._parse` module that take the parsed data for each metadata section and convert it to "structured" objects (i.e. dataclass objects from the `nd2.structures` module). It's possible that the return type of these functions might be relaxed to simply return another "cleaned up" dict that could be passed to these data structures, but for now the actual data structure is returned. The "main" functions include:
- `nd2._pysdk._parse.load_metadata`
- `nd2._pysdk._parse.load_exp_loop`
- `nd2._pysdk._parse.load_attributes`
- `nd2._pysdk._parse.load_text_info`
- `nd2._pysdk._parse.load_global_metadata`
- `nd2._pysdk._parse.load_frame_metadata`
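A miniature of the pattern these `load_*` functions follow, purely for illustration: the input dict stands in for what `_decode_chunk` returns, and both the dataclass fields and the raw key names here are hypothetical (the real dataclasses live in `nd2.structures` and have different fields):

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical stand-in for a dataclass from nd2.structures.
@dataclass
class TextInfo:
    description: str = ""
    date: str = ""

def load_text_info(raw: dict[str, Any]) -> TextInfo:
    # normalize the parsed chunk dict into the structured object;
    # the key names "Description"/"Date" are illustrative only
    return TextInfo(
        description=raw.get("Description", ""),
        date=raw.get("Date", ""),
    )
```

If the return types are later relaxed to "cleaned up" dicts, only the final constructor call would move out of these functions.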
reading of image data itself is probably the simplest... it pulls the file offset for the requested frame index from the frame offsets (keys in the chunk map that start with `b"ImageDataSeq|"`), and then returns a numpy array from the memory-mapped file at the corresponding offset. If the frame is compressed, it uses `zlib.decompress` on the data first.
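The frame-reading path above can be sketched as follows. `buf` stands in for the memory-mapped file, `offset` is the value pulled from the `b"ImageDataSeq|<index>"` chunk-map entry, and the per-chunk header skipping and stride handling of the real reader are omitted:

```python
import zlib
import numpy as np

def read_frame(buf: bytes, offset: int, shape: tuple[int, ...],
               dtype: str, compressed: bool = False) -> np.ndarray:
    # number of bytes one frame occupies at the given shape/dtype
    count = int(np.prod(shape))
    nbytes = count * np.dtype(dtype).itemsize
    if compressed:
        # compressed frames are zlib-decompressed before reinterpreting
        data = zlib.decompress(buf[offset:])
    else:
        data = buf[offset:offset + nbytes]
    return np.frombuffer(data, dtype=dtype, count=count).reshape(shape)
```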
Current challenges
The main thing still going wrong is an error in `_parse.load_exp_loop` that leads to an error in the `.experiment()` method for the following files:
all of these files are v3.0 ... so it's quite possible that some of the logic in my experiment parsing code is "accommodating" the format of legacy xml format data in a way that effectively breaks the newer metadata structure.
it's been a few months now since I worked on this. I feel like it was a rather small bug that is close to being fixed, but I can't quite remember where it was at this point