Switch to using mashumaro and orjson for parsing raw measurements and validating/generating the dataclasses, which gives a ~30x performance boost (see: https://github.com/hellais/oonidata-bench)
Consume the measurements from the cans instead of the public JSONL files so as to preserve a valid measurement_uid and avoid producing duplicate measurements (https://github.com/ooni/backend/issues/613)
Improve the API for iterating over measurements
Add a more precise progress bar
Where possible, yield the measurements while streaming instead of having to decompress and keep the whole can in memory
Make the API a bit more ergonomic so it is faster to use in interactive environments like Jupyter
Add native support for parallel processing of measurements
Fix several bugs in the parsers and data formats by adding stricter checks on the consistency of the data
Add support for extracting more metadata related to resolver information
Add a nicer API for generating observations directly from a parsed base measurement
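
To illustrate the streaming item above, here is a minimal sketch of yielding measurements while an archive is still being read, rather than decompressing the whole can into memory first. It is not the code from this change. It assumes a can is a compressed tar archive of JSONL files and uses gzip as a stand-in so that only the standard library is needed. The names `iter_measurements` and `build_demo_can` and the sample `measurement_uid` values are hypothetical.

```python
import io
import json
import tarfile
from typing import BinaryIO, Dict, Iterator, List


def iter_measurements(fileobj: BinaryIO) -> Iterator[dict]:
    """Yield one parsed measurement at a time from a tar.gz stream.

    Hypothetical helper: the real can format and reader may differ.
    """
    # Mode "r|gz" reads the archive as a forward-only stream, so members are
    # decompressed as they are reached instead of all at once.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            extracted = tar.extractfile(member)
            if extracted is None:
                continue
            # Each member is assumed to be JSONL: one measurement per line.
            for line in extracted:
                line = line.strip()
                if line:
                    yield json.loads(line)


def build_demo_can(files: Dict[str, List[dict]]) -> bytes:
    """Build a small in-memory tar.gz archive for demonstration purposes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, measurements in files.items():
            payload = "\n".join(json.dumps(m) for m in measurements).encode() + b"\n"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


if __name__ == "__main__":
    demo = build_demo_can(
        {
            "a.jsonl": [{"measurement_uid": "x1"}, {"measurement_uid": "x2"}],
            "b.jsonl": [{"measurement_uid": "x3"}],
        }
    )
    for measurement in iter_measurements(io.BytesIO(demo)):
        print(measurement["measurement_uid"])
```

Because the archive is opened in stream mode, each member has to be fully consumed before moving on to the next one. In exchange, memory use is bounded by roughly one line at a time rather than by the size of the can.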