When I ran the 2023 analysis, the unit tests passed, but risk_change() did not deduplicate formats correctly on the real data. To complete the analysis, I tried removing duplicates in Excel, which also failed to remove them all, so I had to review and merge or delete many rows by hand. JPEG EXIF had many versions whose duplicate rows were not removed, and other formats were affected as well. I suspect something about the data, such as the data type, differs between rows in a way that is not visible when looking at the files.
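
One way to test the hidden-difference theory is a rough sketch like the one below. It is not how risk_change() works. The helper names (describe, normalize, dedupe) and the sample rows are hypothetical, made up for illustration and not taken from the 2023 data. It assumes the invisible differences are things like trailing spaces, non-breaking spaces, or a version stored as a number in one row and as text in another.

```python
import unicodedata


def describe(value):
    """Return the type name and repr of a value, which exposes hidden differences."""
    return type(value).__name__, repr(value)


def normalize(value):
    """Convert a cell to a canonical string so look-alike values compare equal."""
    if value is None:
        return ""
    # Whole-number floats such as 2.0 become "2" so they match the text "2".
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    # NFKC maps compatibility characters (e.g. non-breaking space) to plain forms.
    text = unicodedata.normalize("NFKC", str(value))
    # Trim the ends and collapse internal runs of whitespace to one space.
    return " ".join(text.split())


def dedupe(rows):
    """Keep the first row for each normalized key, preserving input order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(normalize(cell) for cell in row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical (format name, version) rows that look alike but are not equal.
rows = [
    ("JPEG EXIF", "2.1"),
    ("JPEG EXIF ", "2.1"),      # trailing space
    ("JPEG\u00a0EXIF", 2.1),    # non-breaking space, version stored as a float
    ("JPEG EXIF", "2.2"),       # a genuinely different version
]

if __name__ == "__main__":
    for row in rows:
        print([describe(cell) for cell in row])
    print(len(set(rows)), "distinct rows before normalizing")
    print(len(dedupe(rows)), "distinct rows after normalizing")
```

If the printed type names or reprs differ between rows that look the same in the file, that would support the theory. If they are identical, the cause is probably somewhere else.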