pearcetm / svs-deidentifier

Strip potentially identifying label and macro images from Aperio SVS whole slide image files
MIT License

Feature Request: Add capability to redact scanner and file metadata #18

Open Badger1919 opened 4 months ago

Badger1919 commented 4 months ago

ScanScope scanners write metadata to the SVS file that includes the ScanScope ID, filename, title, scan date, scan time, time zone, and user identifier. If someone used identifying information in the filename or title when originally scanning the slide, that information is still present after running the file through svs-deidentifier. Note that renaming the file after scanning does not change the filename stored in this metadata.

It's possible to view this information in Aperio ImageScope through the "Image -> Information" dropdown menu. It is also possible to view and overwrite the information by opening the svs file in a hex editor.
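For anyone who would rather check this from a script than a hex editor, here is a minimal sketch that scans raw bytes for the metadata block and lists each occurrence. It is not part of svs-deidentifier. The function name `find_metadata_blocks` is made up for illustration, and it assumes the block is stored as plain ASCII `Key = value` pairs separated by `|`, starts at the `ScanScope ID` key, and ends at a NUL byte or line break.

```python
import re

# Assumption: the block begins at the "ScanScope ID" key and runs until a
# NUL byte or a line break. Real files may differ.
BLOCK_RE = re.compile(rb"ScanScope ID = [^\x00\r\n]*")


def find_metadata_blocks(data: bytes) -> list[tuple[int, dict[str, str]]]:
    """Return (byte offset, parsed fields) for every metadata block found."""
    results = []
    for match in BLOCK_RE.finditer(data):
        fields = {}
        # Split the block into "Key = value" parts and keep the well-formed ones.
        for part in match.group().decode("latin-1").split("|"):
            key, sep, value = part.partition(" = ")
            if sep:
                fields[key.strip()] = value.strip()
        results.append((match.start(), fields))
    return results
```

Calling `find_metadata_blocks(open(path, "rb").read())` would report where each copy sits in the file. SVS files can be several gigabytes, so reading in chunks or memory-mapping the file is probably a better fit than loading it whole.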

I've pasted a copy of the identifying portion of this metadata (parts redacted with "X") as seen in a hex editor from an SVS file. Note that this data appears twice in the file, and both instances need to be redacted. This is not the only metadata stored in the SVS file; it is only the part I thought could be potentially identifying.

ScanScope ID = SSXXXX|Filename = XXXXXX|Title = XXXXXXX|Date = 04/30/24|Time = 09:32:59|Time Zone = GMT-05:00|User = 00000000-0000-0000-0000-000000000000
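As a rough illustration of the manual hex-editor overwrite, the sketch below replaces each field value with a filler character of the same length, so the byte count and any offsets stored elsewhere in the file stay unchanged. It handles every occurrence, which covers the two copies mentioned above. The name `redact_fields` and the field list are assumptions taken from the excerpt, not something svs-deidentifier provides, and this has not been validated against real scanner output.

```python
import re

# Field names taken from the excerpt in this issue; adjust as needed.
FIELDS_TO_REDACT = (b"ScanScope ID", b"Filename", b"Title", b"Date",
                    b"Time", b"Time Zone", b"User")


def redact_fields(data: bytes,
                  fields: tuple[bytes, ...] = FIELDS_TO_REDACT,
                  fill: bytes = b"X") -> bytes:
    """Overwrite the value of each listed field with same-length filler."""
    keys = b"|".join(re.escape(name) for name in fields)
    # A key must sit at the start of the data or right after "|" or NUL, so
    # that "Date" does not match inside some longer key name.
    pattern = re.compile(
        rb"(?<![^|\x00])((?:" + keys + rb") = )([^|\x00\r\n]*)"
    )
    # Keep "Key = " untouched and pad the value, preserving total length.
    return pattern.sub(lambda m: m.group(1) + fill * len(m.group(2)), data)
```

Work on a copy of the file and confirm the result in ImageScope or a hex editor afterwards, since the exact layout may vary between scanner models and software versions.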

pearcetm commented 4 months ago

@Badger1919 Indeed, as you point out, there can be potentially identifying metadata that is separate from the label/macro images. How identifying it actually is depends on how the scanner was set up (often the file name has nothing related to patient info, but it could).

I am currently working on a tool that will work similarly to this one, but with enhanced capabilities, including scrubbing the internal metadata fields for such info. Rather than incorporating those changes here, I'll be deprecating this project once that one is ready for release.

Badger1919 commented 4 months ago

That's great news! Is there a repository for it that we can follow so we can know when it's ready?

pearcetm commented 4 months ago

It is part of a larger project, and the development repository is private for now as part of that overall project. I'm hoping to have something ready in the next few months. Feel free to ping me here periodically if you'd like.