healtex / texscrubber

Personal information de-identification tool
Apache License 2.0

Gazetteer cleanup listener #26

Closed: hkkenneth closed this 7 years ago

hkkenneth commented 7 years ago

For #25

mbelousov commented 7 years ago

@hkkenneth, great! Well done! Just a thought: do we need something similar to clean the output directory before job execution (to remove outputs from the previous run)? @healtex/deid

hkkenneth commented 7 years ago

It can easily be done with a beforeJob() method with an almost identical implementation. Do we have any use cases in which we may want to keep the files from the previous run?
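A minimal sketch of such a pre-run cleanup, assuming a plain Java class with a hypothetical beforeJob() hook (the actual texscrubber listener and the job framework it plugs into are not shown in this thread). It empties the output directory but keeps the directory itself, so later steps can still write into it. The class name `OutputCleanupListener` and the `outputDir` field are illustrative only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical listener: clears leftovers from a previous run before the job starts.
class OutputCleanupListener {
    private final Path outputDir;

    OutputCleanupListener(Path outputDir) {
        this.outputDir = outputDir;
    }

    // Hypothetical hook mirroring a job listener's beforeJob() callback.
    void beforeJob() {
        deleteContents(outputDir);
    }

    // Deletes everything under dir but keeps dir itself.
    static void deleteContents(Path dir) {
        if (!Files.isDirectory(dir)) {
            return; // no previous output to clean
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()) // children sort after parents, so reversing deletes them first
                .filter(p -> !p.equals(dir))       // leave the root directory in place
                .forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Emptying rather than deleting the directory avoids racing with any step that expects the folder to exist; if nothing ever needs to keep old files, deleting and recreating it would work just as well.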

hkkenneth commented 7 years ago

On a different note, I think that in the future we could have a configuration option that lets advanced users keep the gazetteer files after the job, either to inspect them or to use them to customize their own gazetteers.
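One way such an option could look, as a minimal sketch: a cleanup listener whose hypothetical afterJob() hook skips deletion when a hypothetical `keepGazetteers` flag is set. Neither the flag nor the class name `GazetteerCleanupListener` comes from the texscrubber code; they only illustrate the idea.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical listener: removes intermediate gazetteer files unless the user opted to keep them.
class GazetteerCleanupListener {
    private final Path gazetteerDir;
    private final boolean keepGazetteers; // hypothetical advanced-user option

    GazetteerCleanupListener(Path gazetteerDir, boolean keepGazetteers) {
        this.gazetteerDir = gazetteerDir;
        this.keepGazetteers = keepGazetteers;
    }

    // Hypothetical hook mirroring a job listener's afterJob() callback.
    void afterJob() {
        if (keepGazetteers || !Files.exists(gazetteerDir)) {
            return; // user wants to inspect or reuse the files, or there is nothing to clean
        }
        try (Stream<Path> walk = Files.walk(gazetteerDir)) {
            walk.sorted(Comparator.reverseOrder()) // delete children before their parent directories
                .forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a desktop app, the flag could simply be bound to a checkbox, while the default stays "clean up".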

mbelousov commented 7 years ago

@hkkenneth For the first version (consider the desktop application), we could just use the name of the input folder and create corresponding folders: {name}-gazetteers (for step 1) and {name}-results (for step 2). That way we would be able to run multiple instances of the app (e.g. processing two datasets at the same time). I agree that it might be a good idea to keep the gazetteers, so we can add a corresponding option to our workflow, and in the desktop app we could have a checkbox.

hkkenneth commented 7 years ago

What's {name} here? And can't users already achieve the same effect with separate workspaces?

mbelousov commented 7 years ago

@hkkenneth, yep, a workspace sounds like a better option.