huggingface / datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Apache License 2.0

Example for decontamination #269

Open jordane95 opened 3 months ago

jordane95 commented 3 months ago

Hi, could you add an example showing how to use the decontamination pipeline? Thanks.