netleibi / fastchunking

Fast text chunking algorithms for Python
Apache License 2.0

Have a look at FastCDC #2

Open titusz opened 7 years ago

titusz commented 7 years ago

I have been playing around with fastchunking and it works really well. As the name suggests, this library is aimed at fast chunking, so I'd like to bring the FastCDC paper to your attention. It claims to be about 10x faster than Rabin-based CDC approaches. I have not verified this claim in practice, but I read the paper and the approach seems promising. I'd be interested in your thoughts on this.
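For readers unfamiliar with the paper, here is a minimal sketch of the core FastCDC idea. It assumes the three ingredients the paper describes: a Gear rolling hash (one shift, one add and one table lookup per byte), skipping the hash entirely for the first `MIN_SIZE` bytes of each chunk, and normalized chunking with a stricter mask before the average size and a looser one after it. The names `MIN_SIZE`, `AVG_SIZE`, `MAX_SIZE`, `GEAR`, `MASK_S`, `MASK_L`, `fastcdc_cut` and `chunk` are hypothetical. The seeded Gear table and the contiguous high-bit masks are simplifying assumptions; this is not the authors' code and not part of fastchunking's API.

```python
import random

# Illustrative parameters (8 KiB average chunk); assumptions, not library defaults.
MIN_SIZE = 2 * 1024
AVG_SIZE = 8 * 1024
MAX_SIZE = 64 * 1024

MASK_64 = (1 << 64) - 1
# Gear table: one pseudo-random 64-bit value per possible byte value (seed is arbitrary).
_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]

# Normalized chunking: 13 bits would target 8 KiB; use 15 bits (harder to match)
# before AVG_SIZE and 11 bits (easier to match) after it. High bits are used because,
# with a left-shifting hash, they depend on a wider window (the last 64 bytes).
MASK_S = ((1 << 15) - 1) << (64 - 15)
MASK_L = ((1 << 11) - 1) << (64 - 11)


def fastcdc_cut(data: bytes, start: int) -> int:
    """Return the exclusive end offset of the chunk that begins at `start`."""
    remaining = len(data) - start
    if remaining <= MIN_SIZE:
        return len(data)  # Tail too short to split further.
    end = start + min(remaining, MAX_SIZE)
    normal = start + min(remaining, AVG_SIZE)
    fp = 0
    i = start + MIN_SIZE  # Cut-point skipping: never hash or cut inside the minimum size.
    while i < normal:
        fp = ((fp << 1) + GEAR[data[i]]) & MASK_64  # Gear hash update.
        if fp & MASK_S == 0:
            return i + 1
        i += 1
    while i < end:
        fp = ((fp << 1) + GEAR[data[i]]) & MASK_64
        if fp & MASK_L == 0:
            return i + 1
        i += 1
    return end  # Forced cut at MAX_SIZE (or end of data).


def chunk(data: bytes):
    """Yield consecutive content-defined chunks covering all of `data`."""
    start = 0
    while start < len(data):
        end = fastcdc_cut(data, start)
        yield data[start:end]
        start = end
```

Because cut points depend only on the bytes near them, an insertion early in a file shifts the boundaries of nearby chunks while later boundaries tend to reappear at the same content, which is the property any CDC scheme (Rabin-based or Gear-based) is after.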

netleibi commented 7 years ago

I agree that their approach is promising and that it would be interesting to include it in this library. However, I am not aware of any existing FastCDC implementation, since the authors have not published their source code yet. So I think it's something to look into in the future.