Closed akaanirban closed 3 years ago
This PR adds three notebooks and benchmarking results comparing scikit-learn (baseline), Apache Spark, and cuML+Dask pipelines for NLP data processing.
To check: why is cuML + Dask slower without intermediate persist/compute_chunk_sizes calls?
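For context on the question above, here is a minimal sketch of what the two calls do. It uses a CPU-backed `dask.array` as a stand-in, not the cuML+Dask pipeline from the notebooks, and the array and variable names are hypothetical. It does not reproduce the benchmark or explain the slowdown; it only shows that a filtering step can leave chunk sizes unknown until they are computed, and that `persist()` materializes intermediate results so later steps do not recompute them.

```python
import math

import dask.array as da

# Hypothetical stand-in data: 1000 integers split into 10 chunks.
x = da.arange(1000, chunks=100)

# Boolean masking is lazy, so Dask cannot know how many rows survive
# in each chunk. The resulting shape is reported as (nan,).
filtered = x[x % 3 == 0]
unknown_shape = filtered.shape

# persist() runs the graph built so far and keeps the chunks in memory,
# so downstream steps start from the materialized data.
filtered = filtered.persist()

# compute_chunk_sizes() resolves the unknown chunk sizes in place.
filtered.compute_chunk_sizes()
known_shape = filtered.shape

print(math.isnan(unknown_shape[0]), known_shape)
```

Whether the same effect accounts for the timings in the cuML+Dask notebooks is still open; that would need profiling the actual pipeline with and without these calls.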