Results and notebooks for benchmarks using hashing vectorizer (cuML+Dask vs Apache Spark)

rapidsai / rapids-examples


Closed: akaanirban closed this 3 years ago

akaanirban commented 3 years ago

This PR adds three notebooks and benchmark results comparing scikit-learn (baseline), Apache Spark, and cuML+Dask pipelines for NLP data processing.
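For readers unfamiliar with the technique being benchmarked, here is a minimal sketch of the hashing trick behind a hashing vectorizer. It is not taken from the notebooks in this PR. The function name `hash_vectorize`, the `n_features` value, and the use of `hashlib.md5` are assumptions made for illustration; scikit-learn's `HashingVectorizer` uses a signed MurmurHash3 instead, which this sketch does not reproduce.

```python
import hashlib


def hash_vectorize(text, n_features=16):
    """Map a document to a fixed-length count vector using the hashing trick.

    Illustrative only: md5 stands in for the hash function because it is
    available in the standard library and is stable across runs.
    """
    vec = [0] * n_features
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # Use the first 4 bytes of the digest as an integer bucket index.
        index = int.from_bytes(digest[:4], "little") % n_features
        vec[index] += 1
    return vec


docs = ["the quick brown fox", "the lazy dog"]
vectors = [hash_vectorize(d) for d in docs]
```

Because no vocabulary is built, each document can be vectorized independently, which is what makes the approach attractive for partitioned pipelines such as Spark or Dask. The cost is that distinct tokens can collide in the same bucket.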

review-notebook-app[bot] commented 3 years ago

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

akaanirban commented 3 years ago

To check: why is cuML + Dask slower without intermediate persist/compute_chunk_sizes calls?
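One factor worth ruling out is recomputation of shared upstream work. In a lazy task graph, an intermediate result that is not persisted is re-executed by every downstream compute that depends on it. The toy model below illustrates only that general behavior. It is not Dask code and does not diagnose the notebooks in this PR: `LazyValue`, `expensive_tokenize`, and `run` are hypothetical names invented for this sketch.

```python
class LazyValue:
    """Hypothetical stand-in for a lazy collection; not a Dask API."""

    def __init__(self, func, *deps):
        self.func = func
        self.deps = deps
        self._cached = None
        self._persisted = False

    def compute(self):
        # A persisted value is returned from memory; otherwise the whole
        # upstream chain is evaluated again.
        if self._persisted:
            return self._cached
        return self.func(*(d.compute() for d in self.deps))

    def persist(self):
        self._cached = self.compute()
        self._persisted = True
        return self


calls = {"tokenize": 0}


def expensive_tokenize():
    # Counts how often the upstream step actually runs.
    calls["tokenize"] += 1
    return ["a", "b", "a"]


def run(persist_intermediate):
    calls["tokenize"] = 0
    tokens = LazyValue(expensive_tokenize)
    if persist_intermediate:
        tokens.persist()
    n_tokens = LazyValue(len, tokens)
    n_unique = LazyValue(lambda t: len(set(t)), tokens)
    n_tokens.compute()
    n_unique.compute()
    return calls["tokenize"]
```

In this model, `run(False)` executes the upstream step once per downstream result, while `run(True)` executes it once in total. Whether the same effect accounts for the slowdown observed here would need to be confirmed by profiling the actual pipeline.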