rapidsai / rapids-examples

Results and notebooks for benchmarks using hashing vectorizer (cuML+Dask vs Apache Spark) #31

Closed akaanirban closed 3 years ago

akaanirban commented 3 years ago

This PR adds three notebooks and benchmark results comparing scikit-learn (baseline), Apache Spark, and cuML+Dask pipelines for NLP data processing.
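For readers unfamiliar with the hashing vectorizer named in the title, here is a minimal pure-Python sketch of the underlying hashing trick. It is an illustration only and is not the code in the notebooks: the function name `hashing_vectorize`, the `n_features` default, and the use of `zlib.crc32` as the hash are assumptions made for this example (the real scikit-learn, Spark, and cuML implementations use their own hash functions and sparse outputs).

```python
import zlib


def hashing_vectorize(text, n_features=16):
    """Map a document to a fixed-length count vector via the hashing trick.

    Hypothetical helper for illustration; not an API of any library.
    """
    vec = [0] * n_features
    for token in text.lower().split():
        # crc32 is a stand-in hash chosen because it is deterministic
        # across runs; it is an assumption of this sketch.
        idx = zlib.crc32(token.encode("utf-8")) % n_features
        vec[idx] += 1  # colliding tokens share a bucket
    return vec


docs = ["GPU pipelines hash tokens", "Spark pipelines hash tokens too"]
vectors = [hashing_vectorize(d) for d in docs]
```

Because no vocabulary has to be built or shared, each document (or partition) can be vectorized independently, which is what makes this approach a natural fit for distributed pipelines.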

review-notebook-app[bot] commented 3 years ago

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


akaanirban commented 3 years ago

To check: why is cuML + Dask slower without the intermediate persist/compute_chunk_sizes calls?
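For context, a minimal CPU-only Dask sketch of what these two calls do. It is not taken from the PR's notebooks and does not answer the question; the array contents and variable names are made up for illustration. It assumes `dask` and `numpy` are installed.

```python
import numpy as np
import dask.array as da

# Small example array split into two chunks of five elements.
x = da.from_array(np.arange(-5, 5), chunks=5)

# Boolean masking yields chunks whose sizes are unknown until computed.
positive = x[x > 0]
unknown_before = any(np.isnan(c) for c in positive.chunks[0])

# persist() materializes the chunks so later steps reuse them
# instead of re-running the upstream graph.
positive = positive.persist()

# compute_chunk_sizes() resolves the unknown (NaN) chunk sizes in place.
positive.compute_chunk_sizes()
known_chunks = positive.chunks
```

One possible factor, unconfirmed here: without an intermediate `persist()`, every later computation on a lazy collection re-executes the upstream graph, and operations that need concrete shapes cannot proceed while chunk sizes are unknown.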