rapidsai-community / notebooks-contrib

RAPIDS Community Notebooks
Apache License 2.0

RAPIDS Community Contrib




Introduction

Welcome to the community-contributed notebooks repo (formerly known as Notebooks-Extended)!

The purpose of this collection is to introduce RAPIDS to new users by providing useful Jupyter notebooks as learning aids. These notebooks are direct community contributions from the RAPIDS team, our Ecosystem Partners, and RAPIDS users like you!

What do you mean by "Community Notebooks"?

These notebooks are for the community. That means:

  1. YOU can contribute workflow examples, tips and tricks, or tutorials for others to use and share! We ask that you follow our Testing and PR process.
  2. If your notebook is awesome, it can be featured.

There are some additional Community Responsibilities, as the RAPIDS team isn't maintaining these notebooks.

RAPIDS Showcase Notebooks

These notebooks are built by the RAPIDS team and will be maintained by them. When we remove a notebook, it becomes community maintained until it reaches the_archive.

RAPIDS Event Notebooks

These are notebooks that we presented at conferences or meetups. While we strive to use open source or easily accessible data, some notebooks may require datasets that have restricted access. They are also frozen in time and not maintained as RAPIDS progresses. Please download the RAPIDS version that these workflows were built on, or expect to update them to newer versions. Your favorite notebooks from our previous events can now be found there as well!
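Because event notebooks are pinned to the RAPIDS release they were built on, it can help to check what is installed before running one. The following is a minimal sketch using only the Python standard library. The candidate distribution names and the helper names `installed_version` and `matches_release` are illustrative assumptions, not part of this repo or of RAPIDS itself.

```python
from importlib import metadata

# Assumed distribution names: conda packages are typically published as
# "cudf", while pip wheels usually carry a CUDA suffix such as "cudf-cu12".
CANDIDATE_NAMES = ("cudf", "cudf-cu12", "cudf-cu11")


def installed_version(names=CANDIDATE_NAMES):
    """Return (name, version) for the first installed distribution, else None."""
    for name in names:
        try:
            return name, metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


def matches_release(version, wanted):
    """True when the year and month components of both versions agree."""
    def release(text):
        # Compare as integers so "24.04" and "24.4" are treated alike.
        return [int(part) for part in text.split(".")[:2]]

    return release(version) == release(wanted)


found = installed_version()
if found is None:
    print("No cuDF distribution found in this environment.")
else:
    print(f"{found[0]} {found[1]} is installed.")
```

If a notebook states the release it was written for, `matches_release(found[1], "23.10")` (with the notebook's own release in place of the example value) gives a quick yes or no before you start debugging API differences.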

How to Contribute

Please see our guide for contributing to notebooks-contrib.

Once you've followed our guide, please don't forget to test your notebooks before making a PR.
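As a quick pre-check before the full testing process, you can scan a saved notebook for cells whose stored outputs contain an error. This is only a sketch and not the repo's official testing procedure: it relies on the standard notebook JSON layout (`cells`, `cell_type`, `outputs`, `output_type`), it does not execute anything, and the helper names are hypothetical.

```python
import json


def find_error_cells(notebook):
    """Return indices of code cells whose saved outputs include an error."""
    bad = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        outputs = cell.get("outputs", [])
        if any(out.get("output_type") == "error" for out in outputs):
            bad.append(index)
    return bad


def check_notebook_file(path):
    """Load an .ipynb file from disk and report its failing cell indices."""
    with open(path, encoding="utf-8") as handle:
        return find_error_cells(json.load(handle))


# Tiny in-memory notebook used only for illustration.
example = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "outputs": [{"output_type": "stream"}]},
        {"cell_type": "code", "outputs": [{"output_type": "error"}]},
    ]
}
print(find_error_cells(example))  # prints [2]
```

An empty result only means no error was saved in the file. Re-running the notebook from a clean kernel is still needed to confirm it works end to end.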

Exploring the Repo

Folders

Great places to get started

Topics


RAPIDS Libraries Basics

#### Teaching Notebooks and User Guides

* [Intro to RAPIDS Crash Course](getting_started_materials/README.md)
* [Intro Notebooks to RAPIDS](getting_started_materials/intro_tutorials_and_guides) - covers cuDF, Dask, cuML and XGBoost.
* [Official RAPIDS User Guides](https://docs.rapids.ai/user-guide)
* [10 Minutes to cuDF and Dask cuDF](https://docs.rapids.ai/api/cudf/stable/user_guide/10min/)
* [cuDF for Data Scientists: Functions for Data Wrangling (External)](https://medium.com/@tiraldj/cudf-for-data-scientists-part-1-2-functions-for-data-wrangling-12a8f889b33e#e7ee) - by Mohammed R. Osman
* [Learn RAPIDS Getting Started Tour (External)](https://github.com/RAPIDSAcademy/rapidsacademy/tree/master/tutorials/datasci/tour)
* [Hello Worlds](getting_started_materials/hello_worlds)

#### Official Cheat Sheets

* [cuDF Cheat Sheet (PDF Download)](https://forums.developer.nvidia.com/uploads/short-url/mIndAvHNud3UXeWwC7Ore3d021D.pdf)
* [BlazingSQL Cheat Sheet (PDF Download)](https://forums.developer.nvidia.com/uploads/short-url/v0Wt2kUisxHUwr9fJSD6yA1J2bP.pdf)
* [cuGraph Cheat Sheet (PDF Download)](https://forums.developer.nvidia.com/uploads/short-url/kIbMG6LZjFfLFibbyqvVl2XcSbB.pdf)
* [RAPIDS-Dask Cheat Sheet (PDF Download)](https://forums.developer.nvidia.com/uploads/short-url/xiN07MC8FSHsXS6lekxSaY1CWs4.pdf)
* [CLX and cyBert Cheat Sheet (PDF Download)](https://forums.developer.nvidia.com/uploads/short-url/edzS5WizVTYZMWRtTl3AqHI5AL4.pdf)
* [cuSignal Cheat Sheet (PDF Download)](https://forums.developer.nvidia.com/uploads/short-url/hkh6vQ2rzl6mAHL8Vt0CYhctark.pdf)

Deploying RAPIDS

* Official RAPIDS Deployment Guide
* [Video - Tutorial of RAPIDS on AWS Sagemaker](https://www.youtube.com/watch?v=BtE4d0v6Css)
* [Video - Tutorial of RAPIDS on AzureML](https://www.youtube.com/watch?v=aqTmVVFnEwI)
* [Bursting Data Science Workloads to GPUs on Google Cloud Platform with Dask Cloud Provider (Blog with Code snippets)](https://medium.com/rapids-ai/bursting-data-science-workloads-to-gpus-on-google-cloud-platform-with-dask-cloud-provider-685be1eff204)
* [Step by Step - Tutorial of RAPIDS on IBM Virtual Server Instance](https://medium.com/@ahmed_82744/deploy-rapids-on-ibm-cloud-virtual-server-for-vpc-ce3e4b3ede1c) - by [Muhammad Arif](https://www.linkedin.com/in/arifnafees/) in collaboration with [Syed Afzal Ahmed](https://www.linkedin.com/in/syed-ahmed-6927749/)
* [Step by Step - Tutorial of RAPIDS on IBM Kubernetes Service](https://medium.com/@ahmed_82744/deploy-rapids-on-ibm-cloud-kubernetes-service-920de68dc6c4) - by [Muhammad Arif](https://www.linkedin.com/in/arifnafees/) in collaboration with [Syed Afzal Ahmed](https://www.linkedin.com/in/syed-ahmed-6927749/)

Multi GPU

#### Getting Started

* [Hello World to Dask](getting_started_materials/hello_worlds/Dask_Hello_World.ipynb)
* [Intro to Dask](getting_started_materials/intro_tutorials_and_guides/03_Introduction_to_Dask.ipynb)
* [Dask using cuDF](getting_started_materials/intro_tutorials_and_guides/04_Introduction_to_Dask_using_cuDF_DataFrames.ipynb)
* [Learn RAPIDS Multi GPU Mini Tour (External)](https://github.com/RAPIDSAcademy/rapidsacademy/tree/master/tutorials/multigpu/minitour)

#### Example Workflows

* [NYC Taxi on Dataproc (or Local)](https://github.com/rapidsai-community/notebooks-contrib/blob/main/community_tutorials_and_guides/taxi/NYCTaxi-E2E.ipynb)
* [Weather Analysis](community_tutorials_and_guides/intermediate_notebooks/examples/weather.ipynb)
* Dask Mortgage Analysis
* Performance Mortgage Analysis
* [State of the art NLP at scale with RAPIDS, HuggingFace and Dask (Blog and Code)](https://medium.com/rapids-ai/state-of-the-art-nlp-at-scale-with-rapids-huggingface-and-dask-a885c19ce87b)
* [LearnRAPIDS Multi-GPU Mini Tour (External)](https://github.com/RAPIDSAcademy/rapidsacademy/tree/master/tutorials/multigpu/minitour)

#### Dask Tricks

* [Monitoring Dask RAPIDS with Prometheus and Grafana (Blog with Code)](https://medium.com/rapids-ai/monitoring-dask-rapids-with-prometheus-grafana-96eaf6b8f3a0)
* [Scheduling & Optimizing RAPIDS Workflows with Dask and Prefect (Blog and Code)](https://medium.com/rapids-ai/scheduling-optimizing-rapids-workflows-with-dask-and-prefect-6fc26d011bf)
* [Filtered Reading with RAPIDS & Dask to Optimize ETL (Blog and Code)](https://medium.com/rapids-ai/filtered-reading-with-rapids-dask-to-optimize-etl-5f1624f4be55)

RAPIDS and Deep Learning

* [Official RAPIDSAI Deep Learning Repo](https://github.com/rapidsai/deeplearning)
* [GPU Hackathons RAPIDS + Deep Learning Crash Course](https://github.com/gpuhackathons-org/gpubootcamp/blob/master/ai/RAPIDS/)
* [deeplearningwizard.com's Wizard Tutorial](https://github.com/ritchieng/deep-learning-wizard/) (External, uses Google Colab)

Data Visualizations with RAPIDS

#### Official RAPIDS Demos

* [Intro to cuXFilter](https://github.com/rapidsai-community/showcase/blob/main/team_contributions/cuxfilter-tutorial/cuxfilter_tutorial.ipynb)
* [Spatial Analytics Viz](https://github.com/exactlyallan/Spatial-Analytics-Viz/tree/main)

#### Tutorials

* [Visual EDA on NYC Taxi Spatial Analytics (As Shown in PyDataDC Meetup 11/2020)](https://github.com/taureandyernv/rapidsai_visual_eda)
* [RAPIDS + Plot.ly Dask Tutorial (As shown in PyDataTT on 05/2021)](https://github.com/taureandyernv/rapids-plotly-webapps/tree/main)

Streaming Data

* [Chinmay Chandak's cuStreamz Gists (External)](https://gist.github.com/chinmaychandak)
* [Using cuStreamz to Accelerate your Kafka Datasource (Blog)](https://medium.com/rapids-ai/the-custreamz-series-the-accelerated-kafka-datasource-4faf0baeb3f6)
* [GPU accelerated Stream processing with RAPIDS (Blog)](https://medium.com/rapids-ai/gpu-accelerated-stream-processing-with-rapids-f2b725696a61)
* [Hello World Streaming Data](getting_started_materials/hello_worlds/hello_streamz.ipynb)

NLP

* [NLP with Hashing Vectorizer (Blog)](https://medium.com/rapids-ai/gpu-text-processing-now-even-simpler-and-faster-bde7e42c8c8a)
* [Show me the Word Count (Archives)](the_archive/archived_rapids_blog_notebooks/nlp/show_me_the_word_count_gutenberg)

Graph Analytics

GIS/Spatial Analytics

* [Seismic Facies Analysis (External)](https://github.com/NVIDIA/energy-sdk/tree/master/rapids_seismic_facies)

Genomics

* [Clara Parabricks Single Cell Analytics Repo](https://github.com/clara-parabricks/rapids-single-cell-examples) - [Notebooks](https://github.com/clara-parabricks/rapids-single-cell-examples/tree/master/notebooks)
* [RAPIDS Single Cell Analytics with updated scanpy wrappers](https://github.com/Intron7/rapids_singlecell) - by [Severin Dicks](https://github.com/Intron7) ([Institute of Medical Bioinformatics and Systems Medicine](https://www.uniklinik-freiburg.de/institut-fuer-medizinische-bioinformatik-und-systemmedizin/englisch/en.html), Freiburg)
* [Video - GPU accelerated Single Cell Analytics](https://www.youtube.com/watch?v=nYneL_uif3Q)
* [Video - Accelerate and scale genomic analysis with open source analytics](https://cloudonair.withgoogle.com/events/genomic-analysis) (Free Google registration required)

Cybersecurity

* [RAPIDS CLX](https://docs.rapids.ai/api/clx/stable/)
* [CLX API Docs](https://docs.rapids.ai/api/clx/stable/api.html)
* [10 Minutes to CLX](https://docs.rapids.ai/api/clx/stable/10min-clx.html)
* [Getting Started with CLX and Streamz](https://docs.rapids.ai/api/clx/stable/intro-clx-streamz.html)
* [Learn RAPIDS Cyber Security Mini Tour (External)](https://github.com/RAPIDSAcademy/rapidsacademy/tree/master/tutorials/security/tour)
* [Cyber Blog Notebooks (Archives)](the_archive/archived_rapids_blog_notebooks/cyber)

Past Competitions

* [RAPIDS.AI KGMON Competition Notebooks](the_archive/archived_competition_notebooks/kaggle) - contains a selection of notebooks that were used in Kaggle competitions.

Benchmarks

* [MultiGPU PageRank Benchmark (Archived)](the_archive/archived_rapids_benchmarks/cugraph)
* [RAPIDS Decomposition (Archived)](the_archive/archived_rapids_benchmarks/rapids_decomposition.ipynb)

Random Tips and Tricks

* [Synthetic 3D End-to-End ML Workflow](community_tutorials_and_guides/synthetic)
* [Reading Larger than Memory CSVs with RAPIDS and Dask (Blog)](https://medium.com/rapids-ai/reading-larger-than-memory-csvs-with-rapids-and-dask-e6e27dfa6c0f)

How-Tos with our Ecosystem Partners

LearnRAPIDS

* [Main Website](https://www.learnrapids.com/)
* [Tutorial Github Repo](https://github.com/RAPIDSAcademy/rapidsacademy/tree/master/tutorials)

Graphistry

* [Graph viz/connectors/transforms for cuGraph/cuDF with Demos](https://github.com/graphistry/pygraphistry) - Demos in /demos
* [RAPIDS dashboarding with Graphistry with Demos](https://github.com/graphistry/graph-app-kit) - Various demos in /python/views
* [Graphistry Hub](https://hub.graphistry.com/) - Includes no-code file uploader + free API keys

Additional Resources

Beyond our Official RAPIDS Docs, please:

Additional Information