Closed jamesomina99 closed 3 years ago
@jamesomina99 Good afternoon and thank you for submitting your topic suggestion. Your topic form has been entered into our queue and should be reviewed (for approval) as soon as a content moderator is finished reviewing the ones in the queue before it.
Hello, @jamesomina99. This is a super helpful and important topic. Please start working on the proposed topic. Let's ignite all the boosters on this one and deliver superior value to the reader.
Let's be sure to provide value that we as developers feel is scarcely available out there on the web. Custom projects would be the best way to explain such concepts. We avoid projects and explanations easily available on documentation sites and blogs.
Cheers, Lalith
Proposed title of article
How to Build a Multiclass Text Classification Model with PySpark
Introduction paragraph (2-3 paragraphs):
PySpark is the Python interface for Apache Spark. It lets us write Spark applications using Python APIs and provides the PySpark shell for interactively analyzing data in a distributed environment. PySpark supports most of Spark's features, such as Spark SQL, DataFrame, Streaming, MLlib (Machine Learning), and Spark Core.
In this tutorial, we will use the pyspark.ml API, which is built on the DataFrame API, to create a pipeline that analyzes our dataset and trains a text classification app. The classifier we build can predict the subject category of a Udemy course based on what the user inputs. Spark Machine Learning Pipelines API includes three steps:
Key takeaways:
References:
Please list links to any published content/research that you intend to use to support/guide this article.
Templates to use as guides