section-engineering-education / engineering-education

"Section's Engineering Education (EngEd) Program is dedicated to offering a unique quality community experience for computer science university students."
Apache License 2.0

[Machine Learning] Building natural language processing with BERT and Pytorch #7276

Closed — francis966 closed this issue 2 years ago

francis966 commented 2 years ago

Proposal Submission

Proposed title of article

[Machine Learning] Building natural language processing with BERT and Pytorch

Proposed article introduction

Natural Language Processing (NLP) is a field of artificial intelligence (AI) that enables computers to analyze and understand human language, both written and spoken. It was formulated to build software that generates and comprehends natural languages so that a user can have natural conversations with a computer. By utilizing NLP, developers can organize and structure knowledge to perform tasks such as automatic summarization, translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition, and topic segmentation.

BERT stands for "Bidirectional Encoder Representations from Transformers". BERT is an open-source machine learning framework for natural language processing (NLP). It is designed to help computers understand the meaning of ambiguous language in text by using the surrounding text to establish context.

BERT extracts patterns or representations from the data or word embeddings by passing them through an encoder. The encoder is a stack of transformer encoder layers. The model is bidirectional, meaning that during training it considers the context both to the left and to the right of each token when extracting patterns or representations.
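As a rough illustration of this stacked, bidirectional encoder, PyTorch's built-in transformer layers can be composed the same way. This is only a sketch: the sizes below are illustrative, not BERT's real dimensions (BERT-base uses 12 layers with a hidden size of 768).

```python
import torch
import torch.nn as nn

# A minimal sketch of a stacked transformer encoder, the architecture
# BERT builds on. The sizes here are illustrative only.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

# A batch of one "sentence" of 10 token embeddings. Self-attention lets
# every position attend to tokens on both its left and right, which is
# what makes the resulting representations bidirectional.
tokens = torch.randn(1, 10, 64)
contextual = encoder(tokens)
print(contextual.shape)  # torch.Size([1, 10, 64])
```

Each token's output vector now depends on the whole sequence, not just the tokens before it.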

PyTorch is an open-source machine learning framework based on the Torch library, primarily developed by Facebook's AI Research lab and used for applications such as computer vision and natural language processing. PyTorch defines a class called Tensor (torch.Tensor) to store and operate on homogeneous, multidimensional, rectangular arrays of numbers.
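A quick taste of torch.Tensor in practice:

```python
import torch

# torch.Tensor stores a homogeneous, rectangular, multidimensional
# array of numbers and supports vectorized operations on it.
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(x.shape)        # torch.Size([2, 2])
print(x.dtype)        # torch.float32
print((x * 2).sum())  # tensor(20.)
```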

PyTorch-NLP comes with pre-trained embeddings, samplers, dataset loaders, metrics, neural network modules, and text encoders. In this tutorial, we are going to implement sentiment analysis for Amazon reviews using BERT from the 🤗 Hugging Face Transformers library and PyTorch.

Key takeaways

  1. How text is processed for natural language processing tasks.
  2. Installing the Hugging Face BERT transformers.
  3. Installing PyTorch.
  4. Importing PyTorch's optim and nn modules.
  5. Creating a DistilBERT tokenizer.
  6. Creating a PyTorch Dataset and DataLoader.
  7. Building the model (Amazon reviews sentiment analysis).
  8. Making predictions.

Article quality

In this tutorial, we will discuss all the text preprocessing steps in detail before building our model. This will clean our data and make it ready to be used by the BERT model. We will explain how to install the 🤗 Hugging Face Transformers library and how to select a pre-trained BERT model. We will also explain how to import PyTorch's optim and nn modules. These are the key modules that we will use to build the model.
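After installing the libraries (e.g. pip install torch transformers), the optim and nn modules supply the training ingredients. A minimal, illustrative sketch — the binary output size and learning rate are assumptions for this sentiment task, and the classification head here stands in for the fine-tuned model:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Illustrative classification head: DistilBERT outputs 768-dimensional
# vectors, and the sentiment task is assumed to be binary (pos/neg).
classifier = nn.Linear(768, 2)
optimizer = optim.Adam(classifier.parameters(), lr=2e-5)
loss_fn = nn.CrossEntropyLoss()

# One dummy training step on random "embeddings" standing in for
# real BERT outputs.
features = torch.randn(4, 768)
labels = torch.tensor([0, 1, 1, 0])
loss = loss_fn(classifier(features), labels)
loss.backward()
optimizer.step()
```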

We will then create a PyTorch dataset and data loader, which we will use to fine-tune our sentiment analysis model. Finally, we will use the fine-tuned model to make accurate sentiment predictions.
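The dataset and data loader step could be sketched as follows, assuming the review texts have already been encoded into fixed-length token-id lists (the toy data and class name are hypothetical placeholders):

```python
import torch
from torch.utils.data import Dataset, DataLoader

# Sketch of a review dataset over pre-tokenized inputs.
class ReviewDataset(Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings  # list of token-id lists
        self.labels = labels        # 0 = negative, 1 = positive

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return (torch.tensor(self.encodings[idx]),
                torch.tensor(self.labels[idx]))

# Toy data standing in for tokenized Amazon reviews.
encodings = [[101, 2307, 102, 0], [101, 6659, 102, 0]]
labels = [1, 0]
loader = DataLoader(ReviewDataset(encodings, labels), batch_size=2)

ids, ys = next(iter(loader))
print(ids.shape)  # torch.Size([2, 4])
```

The DataLoader batches and (optionally) shuffles samples, which is what the fine-tuning loop iterates over.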

References

Please list links to any published content/research that you intend to use to support/guide this article.

Conclusion

Finally, remove the Pre-Submission advice section and all our blockquoted notes as you fill in the form before you submit. We look forward to reviewing your topic suggestion.

Templates to use as guides

github-actions[bot] commented 2 years ago

👋 @francis966 Good afternoon and thank you for submitting your topic suggestion. Your topic form has been entered into our queue and should be reviewed (for approval) as soon as a content moderator is finished reviewing the ones in the queue before it.

hectorkambow commented 2 years ago

Good afternoon and thank you for submitting your topic to the EngEd program.

After some careful consideration, it struck us that this topic may be a bit oversaturated across other blog sites and official documentation. We typically refrain from publishing content that is covered widely on the net or on other blogs. We're more interested in original, practitioner-focused content that takes a deeper dive into programming-centric concepts.

But in order to be approved, a topic has to provide value to the larger developer community. Please feel free to suggest an alternate topic to explore. 🚀 One option, and a great way to turn this into an in-depth article that adds value for that community, would be to walk the reader through the use of these methods and functions by building a unique, different, and useful project.

If you believe your article is unique enough, please present your case explaining how it will differ from what is currently out there.

https://www.google.com/search?q=natural+language+processing+with+BERT+and+Pytorch&rlz=1C1CHBF_enUS891US891&sourceid=chrome&ie=UTF-8