prathimacode-hub / DS-ScriptsNook

🎊One Stop Destination to get acquainted with scripts in Data Science. Turn yourself into a pro. Show your support by ✨ this repository.
https://prathimacode-hub.github.io/DS-ScriptsNook/
MIT License

TEXT SUMMARIZATION #16

Closed RaghuMadhavTiwari closed 2 years ago

RaghuMadhavTiwari commented 2 years ago

I would like to add a text summarization model using NLP.

prathimacode-hub commented 2 years ago

Add the issue template and fill in the details for your issue to get it assigned. @RaghuMadhavTiwari

RaghuMadhavTiwari commented 2 years ago

PROJECT TITLE - EXTRACTIVE TEXT SUMMARIZATION ALGORITHM USING NLP

INTRODUCTION - We attempt to summarize articles by selecting a subset of sentences that retains the most important points.

PURPOSE - I want to contribute a text summarization project in NLP. It is an easy-to-understand NLP technique and very beginner-friendly.

BRIEF EXPLANATION - Summarization can be defined as the task of producing a concise and fluent summary while preserving key information and overall meaning.

WORKING CONDITIONS-

  1. Input the document.
  2. Compute the similarity between sentences.
  3. Weight the sentences.
  4. Select the sentences with the highest rank.
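The four working conditions above can be sketched in plain Python. This is a minimal sketch, not the project's actual code, and it rests on stated assumptions: a naive regex sentence splitter (NLTK's `sent_tokenize` would normally do this), a small hypothetical `STOP_WORDS` list, bag-of-words cosine similarity between sentences, and a sentence weight equal to the sum of its similarities to all other sentences (a simple degree-centrality score; TextRank would instead run PageRank over the same similarity matrix).

```python
import math
import re
from collections import Counter

# Hypothetical minimal stop-word list; NLTK's stopwords corpus would normally replace it.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "it", "that", "on", "for"}


def split_sentences(document):
    """Step 1: naive sentence split on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def bag_of_words(sentence):
    """Count the lowercase, non-stop-word tokens in a sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine_similarity(a, b):
    """Step 2: cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summarize(document, num_sentences=2):
    sentences = split_sentences(document)
    vectors = [bag_of_words(s) for s in sentences]
    # Step 3: weight each sentence by its total similarity to every other sentence.
    weights = [
        sum(cosine_similarity(vectors[i], vectors[j]) for j in range(len(sentences)) if j != i)
        for i in range(len(sentences))
    ]
    # Step 4: keep the highest-weighted sentences, restored to their original order.
    ranked = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


doc = (
    "Text summarization shortens a document. "
    "Extractive summarization selects important sentences from the document. "
    "The weather was sunny yesterday. "
    "Important sentences are similar to many other sentences in the document."
)
print(summarize(doc, num_sentences=2))
```

The off-topic sentence about the weather shares no words with the others, so its weight is zero and it is dropped. Returning the chosen sentences in their original order keeps the summary readable.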

USAGE - This approach assigns weights to the important sentences and uses them to form the summary. Different algorithms and techniques can be used to define the sentence weights and then rank the sentences by importance and by their similarity to one another.
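As one example of a different weighting technique, sentences can be scored by word frequency instead of similarity. This is a hedged sketch, not the project's method: it assumes a hypothetical `STOP_WORDS` list and regex tokenization, divides every word count by the highest count, and scores a sentence as the sum of its words' normalized frequencies.

```python
import re
from collections import Counter

# Hypothetical minimal stop-word list (NLTK's stopwords corpus could be used instead).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "it"}


def content_words(sentence):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def frequency_weights(sentences):
    """Score each sentence by the summed, max-normalized frequency of its words."""
    freq = Counter(w for s in sentences for w in content_words(s))
    if not freq:
        return [0.0] * len(sentences)
    top = max(freq.values())
    normalized = {w: count / top for w, count in freq.items()}
    return [sum(normalized[w] for w in content_words(s)) for s in sentences]


def top_sentences(sentences, k):
    """Return the k highest-scoring sentences in their original order."""
    weights = frequency_weights(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


print(frequency_weights(["Cats chase mice.", "Dogs chase cats.", "Birds sing."]))
```

Because the scores are plain sums, longer sentences tend to score higher; dividing each score by the sentence's word count is a common variation when that bias is unwanted.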

USE CASES - Here are a few use cases:

  1. In scientific paper summarization, a considerable amount of additional information, such as cited papers and conference information, can be leveraged to identify the important sentences in the original paper.
  2. Legal summarization works in a similar manner: legal text summarization generates summaries from court judgments.
  3. Scanning text documents and summarizing them by identifying key entities in the document. A popular use case is resume categorization, in which named entity recognition (NER) processes a large number of resumes and highlights key entities such as name, institution, and skills, which facilitates quick evaluation.

LIBRARIES USED -

  1. NLTK: The Natural Language Toolkit (NLTK) is one of the most popular Python libraries for natural language processing.
  2. NLTK corpora: NLTK provides access to over 50 corpora (large bodies of text or linguistic data) and lexical resources (databases such as dictionaries, for example WordNet).
  3. Tokenize: splits a document into a list of units (tokens). These units can be words, characters, or sentences.
  4. WordNet Lemmatizer: WordNet is a freely and publicly available lexical database of English. In WordNet, nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. These synsets are interlinked by lexical and conceptual-semantic relationships. WordNet can be downloaded easily, and NLTK offers an interface to it that lets you perform lemmatization (reducing a word to its dictionary form).

ADVANTAGES - Here are a few advantages:

  1. Summarization reduces reading time.
  2. When researching documents, summaries make the selection process easier.
  3. Summarization improves the effectiveness of indexing.
  4. Summarization algorithms are less biased than human summarizers.

DISADVANTAGES - Some of the sentiment and nuance of the original text may be lost.

APPLICATIONS -

  1. Scientific paper summarization
  2. Legal text summarization
  3. Scanning text documents and summarizing them by identifying key entities in the document
  4. Resume categorization

CONCLUSION - In this project, we will learn to build a basic text summarizer that uses the extractive summarization technique.

REFERENCES -

  1. https://python.gotrained.com/lexical-resources-nltk/
  2. https://github.com/WING-NUS/scisumm-corpus
  3. https://github.com/EdinburghNLP/XSum

NAME - Raghu Madhav Tiwari
LINKEDIN - linkedin.com/in/raghumadhavtiwari
GITHUB - https://github.com/RaghuMadhavTiwari

prathimacode-hub commented 2 years ago

Hey, this is the README template, not the issue template. Here is the link: https://github.com/prathimacode-hub/DS-ScriptsNook/blob/main/.github/issue_template/feature_request.md

Make the changes using this issue template, and save the above content for your README.md. @RaghuMadhavTiwari

RaghuMadhavTiwari commented 2 years ago

Title | About | Name | Label | Assignee
-- | -- | -- | -- | --
EXTRACTIVE TEXT SUMMARIZATION USING NLP | This script will contain a simple model for extractive text summarization | Raghu Madhav Tiwari | This is a model on text summarization using NLP. It is for beginners in the field who would like to get familiar with NLP and understand how text summarization actually works. | -

Define You:

Is your feature request related to a problem?

I could not find helpful resources on NLP that cover both the concepts and a hands-on project.

Describe the solution you'd like...

This project should help beginners in the field of NLP understand how extractive text summarization works, along with a few basic yet important concepts.

Describe alternatives you've considered?

NA

Approach to be followed (optional):

Additional context

Scanning text documents and summarizing them by identifying key entities in the document. A popular use case is resume categorization, in which NER processes a large number of resumes and highlights key entities such as name, institution, and skills, which facilitates quick evaluation.

prathimacode-hub commented 2 years ago

Issue assigned. @RaghuMadhavTiwari

prathimacode-hub commented 2 years ago

title: Project Title
about: Suggest an idea for this project
name: Your Name
label: Feature Request
Assignee: ''


Define You:

Is your feature request related to a problem? Please describe.

Describe the solution you'd like...

Describe alternatives you've considered?

Approach to be followed (optional):

Additional context

prathimacode-hub commented 2 years ago

This is how it should look. Now fill in the details here according to what is asked. I have already enabled the devincept participant option, since you didn't do it earlier. @RaghuMadhavTiwari

RaghuMadhavTiwari commented 2 years ago

title: Extractive Text Summarization using NLP
about: This script will contain a simple model for extractive text summarization
name: Raghu Madhav Tiwari
label: This is a model on text summarization using NLP. It is for beginners in the field who would like to get familiar with NLP and understand how text summarization actually works.
Label: Feature Request
Assignee: ''

Define You:

Is your feature request related to a problem? I could not find helpful resources on NLP that cover both the concepts and a hands-on project.

Describe the solution you'd like... This project should help beginners in the field of NLP understand how extractive text summarization works, along with a few basic yet important concepts.

Describe alternatives you've considered? NA

Approach to be followed (optional):

Additional context: Scanning text documents and summarizing them by identifying key entities in the document. A popular use case is resume categorization, in which NER processes a large number of resumes and highlights key entities such as name, institution, and skills, which facilitates quick evaluation.

prathimacode-hub commented 2 years ago

Issue assigned. @RaghuMadhavTiwari