hardik-khandala / String-Similarity-Using-NLP

📚 Text Similarity Analysis using Spacy and Sklearn

This project demonstrates how to preprocess text and calculate text similarity using the Spacy and Sklearn libraries. Each text is lemmatized with Spacy and then transformed into a token-count vector with Sklearn's CountVectorizer. The cosine similarity between those vectors is then calculated to measure how similar the texts are.

Notebook Contents

  1. 📦 Importing Libraries
  2. 📝 Text Preprocessing
  3. 🔢 Vectorization
  4. 📊 Similarity Calculation
  5. 📈 Visualization

📦 Importing Libraries

The necessary libraries are imported, including Spacy for natural language processing, Sklearn for vectorization and similarity calculation, and Matplotlib for visualization.

πŸ“ Text Preprocessing

The text is first lemmatized using Spacy to reduce words to their base forms.
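A minimal sketch of this step, assuming the small English model `en_core_web_sm` (the README does not say which model the notebook loads). The helper names `lemmatize` and `load_pipeline` are hypothetical and only serve this illustration. Dropping punctuation is also an assumption, not something the notebook is known to do.

```python
def load_pipeline(model_name="en_core_web_sm"):
    """Load a spaCy pipeline; the model name is an assumption, not taken from the notebook."""
    import spacy  # imported lazily so lemmatize() works with any compatible pipeline

    return spacy.load(model_name)


def lemmatize(nlp, text):
    """Return the lemmas of `text` joined by spaces, skipping punctuation tokens."""
    doc = nlp(text)
    return " ".join(token.lemma_ for token in doc if not token.is_punct)
```

Typical usage would be `nlp = load_pipeline()` followed by `lemmatize(nlp, "Thor is eating Pizza.")`. The exact lemmas returned depend on the model and its version.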

🔢 Vectorization

Using CountVectorizer, the lemmatized text is transformed into vectors.

📊 Similarity Calculation

Cosine similarity is calculated to determine the similarity between the two text vectors.
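Sklearn provides `sklearn.metrics.pairwise.cosine_similarity` for this calculation. To show what is being computed, the dependency-free sketch below spells out the same formula: the dot product of the two vectors divided by the product of their lengths. The function name `cosine_sim` and the example count vectors are hypothetical.

```python
import math


def cosine_sim(u, v):
    """Cosine of the angle between two equal-length count vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical count vectors over the vocabulary [be, eat, loki, pizza, thor].
text_1 = [1, 1, 0, 1, 1]
text_2 = [1, 1, 1, 1, 0]

print(cosine_sim(text_1, text_2))  # 3 shared words / (2 * 2) = 0.75
```

A value of 1.0 means the vectors point in the same direction, and 0.0 means the texts share no counted words.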

📈 Visualization

You can visualize the similarity matrix using Matplotlib.
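One possible way to draw the matrix is as an annotated heatmap, sketched below. The helper name `plot_similarity`, the labels, and the matrix values are placeholders for illustration and are not results from the notebook, whose plotting code may differ.

```python
import matplotlib.pyplot as plt


def plot_similarity(matrix, labels):
    """Draw a square similarity matrix as an annotated heatmap and return the figure."""
    fig, ax = plt.subplots()
    image = ax.imshow(matrix, vmin=0.0, vmax=1.0, cmap="viridis")
    ax.set_xticks(range(len(labels)))
    ax.set_yticks(range(len(labels)))
    ax.set_xticklabels(labels)
    ax.set_yticklabels(labels)
    # Write each similarity value inside its cell.
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            ax.text(j, i, f"{value:.2f}", ha="center", va="center")
    fig.colorbar(image, ax=ax, label="cosine similarity")
    ax.set_title("Cosine similarity")
    return fig


# Placeholder values for illustration only.
example_matrix = [[1.0, 0.5], [0.5, 1.0]]
fig = plot_similarity(example_matrix, ["Text 1", "Text 2"])
# plt.show()  # uncomment when running interactively
```

Fixing the color scale to the range 0 to 1 keeps plots of different text pairs visually comparable.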

Example Output

When comparing the two example texts:

  1. Text 1: "Thor is eating Pizza."
  2. Text 2: "Loki is eating pizza."

The cosine similarity matrix will look like this:

[Figure: cosine similarity matrix for the two similar texts]

When comparing two dissimilar example texts:

  1. Text 1: "Thor is eating Pizza."
  2. Text 2: "Loki is Traveling."

The cosine similarity matrix will look like this:

[Figure: cosine similarity matrix for the two dissimilar texts]