rishibubna / Customer_identification_using_Web_based_techniques


High-value consumer profiling using a hybrid text-based and web-based approach

===== Project Description =====

In this project, we propose an approach for identifying new potential consumers of a specific product or service offering from their social media presence. Using a hybrid text-based and web-based approach, we aim to identify and rank high-value customers through text analysis, network analysis and machine learning. This allows businesses to design customer engagement programs that target the social media audience most likely to convert into consumers, improving the efficiency of customer acquisition and increasing return on investment.

The approach was demonstrated by gathering tweets for Tesla, several non-target accounts, 405 followers of Tesla and a total of 7570 neighbors. A text-based decision tree ensemble model classified tweets with an F1 score of 0.984. The web-based approaches were then compared against the text-based approach: the proposed relational classifier outperforms the community approach at reproducing the text-based classifications when too little text is available for a user.
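To make the text-based side more concrete, here is a minimal sketch of tweet cleaning followed by tf-idf weighting, the two preprocessing steps listed under Running the Scripts. It is not the code from Data_Cleaning.ipynb: the function names `clean_tweet` and `tfidf`, the cleaning rules, the exact tf-idf formula, and the two sample tweets are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical cleaning rules; the notebook's actual rules may differ.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Lowercase a tweet, strip URLs, @mentions and non-letters, and tokenize."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    return text.split()


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = length-normalized term frequency * log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up tweets, one per (hypothetical) account type.
tweets = [
    "Charging my Model S at a Supercharger! https://t.co/abc @Tesla",
    "My new album is out now #music",
]
tokens = [clean_tweet(t) for t in tweets]
weights = tfidf(tokens)
```

In this toy input the word "my" appears in both tweets, so its weight is zero, while words unique to one tweet (such as "charging") receive a positive weight. Terms that separate target from non-target accounts are the ones a downstream classifier can use.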

===== Installation =====

The following installations are required:

  1. Python: Can be downloaded from https://www.python.org/downloads/

  2. Jupyter Notebooks: Link for downloading: https://jupyter.org/install

  3. Related libraries:

  4. Twitter API: Request API keys for gathering Twitter data via the developer documentation: https://developer.twitter.com/en

===== Running the Scripts =====

  1. Gathering Tweets (Target and Non-Target accounts): Run tweets_api.py to gather tweets for 10 Twitter accounts: 'tesla', 'ladygaga', 'usedgov', 'FoodandTravelEd', 'nytimes', 'premierleague', 'MTV', 'facebook', 'eBay', 'parenting'. These are saved in the folder ‘tweets_raw’.
     Note: Insert your own API keys before running; ours have been removed.
  2. Text Based Approach: Run the following files:
    • Data_Cleaning.ipynb
    • Performs cleaning of tweets and stores them in ‘cleaned_csv’
    • Generates tf-idf matrices for target and non-target accounts and stores them in the ‘tfidf_matrices’ folder
  3. Web Based Approach (Community):
    • Web based approach.ipynb - Calculates scores for first- and second-level users
  4. Web Based Approach (Relational Classifier):
    • Homophily_Relational.ipynb - Performs the test of homophily and relational classification techniques
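
As a rough illustration of the last step, the sketch below shows one common way to test for homophily (comparing the observed share of cross-label edges with the 2pq share expected if links ignored labels) and a simple neighbor-vote relational classifier. The notebook's actual test and classifier are not described here, so this is a generic stand-in: the function names, the toy graph, and the choice of an unweighted neighbor vote are all assumptions.

```python
def cross_edge_fraction(edges, labels):
    """Fraction of edges whose two endpoints carry different labels."""
    cross = sum(1 for u, v in edges if labels[u] != labels[v])
    return cross / len(edges)


def expected_cross_fraction(labels):
    """Cross-label edge share (2pq) expected if links ignored labels."""
    p = sum(1 for lab in labels.values() if lab == 1) / len(labels)
    return 2 * p * (1 - p)


def relational_neighbor_score(node, neighbors, known_labels):
    """Share of a node's labeled neighbors that are positive (label 1).

    Returns None when none of the node's neighbors has a known label.
    """
    labeled = [known_labels[n] for n in neighbors.get(node, ()) if n in known_labels]
    if not labeled:
        return None
    return sum(labeled) / len(labeled)


# Toy graph (made up): 1 = target-like user, 0 = non-target user.
labels = {"a": 1, "b": 1, "c": 1, "d": 0, "e": 0, "f": 0}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("c", "d")]

observed = cross_edge_fraction(edges, labels)   # 1 of 6 edges crosses labels
expected = expected_cross_fraction(labels)      # 2 * 0.5 * 0.5

# A user with too little text to classify, scored from labeled neighbors.
neighbors = {"x": ["a", "b", "d"]}
score_x = relational_neighbor_score("x", neighbors, labels)
```

An observed cross-label share well below the expected one suggests homophily, which is what justifies labeling a user from their neighbors. In this toy graph, user "x" has two positive neighbors out of three, so the neighbor vote leans toward the target class.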