hmoazam / Microsoft-Social-Media-Analytics

MIT License

Social Media Analytics Accelerator

Introduction

In today's society, using social media has become a daily activity, and for companies and organizations around the world, monitoring it has become essential. Social media monitoring is a key tool for innovation and marketing. Its benefits include getting instant feedback on products and services, building better relations with end users and customers, increasing user satisfaction quickly, and keeping up with the competition by detecting and exploiting opportunities that competitors may be missing.

The Social Media Analytics Accelerator provides the skeleton for building a social media monitoring platform that collects data from social media sites and news sites (via News API and RSS) and evaluates that data to support business decisions. This document gives an overview of the solution architecture, lists the requirements and information needed to deploy the solution, and suggests ideas and scenarios for extending it.

Architecture

The architecture of the Social Media Analytics accelerator is depicted below. The accelerator uses the following components:

[Figure: architecture diagram of the Social Media Analytics accelerator]

Deployment

The deployment of the accelerator is fully automated using a set of Bicep and PowerShell files. The following requirements must be in place before running the deployment:


The result of this step should look like the following:


Tables, schemas, stored procedures and table data deployed to SQL Pool
Notebooks, pipelines and triggers, ready for customization



Customization

The accelerator comes with sample queries about football and COVID, and by default uses search terms specific to these topics. You need to customize the accelerator with search terms that fit your use case. The customization is done at the pipeline level. The accelerator uses three data sources (Twitter, NewsAPI and RSS feeds), so there are three pipelines to customize:

News Articles Pipeline

This pipeline is named "News Orchestrator" and comes with multiple activities, of which the Notebook activities need to be customized. Selecting each activity, and checking the Base parameters under Settings shows the query being executed. It is important to customize both the query being executed and the topic properties of each activity. The topic is used to tag the data collected so that it can later be used to group it in the dashboard. If further categorization is needed on the dashboard, you can use subtopic. If more queries are needed, the Notebook activity can be cloned to add the query and linked to the pipeline workflow. No changes are required to the rest of the activities (starting from the Cleanup activity).

For details on building queries for the News API, see https://newsapi.org/docs/endpoints/everything
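As a rough illustration of how the query, topic and subtopic base parameters might fit together, the sketch below builds a request URL for the News API `everything` endpoint and tags a result with its topic. The helper names `build_news_request` and `tag_article`, the `language` and `pageSize` defaults, and the tagging step itself are assumptions made for illustration; the accelerator's notebook may work differently.

```python
from urllib.parse import urlencode

NEWS_API_ENDPOINT = "https://newsapi.org/v2/everything"


def build_news_request(query, api_key, language="en", page_size=100):
    """Return the request URL for a News API 'everything' search (hypothetical helper)."""
    params = {
        "q": query,
        "language": language,
        "pageSize": page_size,
        "apiKey": api_key,
    }
    return f"{NEWS_API_ENDPOINT}?{urlencode(params)}"


def tag_article(article, topic, subtopic=None):
    """Attach the topic and optional subtopic used to group data in the dashboard."""
    tagged = dict(article)  # copy so the original record is left untouched
    tagged["topic"] = topic
    tagged["subtopic"] = subtopic
    return tagged


# Example values only; replace with your own query, key and topics.
url = build_news_request('"premier league" AND transfer', api_key="YOUR_API_KEY")
article = tag_article({"title": "Example headline"}, topic="Football", subtopic="Transfers")
```

Keeping the topic and subtopic on every record is what lets the dashboard filters group results from several queries under one heading.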


Tweets Pipeline

This pipeline is named "Tweets Orchestrator" and comes with multiple activities, of which the Notebook activities need to be customized. Similar to News Articles, selecting each activity, and checking the Base parameters under Settings shows the query being executed. It is important to customize both the query being executed and the topic properties of each activity. The topic is used to tag the data collected so that it can later be used to group it in the dashboard. If further categorization is needed on the dashboard, you can use subtopic. If more queries are needed, the Notebook activity can be cloned to add the query and linked to the pipeline workflow. No changes are required to the rest of the activities (starting from the Cleanup activity).

To collect tweets from a specific user rather than by query, leave the query parameter empty and enter the Twitter handle of the user of interest in the user parameter field.
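The choice between the query and user parameters can be pictured as follows. This is a minimal sketch, assuming the user handle is only used when the query is empty; `resolve_tweet_search` is a hypothetical helper and not part of the accelerator. It relies on the `from:` operator of the Twitter search syntax, which restricts results to a single account.

```python
def resolve_tweet_search(query="", user=""):
    """Return the search string to run, preferring the query over the user handle."""
    query = query.strip()
    user = user.strip().lstrip("@")  # accept handles with or without the leading '@'
    if query:
        return query
    if user:
        # 'from:' limits a Twitter search to tweets posted by one account.
        return f"from:{user}"
    raise ValueError("Provide either a query or a user handle")
```

For example, `resolve_tweet_search(user="@WHO")` yields the search string `from:WHO`, while a non-empty query is passed through unchanged.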

For details on constructing Twitter queries and the associated limits, refer to:


RSS Articles Pipeline

This pipeline is named "RSS Orchestrator" and comes with multiple activities, of which the Notebook activities need to be customized. Selecting each activity, and checking the Base parameters under Settings shows the configuration being executed. You need to enter the RSS feed link under feed_source. You can choose to include optional or required keywords to filter the RSS feed under query_optional and query_required, respectively. The topic is used to tag the data collected so that it can later be used to group it in the dashboard. If further categorization is needed on the dashboard, you can use subtopic. If more queries are needed, the Notebook activity can be cloned to add the query and linked to the pipeline workflow. No changes are required to the rest of the activities (starting from the Cleanup activity).
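The keyword filtering described above could look roughly like the sketch below. It assumes that every keyword in query_required must appear, that at least one keyword in query_optional must appear when any are given, and that matching is case-insensitive on the item title and description. These semantics and the function names are assumptions for illustration, not the accelerator's confirmed behavior.

```python
import xml.etree.ElementTree as ET


def matches_keywords(text, query_required=(), query_optional=()):
    """True if text has every required keyword and, when optional keywords
    are given, at least one of them (case-insensitive)."""
    lowered = text.lower()
    has_required = all(k.lower() in lowered for k in query_required)
    has_optional = not query_optional or any(k.lower() in lowered for k in query_optional)
    return has_required and has_optional


def filter_feed(rss_xml, query_required=(), query_optional=()):
    """Return the titles of RSS items whose title or description matches."""
    root = ET.fromstring(rss_xml)
    titles = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        if matches_keywords(f"{title} {description}", query_required, query_optional):
            titles.append(title)
    return titles


# Small inline feed used in place of a real feed_source URL.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <item><title>Vaccine rollout expands</title><description>Health update</description></item>
  <item><title>Cup final preview</title><description>Football news</description></item>
</channel></rss>"""

print(filter_feed(SAMPLE_FEED, query_required=["vaccine"]))
```

With the sample feed, requiring "vaccine" keeps only the first item.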


Pipeline Triggers

After the queries have been added, the data collection can be started by starting the triggers. You can also modify the triggers to your preferred frequency.

Note: There is an intermittent issue with creating the RSS trigger. You may need to create it manually.


Power BI Template

A Power BI template file is available to access the insights generated by the solution. It consists of an executive dashboard, a Football page, and a Health page. When the template is opened, it prompts for the data source, which should be the SQL endpoint of the Synapse workspace. The report uses Import mode, so after publishing it to powerbi.com, Scheduled refresh must be configured to keep the dashboard data up to date. When opening the report in Power BI Desktop, make sure the Azure Maps visual is enabled under Preview features in the Power BI Desktop options. You will need to modify the topic filters on the pages to match the topics configured in your pipelines. Each topic page contains three panels, one each for tweets, news and RSS; you can delete a panel if you are not pulling data from that source. Screenshots of the dashboard are shown below.

Executive Dashboard

[Screenshot: Executive Dashboard page]

Football

[Screenshot: Football page]

Health

[Screenshot: Health page]

Solution Extension

The accelerator can be extended with additional data sources, features, and visualization layers, as outlined in the following sections.

Data sources

The data sources can be extended to include more social media sites such as Facebook, Instagram and TikTok. Ingestion of streaming data can be added using a combination of Azure Event Hubs and streaming APIs.

Cosmos DB makes the solution highly extensible: any news feed can be integrated by writing its data to the NoSQL database.

Additional Features

Many features can be added to provide more insights to end users. Near real-time ingestion through data streaming makes it possible to act quickly on events in social media. Cognitive Services can be used for opinion mining, for example to track users' opinions on brand entities, or on an event or its organization before and after it takes place. Graph technology can be used to build an influencer network detection system that identifies influencers and visualizes their networks. The solution can also be linked to external systems to trigger specific actions on pre-defined events, and integrating it with a chatbot gives users another way to access the insights it provides.
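To give a feel for the influencer detection idea, the sketch below ranks accounts in a small mention graph with a plain PageRank iteration. PageRank is used here only as one familiar example of a graph technique; the accelerator does not ship this code, and the `mentions` data, account names and parameter values are made up for illustration.

```python
def pagerank(mentions, damping=0.85, iterations=50):
    """Score accounts in a mention graph.

    mentions maps an author to the list of accounts that author mentions.
    """
    nodes = set(mentions)
    for targets in mentions.values():
        nodes.update(targets)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for node in nodes:
            targets = mentions.get(node, [])
            if targets:
                # Split this account's score among the accounts it mentions.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Accounts that mention nobody spread their score evenly.
                share = damping * rank[node] / len(nodes)
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical mention data: author -> accounts mentioned in their tweets.
mentions = {
    "alice": ["club_official"],
    "bob": ["club_official", "alice"],
    "carol": ["club_official"],
}
scores = pagerank(mentions)
top_account = max(scores, key=scores.get)
```

In a real deployment the mention graph would be derived from the collected tweets, and a graph database or graph library would likely replace this hand-rolled loop.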

Visualization

Navigation of the Power BI reports and the overall user experience can be improved, while still allowing users to browse historical data, by adding an Azure Analysis Services tabular model layer.

An example extension of the architecture is shown below:

[Figure: example extended architecture diagram]