kitsamho / Instagram_Scraper_Graph

Two Python classes that facilitate scraping of Instagram posts and graph modelling of hashtag data

There are two Python files here, each containing a custom Instagram class.

You will need an Instagram account to use InstagramScraper().

InstagramScraper()

This class is made up of a series of methods for scraping Instagram post data. The pipeline consists of three main methods that need to be called sequentially. There is currently no method that chains the whole pipeline.

self.logIn() : user detail capture, WebDriver initialisation, Instagram log in.

self.getLinks() : gets n unique links containing <#HASHTAG> using WebDriver.

self.getData() : implements multi-threaded scraping of data from the links collected by self.getLinks(), using a combination of Selenium WebDriver and Beautiful Soup. Returns a pandas DataFrame.
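The two ideas behind getLinks() and getData() (collect n unique links, then process them on several threads) can be sketched with the standard library alone. This is a minimal illustration, not the class's actual code: `collect_unique_links`, `scrape_post`, `scrape_all` and the example.com URLs are hypothetical names invented here, and the worker only fakes a record from the URL instead of driving Selenium or Beautiful Soup.

```python
from concurrent.futures import ThreadPoolExecutor


def collect_unique_links(batches, n):
    """Return the first n distinct links seen across batches, in order."""
    seen = set()
    links = []
    for batch in batches:
        for link in batch:
            if link not in seen:
                seen.add(link)
                links.append(link)
                if len(links) == n:
                    return links
    return links  # fewer than n unique links were available


def scrape_post(link):
    """Hypothetical stand-in for fetching and parsing one post."""
    # A real worker would load and parse the page; here we only derive
    # a fake record from the URL so the sketch runs offline.
    return {"link": link, "post_id": link.rstrip("/").split("/")[-1]}


def scrape_all(links, max_workers=4):
    """Scrape links concurrently; results keep the order of links."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(scrape_post, links))


# Simulated batches of links, with one link repeated across batches.
batches = [
    ["https://example.com/p/a/", "https://example.com/p/b/"],
    ["https://example.com/p/b/", "https://example.com/p/c/"],
]
links = collect_unique_links(batches, n=3)
rows = scrape_all(links)
```

A list of dicts such as `rows` can be passed straight to `pandas.DataFrame(rows)` to get one row per post.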

InstagramGraph()

Arguments: csv, source_col='searched_for', post_col='post', user_col='user'

This class is made up of a series of methods that take the DataFrame from InstagramScraper(). The methods below need to be called sequentially. There is currently no method that chains the whole pipeline.

self.getFeatures(translate=False): creates various descriptive metrics from the data.

self.selectData(english=True,remove_verified=True,max_posts=3,lemma=True): subsets the data across various variables.

self.buildGraph(additional_stopwords=[],min_frequency=5): generates edges and nodes and adds them to an instance of a NetworkX graph object.

self.plotGraph(sizing=75,node_size='adjacency_frequency',layout=nx.kamada_kawai_layout,light_theme=True,colorscale='Viridis',community_plot=False): plots the graph.

self.plotCommunity(colorscale=False): creates a sunburst plot of communities and contributing hashtags.

self.savePlot(plot='map'): saves plots as HTML to the local directory. Use plot='community' to save the community sunburst plot.

self.saveTables(): saves all CSV files to the local directory: the node DataFrame, edge DataFrame, initial processed DataFrame and selected DataFrame.
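To give a feel for the kind of nodes and edges that buildGraph() generates, here is a minimal sketch of a hashtag co-occurrence graph in plain Python. It rests on assumptions the README does not confirm: that an edge links two hashtags appearing in the same post, and that min_frequency drops hashtags seen in fewer than that many posts. `build_edges` and the sample posts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_edges(posts, additional_stopwords=(), min_frequency=1):
    """Count hashtag co-occurrences, keeping only frequent hashtags."""
    stop = {w.lower() for w in additional_stopwords}
    # Normalise each post to a sorted list of distinct lowercased hashtags.
    cleaned = [sorted({t.lower() for t in tags} - stop) for tags in posts]
    # Node frequency: number of posts each hashtag appears in.
    node_freq = Counter(tag for tags in cleaned for tag in tags)
    keep = {t for t, c in node_freq.items() if c >= min_frequency}
    # Edge frequency: number of posts in which both hashtags appear.
    edge_freq = Counter()
    for tags in cleaned:
        kept = [t for t in tags if t in keep]
        edge_freq.update(combinations(kept, 2))
    nodes = {t: node_freq[t] for t in keep}
    return nodes, dict(edge_freq)


# Assumed sample data: one list of hashtags per post.
posts = [
    ["travel", "beach", "sun"],
    ["Travel", "beach"],
    ["travel", "food"],
]
nodes, edges = build_edges(posts, additional_stopwords=["sun"], min_frequency=2)
```

With NetworkX installed, the result could be loaded into a graph with `G.add_weighted_edges_from((u, v, w) for (u, v), w in edges.items())`.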