Open rsa408 opened 3 years ago
@rsa408 Thanks for reporting the issue.
You don't need to add logic for login. Instagram's hashtag search is publicly accessible over this link https://www.instagram.com/explore/tags/.
I added #life and #Health to sample_config.ini and the app was able to find the hashtag volumes properly. Here is the output:
INFO : Browser opened in constructor
INFO : Browser opened
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
INFO : Collected: #life
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
INFO : Collected: #Health
INFO : Browser closed
INFO : Printing collected hashtags and volume
#Health 128351818
Can you try once again?
I get a similar result, but only after logging in.
This code works for me. BrowserController:
from selenium import webdriver
from time import sleep
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
class BrowserController:
    def __init__(self, driverpath):
        # self.browser = None
        # self.wait = None
        self.driver_path = driverpath
        self.browser = webdriver.Chrome(executable_path=self.driver_path)
        self.wait = WebDriverWait(self.browser, 3)
        print("INFO : Browser opened in constructor")

    def browser_open(self, url, username, password):
        """
        TODO : Fill docstrings
        """
        self.browser.get(url)
        username_input = self.browser.find_element_by_name("username")
        password_input = self.browser.find_element_by_name("password")
        username_input.send_keys(username)
        password_input.send_keys(password)
        self.browser.implicitly_wait(5)
        login_button = self.browser.find_element_by_xpath('//button[@type="submit"]')
        login_button.click()
        sleep(2)
        tempy = self.browser.find_element_by_xpath("//button[contains(text(), 'Not Now')]")
        tempy.click()
        sleep(2)
        # Uncomment the lines below if a second pop-up appears
        tempy2 = self.browser.find_element_by_xpath("//button[contains(text(), 'Not Now')]")
        tempy2.click()
        sleep(2)
        # self.wait = WebDriverWait(self.browser, 3)
        print("INFO : Browser opened")

    def browser_close(self):
        """
        TODO : Fill docstrings
        """
        self.browser.close()
        print("INFO : Browser closed")

    def load_and_get(self, url):
        self.browser.get(url)

    def get_element_text_by_xpath(self, xpath):
        try:
            text = self.wait.until(EC.presence_of_element_located((By.XPATH, xpath))).text.strip()
            return text
        except Exception as ex:
            print("EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing")
            return ""
Main
from time import sleep
from configparser import ConfigParser
from hashtag import Hashtag
from browsercontroller import BrowserController
from selenium import webdriver
username = ""
password = ""


def add_hash_symbol(tag_list):
    for i in range(len(tag_list)):
        tag = tag_list[i]
        if tag.find('#') == -1:
            tag = "#" + tag
        tag_list[i] = tag


def print_sorted_database(database):
    """
    TODO : Fill docstrings
    """
    print("INFO : Printing collected hashtags and volume")
    for key, value in sorted(database.items(), key=lambda item: item[1], reverse=True):
        print('\t {:20s} \t {:>10d} '.format(key, value))


def main():
    # Read config file
    parser = ConfigParser()
    parser.read('../config.ini')
    sections = ["DEFAULT", "SEEDS"]
    driver_path = parser.get(sections[0], "driverpath")
    baseurl = parser.get(sections[0], "BaseUrl")
    num_tags = parser.get(sections[0], "numtags")
    related_tag_limit = parser.get(sections[0], "RelatedTagLimits")
    num_seeds = parser.get(sections[0], "numseeds")
    # get seed hash tags
    seeds = []
    for i in range(int(num_seeds)):
        seeds.append(parser.get(sections[1], "seed{:d}".format(i + 1)))
    add_hash_symbol(seeds)
    # browser object and hashtag objects
    browser = BrowserController(driver_path)
    browser.browser_open('https://www.instagram.com', username, password)
    sleep(2)
    database = dict()
    for seed in seeds:
        hashtags = Hashtag(seed, baseurl, int(num_tags), int(related_tag_limit))
        database.update(hashtags.scrapping_loop(browser))
    #browser.browser_close()
    # Print collected hashtags
    print_sorted_database(database)


if __name__ == "__main__":
    main()
Result
INFO : Browser opened in constructor
INFO : Browser opened
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
INFO : Collected: #health
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
INFO : Collected: #life
INFO : Printing collected hashtags and volume
#life 360092358
#health 128383797
Process finished with exit code 0
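For anyone reproducing this, here is a rough sketch of the configuration that main() appears to expect. Only the section and key names are taken from the code above; the values (driver path, numtags, the seeds) are placeholders I made up, and load_seeds() is a hypothetical helper, not part of the tool. It also checks for a leading '#', whereas add_hash_symbol() looks for '#' anywhere in the tag.

```python
from configparser import ConfigParser

# Hypothetical values; only the section and key names come from main() above.
SAMPLE_CONFIG = """
[DEFAULT]
driverpath = /path/to/chromedriver
BaseUrl = https://www.instagram.com/explore/tags/
numtags = 5
RelatedTagLimits = 10
numseeds = 2

[SEEDS]
seed1 = life
seed2 = Health
"""


def load_seeds(config_text):
    """Return the seed hashtags from the SEEDS section, each prefixed with '#'."""
    parser = ConfigParser()
    parser.read_string(config_text)
    # numseeds tells us how many seedN keys to read (seed1, seed2, ...).
    num_seeds = parser.getint("DEFAULT", "numseeds")
    seeds = [parser.get("SEEDS", "seed{:d}".format(i + 1)) for i in range(num_seeds)]
    # Add the '#' prefix only where it is missing.
    return [s if s.startswith("#") else "#" + s for s in seeds]


if __name__ == "__main__":
    print(load_seeds(SAMPLE_CONFIG))
```

ConfigParser lowercases option names by default, so "BaseUrl" and "baseurl" refer to the same key.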
Part 1 : Related hashtag and exception
The related hashtag logic is broken because Instagram has disabled the feature. See this news article: https://www.theverge.com/2020/8/5/21355976/instagram-related-hashtags-disabled-feature-bug-trump-biden
You can read the logic for scraping related hashtags in the following lines of code:
https://github.com/rahulpawargithub/Instagram-Hashtag-Finder/blob/master/src/hashtag.py#L79
https://github.com/rahulpawargithub/Instagram-Hashtag-Finder/blob/master/src/hashtag.py#L19
https://github.com/rahulpawargithub/Instagram-Hashtag-Finder/blob/master/src/browsercontroller.py#L34
The exception is thrown in get_element_text_by_xpath() when the web-page elements for related hashtags are not found. This is harmless to the overall functionality of the tool, so it simply prints "EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing" to the console and continues. The lookup is repeated "related_tags_limit" times, which is set to 10 in the sample.ini file. This is the expected result.
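To illustrate why the repeated message is harmless, here is a minimal sketch of the pattern described above. It is not the actual code in hashtag.py: collect_related_tags() and no_related_tags() are hypothetical names, and get_text stands in for a lookup like get_element_text_by_xpath() that returns an empty string when the element is missing.

```python
def collect_related_tags(get_text, limit):
    """Call get_text(index) for indices 1..limit and keep the non-empty results.

    get_text is a hypothetical stand-in for a lookup such as
    get_element_text_by_xpath(), which returns "" when nothing is found.
    """
    found = []
    misses = 0
    for index in range(1, limit + 1):
        text = get_text(index)
        if text:
            found.append(text)
        else:
            misses += 1  # a miss is not fatal; the loop simply moves on
    return found, misses


def no_related_tags(index):
    # Simulates a page where the related-hashtag elements no longer exist.
    print("EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing")
    return ""


if __name__ == "__main__":
    tags, misses = collect_related_tags(no_related_tags, limit=10)
    print(tags, misses)
```

With a limit of 10 and no related hashtags on the page, a loop like this prints the message once per attempt and still returns normally, so the seed hashtag count can be reported afterwards.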
Part 2: Login feature
I still think login is not necessary for scraping hashtags. The tool can still find and report the number of posts for a seed hashtag without logging in. I tried it in private/incognito mode: https://www.instagram.com/explore/tags/
Can you try some experiments and share the results with me? If it doesn't work without login, then I can add a login feature, or perhaps you could help add it.
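If it helps with the experiment, this small sketch builds the public tag page URL for a hashtag so you can open it by hand in a private/incognito window. tag_url() is a hypothetical helper, and appending the tag name to the base URL is my assumption about the page layout, not necessarily how the tool builds its URLs.

```python
from urllib.parse import quote

# Base URL quoted earlier in this thread.
BASE_URL = "https://www.instagram.com/explore/tags/"


def tag_url(tag, base_url=BASE_URL):
    """Return the tag page URL for a hashtag given with or without a leading '#'."""
    name = tag.strip().lstrip("#")
    # quote() percent-encodes characters that are not safe in a URL path.
    return base_url + quote(name) + "/"


if __name__ == "__main__":
    for seed in ["#life", "Health"]:
        print(tag_url(seed))
```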
Hi,
It required login for me. I added the code for login; the tool then searches for #life or #Health and shows a result, but it prints:
WARNING : Could not find any post for tag #life
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing