rsa408 commented 3 years ago

Hi,

The tool required a login, so I added code for logging in. It searches for #life or #Health and finds and shows results, but it prints:

WARNING : Could not find any post for tag #life
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing

rp-code9 commented 3 years ago

@rsa408 Thanks for reporting the issue.

You don't need to add login logic. Instagram's hashtag search is publicly accessible at https://www.instagram.com/explore/tags/.
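As a rough illustration of how a per-tag page address can be derived from that link, here is a minimal sketch. The helper name `tag_url` and the trailing slash are assumptions made for this example, not necessarily how the tool builds its URLs.

```python
from urllib.parse import quote

# Public hashtag search link quoted above.
BASE_URL = "https://www.instagram.com/explore/tags/"


def tag_url(tag: str, base_url: str = BASE_URL) -> str:
    """Return the public page URL for a hashtag (hypothetical helper)."""
    # Accept both "#life" and "life": drop whitespace and the leading '#'.
    name = tag.strip().lstrip("#")
    # Percent-encode the name; the trailing slash is an assumption.
    return f"{base_url}{quote(name)}/"


print(tag_url("#life"))
print(tag_url("Health"))
```

Opening the resulting address in a private/incognito window is a quick way to check whether the page loads without a login.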

I added #life and #Health to sample_config.ini, and the app was able to find the hashtag volumes properly. Here is the output:

INFO : Browser opened in constructor
INFO : Browser opened
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
INFO : Collected: #life
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
INFO : Collected: #Health
INFO : Browser closed
INFO : Printing collected hashtags and volume

     #life                    359990111
     #Health                  128351818

Can you try once again?

rsa408 commented 3 years ago

I get a similar result, but only after logging in.

The code that works for me:

BrowserController

from selenium import webdriver
from time import sleep
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

class BrowserController:
    def __init__(self, driverpath):
        # self.browser = None
        # self.wait = None
        self.driver_path = driverpath
        self.browser = webdriver.Chrome(executable_path=self.driver_path)
        self.wait = WebDriverWait(self.browser, 3)
        print("INFO : Browser opened in constructor")

    def browser_open(self,url,username,password):
        """
        TODO : Fill docstrings
        """
        self.browser.get(url)

        username_input = self.browser.find_element_by_name("username")
        password_input = self.browser.find_element_by_name("password")
        username_input.send_keys(username)
        password_input.send_keys(password)
        self.browser.implicitly_wait(5)
        login_button = self.browser.find_element_by_xpath('//button[@type="submit"]')
        login_button.click()
        sleep(2)
        tempy = self.browser.find_element_by_xpath("//button[contains(text(), 'Not Now')]")
        tempy.click()
        sleep(2)
        # Dismiss the second "Not Now" pop-up (comment these lines out if it does not appear)
        tempy2 = self.browser.find_element_by_xpath("//button[contains(text(), 'Not Now')]")
        tempy2.click()
        sleep(2)
        # self.wait = WebDriverWait(self.browser, 3)
        print("INFO : Browser opened")

    def browser_close(self):
        """
        TODO : Fill docstrings
        """
        self.browser.close()
        print("INFO : Browser closed")

    def load_and_get(self, url):
        self.browser.get(url)

    def get_element_text_by_xpath(self, xpath):
        try:
            text = self.wait.until(EC.presence_of_element_located((By.XPATH, xpath))).text.strip()
            return text
        except Exception as ex:
            print("EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing")
        return ""

Main

from time import sleep
from configparser import ConfigParser
from hashtag import Hashtag
from browsercontroller import BrowserController
from selenium import webdriver

username = ""
password = ""

def add_hash_symbol(tag_list):
    for i in range(len(tag_list)):
        tag = tag_list[i]
        if tag.find('#') == -1:
            tag = "#" + tag
        tag_list[i] = tag

def print_sorted_database(database):
    """
    TODO : Fill docstrings
    """
    print("INFO : Printing collected hashtags and volume")
    for key, value in sorted(database.items(), key=lambda item: item[1], reverse=True):
        print('\t {:20s} \t {:>10d} '.format(key, value))

def main():
    # Read config file
    parser = ConfigParser()
    parser.read('../config.ini')

    sections = ["DEFAULT", "SEEDS"]
    driver_path = parser.get(sections[0], "driverpath")
    baseurl = parser.get(sections[0], "BaseUrl")
    num_tags = parser.get(sections[0], "numtags")
    related_tag_limit = parser.get(sections[0], "RelatedTagLimits")
    num_seeds = parser.get(sections[0], "numseeds")

    # get seed hash tags
    seeds = []
    for i in range(int(num_seeds)):
        seeds.append(parser.get(sections[1], "seed{:d}".format(i + 1)))
    add_hash_symbol(seeds)

    # browser object and hashtag objects
    browser = BrowserController(driver_path)
    browser.browser_open('https://www.instagram.com', username,password)

    sleep(2)

    database = dict()
    for seed in seeds:
        hashtags = Hashtag(seed, baseurl, int(num_tags), int(related_tag_limit))
        database.update(hashtags.scrapping_loop(browser))
    #browser.browser_close()

    # Print collected hashtags
    print_sorted_database(database)

if __name__ == "__main__":
    main()
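
For anyone reproducing this, here is a sketch of the kind of config file that main() above reads. The section and key names come from the code; every value (driver path, base URL, counts, seed tags) is a placeholder assumption. The hash-prefix step uses startswith(), a slight simplification of add_hash_symbol().

```python
from configparser import ConfigParser

# Placeholder config: key names mirror main(), values are assumptions.
SAMPLE_CONFIG = """
[DEFAULT]
driverpath = /path/to/chromedriver
BaseUrl = https://www.instagram.com/explore/tags/
numtags = 5
RelatedTagLimits = 10
numseeds = 2

[SEEDS]
seed1 = life
seed2 = Health
"""

parser = ConfigParser()
parser.read_string(SAMPLE_CONFIG)

# getint() converts the value directly, avoiding int(...) at each use site.
num_seeds = parser.getint("DEFAULT", "numseeds")

# Read seed1..seedN from the SEEDS section, as main() does.
seeds = [parser.get("SEEDS", "seed{:d}".format(i + 1)) for i in range(num_seeds)]

# Prefix '#' where it is missing.
seeds = [s if s.startswith("#") else "#" + s for s in seeds]
print(seeds)
```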

Result

INFO : Browser opened in constructor
INFO : Browser opened
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
INFO : Collected: #health
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing
INFO : Collected: #life
INFO : Printing collected hashtags and volume
     #life                    360092358 
     #health                  128383797 

Process finished with exit code 0

rp-code9 commented 3 years ago

Part 1: Related hashtags and the exception

The related hashtag logic is broken because Instagram has disabled the feature. See this news article: https://www.theverge.com/2020/8/5/21355976/instagram-related-hashtags-disabled-feature-bug-trump-biden

You can read the logic for scraping related hashtags in the following lines of code:
https://github.com/rahulpawargithub/Instagram-Hashtag-Finder/blob/master/src/hashtag.py#L79
https://github.com/rahulpawargithub/Instagram-Hashtag-Finder/blob/master/src/hashtag.py#L19
https://github.com/rahulpawargithub/Instagram-Hashtag-Finder/blob/master/src/browsercontroller.py#L34

The exception is raised in get_element_text_by_xpath() when the web-page elements for related hashtags are not found. This is harmless to the overall functionality of the tool, so it simply prints "EXCEPTION: Something bad happened. Perhaps no more element or timeout. Continuing" on the console and continues. The lookup is repeated "related_tags_limit" times, which is set to 10 in the sample.ini file. This is the expected result.
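
If the repeated messages are distracting, one option is to stop at the first missing element instead of retrying up to the limit. The following is only a sketch of that idea, not the tool's current behavior: `collect_related_tags` and `xpath_template` are hypothetical names, and `get_text` stands in for get_element_text_by_xpath(), which returns an empty string on failure.

```python
from typing import Callable, List


def collect_related_tags(
    get_text: Callable[[str], str],
    xpath_template: str,
    limit: int = 10,
) -> List[str]:
    """Query up to `limit` related-tag elements, stopping at the first miss."""
    tags: List[str] = []
    for index in range(1, limit + 1):
        # xpath_template is a hypothetical pattern with one "{}" placeholder.
        text = get_text(xpath_template.format(index))
        if not text:
            # An empty string signals a failed lookup, so stop early
            # rather than repeating the failing lookup `limit` times.
            break
        tags.append(text)
    return tags


# Usage with a stand-in getter (no browser needed for this demonstration).
fake_page = {"//a[1]": "#wellness", "//a[2]": "#fitness"}
print(collect_related_tags(lambda xp: fake_page.get(xp, ""), "//a[{}]"))
```

With this change, a page without related hashtags would produce one message per seed tag rather than one per attempt.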

Part 2: Login feature

I still think login is not necessary for scraping hashtags. The tool can still find and report the number of posts for a seed hashtag without logging in. I tried it in private/incognito mode: https://www.instagram.com/explore/tags/

Can you try some experiments and share the results with me? If it doesn't work without a login, I can add a login feature, or perhaps you could help add it.

rp-code9 / Instagram-Hashtag-Finder

It needed log in, I added but nothing finding. #1
