tzuhsial / InstagramCrawler

A non-API Python program to crawl public photos, posts, or followers
https://github.com/iammrhelo/InstagramCrawler
MIT License
373 stars, 108 forks

Error when scraping captions #8

Open joaanna opened 7 years ago

joaanna commented 7 years ago

Hey, so far I crawled followers smoothly, but I have 2 issues:

  1. I get this when I try to crawl the captions:

         python instagramcrawler.py -d data -q 'viralnova365' -c -n 10
         dir_prefix: data, query: viralnova365, crawl_type: photos, number: 10, caption: True
         posts: 1660, number: 10
         Scraping photo links...
         Number of photo_links: 25
         Scraping captions...
         Traceback (most recent call last):
           File "instagramcrawler.py", line 297, in <module>
             main()
           File "instagramcrawler.py", line 293, in main
             caption=args.caption)
           File "instagramcrawler.py", line 85, in crawl
             self.click_and_scrape_captions(number)
           File "instagramcrawler.py", line 161, in click_and_scrape_captions
             FIREFOX_FIRST_POST_PATH).click()
           File "/InstagramCrawler/crawl/lib/python3.4/site-packages/selenium/webdriver/remote/webdriver.py", line 313, in find_element_by_xpath
             return self.find_element(by=By.XPATH, value=xpath)
           File "InstagramCrawler/crawl/lib/python3.4/site-packages/selenium/webdriver/remote/webdriver.py", line 791, in find_element
             'value': value})['value']
           File "InstagramCrawler/crawl/lib/python3.4/site-packages/selenium/webdriver/remote/webdriver.py", line 256, in execute
             self.error_handler.check_response(response)
           File "InstagramCrawler/crawl/lib/python3.4/site-packages/selenium/webdriver/remote/errorhandler.py", line 194, in check_response
             raise exception_class(message, screen, stacktrace)
         selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: //a[contains(@class, '_8mlbc _vbtk2 _t5r8b')]
  2. Also, I would like to crawl all the images, but it never downloads the number specified by -n. Do you have any suggestions?

tzuhsial commented 7 years ago

Hi @joaanna, thank you for telling me! I'll look into this when I have time.

tzuhsial commented 7 years ago

@joaanna I think I fixed the path to the caption, which makes captions crawlable again. (I guess I'll have to do this every time Instagram updates.)
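
One way to soften this kind of breakage is to try several candidate XPaths and poll briefly until any of them matches, instead of relying on a single class-based path. The sketch below is not the fix that went into the repository. `find_first_match`, `CANDIDATE_FIRST_POST_PATHS`, and the second XPath are hypothetical names and guesses; the only Selenium call it assumes is `driver.find_elements`, which returns an empty list rather than raising when nothing matches.

```python
import time

# Same string value as selenium.webdriver.common.by.By.XPATH; spelled out
# here only so the sketch has no import-time dependency on selenium.
XPATH = "xpath"

# Hypothetical candidates, most specific first. The first is the class-based
# path from the traceback above; the second is an assumed structural fallback
# and may not match Instagram's actual markup.
CANDIDATE_FIRST_POST_PATHS = [
    "//a[contains(@class, '_8mlbc _vbtk2 _t5r8b')]",
    "//article//a[contains(@href, '/p/')]",
]


def find_first_match(driver, xpaths, timeout=10.0, poll=0.5):
    """Return the first element matched by any XPath, or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        for xpath in xpaths:
            # find_elements returns [] when nothing matches, so no
            # NoSuchElementException has to be caught here.
            elements = driver.find_elements(XPATH, xpath)
            if elements:
                return elements[0]
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

With a real driver this could be called as `first_post = find_first_match(driver, CANDIDATE_FIRST_POST_PATHS)`, and a `None` result can be reported as "the page layout probably changed" instead of surfacing a raw traceback.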

And about the number issue, I am still looking for a robust way to detect if new posts are loaded. Any help is appreciated!
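
A common heuristic for the "have new posts loaded?" question is to scroll, wait, and compare the number of post links before and after; if the count stops growing for a few rounds, assume nothing more will load. The following is a minimal sketch of that idea, not code from the repository. `scroll_until_loaded`, `max_stalls`, and `pause` are hypothetical names, and the two callables are left to the caller so the loop does not depend on any particular page structure.

```python
import time


def scroll_until_loaded(count_items, scroll_once, target, max_stalls=3, pause=1.0):
    """Scroll until `target` items are present or the count stops growing.

    count_items: callable returning how many posts are currently on the page
    scroll_once: callable that triggers loading of more posts
    Returns the final item count, which may be below `target`.
    """
    seen = count_items()
    stalls = 0
    while seen < target and stalls < max_stalls:
        scroll_once()
        time.sleep(pause)  # give the page time to append new posts
        now = count_items()
        if now > seen:
            seen = now
            stalls = 0  # progress was made, so reset the stall counter
        else:
            stalls += 1
    return seen


# Possible wiring with Selenium (POST_LINK_XPATH is a placeholder):
# count = lambda: len(driver.find_elements("xpath", POST_LINK_XPATH))
# scroll = lambda: driver.execute_script(
#     "window.scrollTo(0, document.body.scrollHeight);")
# loaded = scroll_until_loaded(count, scroll, target=args.number)
```

The stall counter is the design choice that matters here: a single unchanged count can just mean a slow network, so the loop only gives up after several consecutive rounds without growth.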

anfiallos commented 6 years ago

Hi. I have the same problem: an error with the value of FIREFOX_FIRST_POST_PATH. Any suggestions, please?

anakmalank commented 5 years ago

Hi, I got this problem too: selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: //div[contains(@class, '_8mlbc _vbtk2 _t5r8b')]