meetmangukiya / instagram-scraper

Scrape the Instagram frontend. Inspired by twitter-scraper by @kennethreitz.
MIT License
937 stars 83 forks

unicode error fix #6

ghost closed 6 years ago

meetmangukiya commented 6 years ago

@paul-15 did the bug occur locally for you too, and did this fix it? Or do you just think this fixes it? I ask because the exception seems to occur on the join, not on a read or write.

ghost commented 6 years ago

The bug is definitely fixed by this; I have tried it. I think the exception only seems to occur on join because the join() call is inside writer.writerow().

If you do not use encoding='utf-8' and call join() outside of writer.writerow(), you can see that the error occurs when writing to the file:

row = [f'{count}.jpg',
       url,
       caption.replace('\n', '\\n'),
       ', '.join(hashtags),
       ', '.join(mentions)]
writer.writerow(row)

Traceback (most recent call last):
  File ".\instagram_scraper.py", line 112, in <module>
    main(args.tags, args.count, args.cont)
  File ".\instagram_scraper.py", line 96, in main
    _single_tag_processing(tag, total_count, existing_links, start)
  File ".\instagram_scraper.py", line 84, in _single_tag_processing
    writer.writerow(row)
  File "C:\Users\Paul\AppData\Local\Programs\Python\Python36-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 194-195: character maps to <undefined>
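
For anyone who wants to reproduce this without running the scraper, here is a minimal sketch. It assumes the output file is opened with the built-in open(), and it passes 'cp1252' explicitly so the failure shows up on any platform, not only on Windows. The sample row and the write_row helper are made up for illustration and are not taken from instagram_scraper.py.

```python
import csv
import os
import tempfile

# Hypothetical sample row; the emoji stands in for caption text
# that the cp1252 code page cannot represent.
row = ['1.jpg', 'https://example.com/p/1', 'sunset \U0001F305',
       '#sky, #sun', '@someone']


def write_row(path, encoding):
    """Write one CSV row using the given file encoding; return True on success."""
    try:
        with open(path, 'w', newline='', encoding=encoding) as f:
            csv.writer(f).writerow(row)
        return True
    except UnicodeEncodeError:
        # Raised while writing, not while building the row with join().
        return False


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'out.csv')
    cp1252_ok = write_row(path, 'cp1252')  # mimics the Windows default here
    utf8_ok = write_row(path, 'utf-8')     # the encoding used by the fix
    with open(path, newline='', encoding='utf-8') as f:
        round_trip = next(csv.reader(f))

print('cp1252 write succeeded:', cp1252_ok)
print('utf-8 write succeeded:', utf8_ok)
```

The row is built before the file is touched, so a failure in the cp1252 case can only come from the write itself.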

meetmangukiya commented 6 years ago

@paul-15 thanks for the info! This bug definitely looks solved. Thanks for the contribution!