hellock / icrawler

A multi-thread crawler framework with many builtin image crawlers provided.
http://icrawler.readthedocs.io/en/latest/
MIT License

Tkinter UI for icrawler #123

Open Patty-OFurniture opened 8 months ago

Patty-OFurniture commented 8 months ago

I have a simple Tkinter UI that fixes several issues WITHOUT changing the core library. If you are interested, it shows some interesting things you can do with icrawler. Yes, it might seem like a mess, but if you already use icrawler it should be clear. I can write Python and I am learning Tkinter, so suggestions are welcome on my Issues list. Most things work, and I want to add more.

I forked the whole project in case I needed to do fixes, but the UI is all in /examples/.

https://github.com/Patty-OFurniture/icrawler

#98 - the keep_file() override in FilenameDownloader checks the file type; you can return False if extension != "jpg"

#111 - an example of how to override set_logger() for full control (commented out for me)

#108 - get the file name (from Content-Disposition or the URL)

#108 - also log (INFO) the image number, filename, and URL. You can change the formatting, log to a file, or whatever else you want

#117 and #107 - log (DEBUG) the Google content if no images are found, to help resolve it if it's still a problem

#110 - a similar log could be done for Bing. Not implemented, but easily copied (google.py)

#106 - a keyword separator option, so you can enter, for example, "beans|rice" and search first for "beans" and then for "rice", separately

#103 - the Google language selection fix should help Baidu, since it adds headers to look more like a web browser and avoid getting flagged.

#104 - Google language selection should help. Common languages are in GoogleLanguageOptions.py; add to it if you need to

#61 - sort of fixed: it creates a directory for each keyword. "rice" goes in storage/rice/, "beans" in storage/beans/. Hopefully it is a good example.
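
For anyone who wants the idea without reading the UI code, here is a minimal sketch of the keyword separator and the per-keyword directories. It is not the code from my fork: `split_keywords` and `keyword_dir` are hypothetical names made up for this example, and the rule for cleaning directory names is just an assumption.

```python
from pathlib import Path


def split_keywords(raw: str, separator: str = "|") -> list[str]:
    """Split a combined query such as "beans|rice" into separate keywords."""
    # Strip whitespace and drop empty pieces (e.g. from a trailing separator).
    return [part.strip() for part in raw.split(separator) if part.strip()]


def keyword_dir(root: str, keyword: str) -> Path:
    """Return the per-keyword storage directory, e.g. storage/rice."""
    # Assumed rule: replace anything that is not alphanumeric, "-", "_" or
    # a space, so the keyword is safe to use as a directory name.
    safe = "".join(c if c.isalnum() or c in "-_ " else "_" for c in keyword)
    return Path(root) / safe.strip()


# One (keyword, directory) pair per search, run one after the other.
plan = [(kw, keyword_dir("storage", kw)) for kw in split_keywords("beans|rice")]
```

Each pair can then drive its own crawl, for example `GoogleImageCrawler(storage={"root_dir": str(path)}).crawl(keyword=kw, max_num=50)`, where the `max_num` value is arbitrary.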

#121 - a better, but not perfect, check for disk space errors, in the core library

Also image type detection for #108, finding the correct file extension
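
The two #108 pieces (file name from Content-Disposition or the URL, then correcting the extension from the image type) can be sketched with the standard library alone. This is an illustration, not the code in my fork: every function name here is hypothetical, and the signature table only covers a few common formats.

```python
import os
from email.message import Message
from urllib.parse import unquote, urlparse

# Leading "magic" bytes for a few common image formats (not exhaustive).
SIGNATURES = [
    (b"\xff\xd8\xff", "jpg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
]


def name_from_response(url, content_disposition=None):
    """Prefer the Content-Disposition filename, else the last URL path segment."""
    if content_disposition:
        # Let the stdlib header parser extract the filename parameter.
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        filename = msg.get_filename()
        if filename:
            return os.path.basename(filename)
    # Fall back to the URL path; the query string is ignored by urlparse().path.
    return os.path.basename(unquote(urlparse(url).path))


def detect_extension(data):
    """Return an extension guessed from the first bytes, or None if unknown."""
    for magic, ext in SIGNATURES:
        if data.startswith(magic):
            return ext
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def corrected_name(name, data):
    """Swap the extension for the detected one when the type is recognised."""
    ext = detect_extension(data)
    stem, _old_ext = os.path.splitext(name)
    return f"{stem}.{ext}" if ext else name
```

In a custom downloader, the header value would come from the HTTP response and `data` from the first bytes of the response body; unrecognised content keeps its original name.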

Thanks to hellock for the library, I'm just making it easier for me to use!

Have fun! Patty

ZhiyuanChen commented 7 months ago

Thank you for your work! It looks excellent