A tool for downloading from public image boards (those that allow scraping), previewing your images and tags, and editing your tags. Additional tabs let you download other code repositories as well as state-of-the-art (S.O.T.A.) diffusion and CLIP models. Custom datasets can be added!
Make sure you have git installed!
Download the Windows, Mac, or Linux run file (the repo will be installed for you):
Mac and Linux users should make the file executable with the following terminal command:
chmod +x linux_run.sh
OR
chmod +x mac_run.sh
Linux users should also install unzip:
sudo apt-get install unzip
The "DUPLICATE" run files (run.bat, mac_run.sh, linux_run.sh) residing in the Data-Curation-Tool folder are intentionally deleted when the program is run.
Double-click the run file to start with the default settings.
Update the dependencies listed in the YAML file with the following (make sure to use the most recent environment.yml in the repo: https://raw.githubusercontent.com/x-CK-x/Dataset-Curation-Tool/main/environment.yml):
./RUN_FILE --update
Run with sharing turned on: provides a live link that anyone can use
./RUN_FILE --share
Run password protected: requires the user to enter a username and password to access the webUI
./RUN_FILE --server_port 7860 --username NAME --password PASS
Run on a specified port: serves the webUI on the specified port
./RUN_FILE --server_port 7860
Or use any combination of the options above.
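If you launch the run file from another script, the flags listed above can be assembled programmatically. The following is only a sketch: `build_command` is a hypothetical helper that is not part of this repo, and it uses only the flags documented above (`--update`, `--share`, `--server_port`, `--username`, `--password`).

```python
import shlex


def build_command(run_file, update=False, share=False, port=None,
                  username=None, password=None):
    """Assemble an argument list from the flags documented in this README.

    Hypothetical helper for illustration only; it does not ship with the tool.
    """
    args = [run_file]
    if update:
        args.append("--update")
    if share:
        args.append("--share")
    if port is not None:
        args += ["--server_port", str(port)]
    # Only add credentials when both halves are given.
    if username is not None and password is not None:
        args += ["--username", username, "--password", password]
    return args


cmd = build_command("./linux_run.sh", share=True, port=7860,
                    username="NAME", password="PASS")
# Prints: ./linux_run.sh --share --server_port 7860 --username NAME --password PASS
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run` if you want to start the tool from Python.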
Create a Support Ticket or Bug Report here: https://github.com/x-CK-x/Dataset-Curation-Tool/issues
Feel free to suggest new features here: https://github.com/x-CK-x/Dataset-Curation-Tool/discussions/categories/ideas
New features are paused as of 09/05/2023, unless there are willing contributors to develop any of the other features.
New image-board-specific tagging/captioning models will be supported as they are released. (There is currently no ETA for those models, which are being developed by others.)
Contributors are welcome to open a Pull Request for their developments, and I will promptly review it to be added.
base_folder/
├─ batch_folder/
│ ├─ downloaded_posts_folder/
│ │ ├─ png_folder/
│ │ ├─ jpg_folder/
│ │ ├─ gif_folder/
│ │ ├─ webm_folder/
│ │ ├─ swf_folder/
│ ├─ resized_img_folder/
│ ├─ tag_count_list_folder/
│ │ ├─ tags.csv
│ │ ├─ tag_category.csv
│ ├─ save_searched_list_path.txt
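As a rough illustration of the layout above, the folder tree can be reproduced with `pathlib`. This is a sketch only: `create_batch_layout` is a hypothetical function, and the folder names are the placeholder names from the diagram, not necessarily the names the tool uses by default.

```python
import tempfile
from pathlib import Path

# Placeholder names copied from the diagram above.
MEDIA_FOLDERS = ["png_folder", "jpg_folder", "gif_folder",
                 "webm_folder", "swf_folder"]


def create_batch_layout(base_folder, batch_name="batch_folder"):
    """Create the per-batch folder tree shown in the diagram (illustrative)."""
    batch = Path(base_folder) / batch_name
    posts = batch / "downloaded_posts_folder"
    for name in MEDIA_FOLDERS:
        (posts / name).mkdir(parents=True, exist_ok=True)
    (batch / "resized_img_folder").mkdir(exist_ok=True)
    (batch / "tag_count_list_folder").mkdir(exist_ok=True)
    return batch


# Build the tree in a throwaway directory and list what was created.
with tempfile.TemporaryDirectory() as base:
    batch = create_batch_layout(base)
    for path in sorted(batch.rglob("*")):
        print(path.relative_to(base))
```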
Any file path parameters that are empty will use the default path.
Files/folders that use the same path are merged, not overwritten. For example, using the same path for save_searched_list_path at every batch will result in a combined searched list of every batch in one .txt file.
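The merge behavior described above amounts to appending rather than overwriting. A minimal sketch, assuming the searched list is a plain text file with one entry per line (the actual file format and the function name `save_searched_list` are assumptions, not taken from the tool):

```python
import tempfile
from pathlib import Path


def save_searched_list(path, entries):
    """Append entries to the list file instead of overwriting it (illustrative)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" keeps whatever earlier batches already wrote.
    with path.open("a", encoding="utf-8") as f:
        for entry in entries:
            f.write(f"{entry}\n")


with tempfile.TemporaryDirectory() as base:
    shared = Path(base) / "save_searched_list_path.txt"
    save_searched_list(shared, ["post_1", "post_2"])  # first batch
    save_searched_list(shared, ["post_3"])            # second batch, same path
    # Prints: ['post_1', 'post_2', 'post_3']
    print(shared.read_text(encoding="utf-8").splitlines())
```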
[...] delete_original to true if you plan redownloading using the same destination folder.
When resized_img_folder uses a different folder from the source folder, a file that already exists in the destination folder is skipped. It does not check whether the existing file has the specified min_short_side.
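The skip rule for resized_img_folder can be sketched as a name-only existence check. This is an assumption-laden illustration: `files_to_resize` is a hypothetical function, and matching files by name is a guess at how the tool pairs source and destination files. Note that, as described above, the existing file's dimensions are never inspected.

```python
import tempfile
from pathlib import Path


def files_to_resize(source_folder, resized_img_folder):
    """Return source files that have no same-named file in the destination."""
    source = Path(source_folder)
    dest = Path(resized_img_folder)
    pending = []
    for src in sorted(source.iterdir()):
        if not src.is_file():
            continue
        # Existence is the only test; min_short_side is not verified.
        if (dest / src.name).exists():
            continue
        pending.append(src)
    return pending


with tempfile.TemporaryDirectory() as base:
    src_dir = Path(base) / "png_folder"
    dst_dir = Path(base) / "resized_img_folder"
    src_dir.mkdir()
    dst_dir.mkdir()
    (src_dir / "a.png").touch()
    (src_dir / "b.png").touch()
    (dst_dir / "a.png").touch()  # already resized in an earlier run
    # Prints: ['b.png']
    print([p.name for p in files_to_resize(src_dir, dst_dir)])
```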
MIT
By using this downloader, the user agrees that the author is not liable for any misuse of this downloader. This downloader is open-source and free to use.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.