mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

Document how to use as library #642

Open lyz-code opened 4 years ago

lyz-code commented 4 years ago

Hi, I intend to use gallery-dl as a library for a program to periodically fetch the selected sources.

I already do it with youtube-dl as it's documented in their docs.

Is there a simple way to do the following with gallery-dl?

ydl_opts = {
    'format': 'bestaudio/best',
    'postprocessors': [{
        'key': 'FFmpegExtractAudio',
        'preferredcodec': 'mp3',
        'preferredquality': '192',
    }],
    'logger': MyLogger(),
    'progress_hooks': [my_hook],
}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])

I've seen in this issue that you can process one url:

>>> from gallery_dl import job
>>> j = job.DataJob("https://imgur.com/0gybAXR")
>>> j.run()
[ ... ]

But it doesn't download the file, nor does it work with gallery links such as https://www.deviantart.com/{{ user }}. Thank you.

mikf commented 4 years ago

Use a DownloadJob instance to actually download stuff. A DataJob object will only collect the data returned by its Extractor and not do much else with it.

Setting config options should be done via the functions in config.py, like config.set(), or by directly manipulating the _config dict in there. You can load config files with config.load().

For example:

from gallery_dl import config, job

config.load()  # load default config files
config.set(("extractor",), "base-directory", "/tmp/")
config.set(("extractor", "imgur"), "filename", "{id}{title:?_//}.{extension}")

for url in urls:
    job.DownloadJob(url).run()
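
The loop above can be wrapped in a small helper that records what each job's run() call returns. This is only a sketch: download_all and make_job are hypothetical names, not part of gallery-dl, and the default branch assumes gallery_dl is installed and that run() returns a status value (0 in the UrlJob session shown later in this thread).

```python
def download_all(urls, make_job=None):
    """Run one job per URL and return {url: status}.

    make_job is a hypothetical hook: any callable that takes a URL and
    returns an object with a run() method. When omitted, gallery-dl's
    DownloadJob is used, which requires gallery_dl to be installed.
    """
    if make_job is None:
        from gallery_dl import job  # imported lazily, only when needed
        make_job = job.DownloadJob

    results = {}
    for url in urls:
        results[url] = make_job(url).run()
    return results


class FakeJob:
    """Stand-in job so the sketch can run without network access."""

    def __init__(self, url):
        self.url = url

    def run(self):
        return 0


statuses = download_all(
    ["https://example.org/a", "https://example.org/b"],
    make_job=FakeJob,
)
```

Calling download_all(urls) without make_job would start real downloads, so config.load() and any config.set() calls should come first.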

lyz-code commented 4 years ago

Thank you @mikf, it helped a lot.

For others reading this issue: to find out which options you need to set, use the two config examples (1 and 2) together with the options description. Here are some options I've set:

config.set(('extractor',), "archive", '~/.gallery-dl/archive.sql')
config.set(('extractor',), "base-directory", '~/downloads')
config.set(('extractor', 'deviantart'), "image-range", '1-10')
config.set(('extractor', 'deviantart'), "flat", False)
config.set(('extractor', 'deviantart'), "metadata", True)
config.set(
    ('extractor',),
    'postprocessors',
    [
        {
            "name": "metadata",
            "mode": "json",
        }
    ]
)

I'm still unable to configure the output. What am I doing wrong?

config.set(('output',), 'mode', 'terminal')
config.set(
    ('output',),
    'log',
    {
        "level": "info",
        "format": {
            "debug": "\u001b[0;37m{name}: {message}\u001b[0m",
            "info": "\u001b[1;37m{name}: {message}\u001b[0m",
            "warning": "\u001b[1;33m{name}: {message}\u001b[0m",
            "error": "\u001b[1;31m{name}: {message}\u001b[0m"
        }
    },
)
config.set(
    ('output',),
    'logfile',
    {
        "path": "log.txt",
        "mode": "w",
        "level": "debug"
    },
)
config.set(
    ('output',),
    "unsupportedfile",
    {
        "path": "unsupported.txt",
        "mode": "a",
        "format": "{asctime} {message}",
        "format-date": "%Y-%m-%d-%H-%M-%S"
    },
)

It produces the following config._config:

 'output': {'log': {'format': {'debug': '\x1b[0;37m{name}: {message}\x1b[0m',
                               'error': '\x1b[1;31m{name}: {message}\x1b[0m',
                               'info': '\x1b[1;37m{name}: {message}\x1b[0m',
                               'warning': '\x1b[1;33m{name}: {message}\x1b[0m'},
                    'level': 'info'},
            'logfile': {'level': 'debug', 'mode': 'w', 'path': 'log.txt'},
            'mode': 'auto',
            'unsupportedfile': {'format': '{asctime} {message}',
                                'format-date': '%Y-%m-%d-%H-%M-%S',
                                'mode': 'a',
                                'path': 'unsupported.txt'}}}

This is similar to the config example, but neither unsupported.txt nor log.txt is being created.

Thanks

mikf commented 4 years ago

All logging output is done via Python's logging module.

You can use that to configure and attach your own handlers to the root logger, or you can call initialize_logging(), configure_logging(), and setup_logging_handler() from output.py after setting your output options.

Take a look at main() and search for "output." to see how this is "normally" done.

For example

import logging
from gallery_dl import output

# initialize logging and set up a logging handler to stderr
output.initialize_logging(logging.INFO)
# apply config options to stderr handler and create file handler
output.configure_logging(logging.INFO)
# create unsupported-file handler
output.setup_logging_handler("unsupportedfile", fmt="{message}")
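
The other route mentioned above, attaching your own handler to the root logger, needs only the standard logging module. The sketch below writes records to a log file; add_file_handler is a hypothetical helper name, the format string is arbitrary, and the "imgur" logger name is an assumption standing in for whatever loggers gallery-dl uses.

```python
import logging
import os
import tempfile


def add_file_handler(path, level=logging.DEBUG):
    """Attach a FileHandler to the root logger and return it."""
    handler = logging.FileHandler(path, mode="w", encoding="utf-8")
    handler.setLevel(level)
    handler.setFormatter(
        logging.Formatter("[{name}][{levelname}] {message}", style="{")
    )
    root = logging.getLogger()
    # Lower the root level if it would otherwise filter these records out.
    if root.level > level:
        root.setLevel(level)
    root.addHandler(handler)
    return handler


log_path = os.path.join(tempfile.mkdtemp(), "log.txt")
handler = add_file_handler(log_path)

# Any logger that propagates to root reaches the handler;
# "imgur" is an assumed logger name used for illustration.
logging.getLogger("imgur").warning("example message")

logging.getLogger().removeHandler(handler)
handler.close()

with open(log_path, encoding="utf-8") as fp:
    contents = fp.read()
```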

rpdelaney commented 4 years ago

@mikf would you accept a PR documenting how to do this?

mikf commented 4 years ago

@rpdelaney Sure. I'd be happy about any sort of contribution, especially documentation. Let me know if you need anything or if I should explain how certain things (are supposed to) work.

opsoyo commented 3 years ago

All logging output is done via Python's logging module.

You can use that to configure and attach your own handlers to the root logger, or you can call initialize_logging(), configure_logging(), and setup_logging_handler() from output.py after setting your output options.

Take a look at main() and search for "output." to see how this is "normally" done.

For example

import logging
from gallery_dl import output

# initialize logging and set up a logging handler to stderr
output.initialize_logging(logging.INFO)
# apply config options to stderr handler and create file handler
output.configure_logging(logging.INFO)
# create unsupported-file handler
output.setup_logging_handler("unsupportedfile", fmt="{message}")

😅 I tried to understand this, without success. For now, inside a short async call, I simply redirect stdout and stderr to StringIO and match the captured text with a regex. Thankfully this small Discord bot won't mind the hacky method.

rachmadaniHaryono commented 2 years ago

Config example:

{ "output": {
    "log": { "level": "debug" },
    "#": "write logging messages to a separate file",
    "logfile": { "path": "/home/user/log.log", "mode": "a", "level": "debug" },
    "#": "write unrecognized URLs to a separate file",
    "unsupportedfile": { "path": "/home/user/unsupported.log", "mode": "a" }
}}

import logging
from gallery_dl import config, output
from gallery_dl.job import DataJob

# load config before setting up logging
config.load()
# initialize logging and set up a logging handler to stderr
output.initialize_logging(logging.DEBUG)
# apply config options to stderr handler and create file handler
output.configure_logging(logging.DEBUG)
# create unsupported-file handler
output.setup_logging_handler("unsupportedfile", fmt="{message}")

url = 'https://www.reddit.com/r/Hololive/comments/rcqpgr/'
job = DataJob(url)
job.run()
# process `job.data`

If you want to suppress the output of job.run():

import os
with open(os.devnull, "w") as f:
    job.file = f
    job.run()

53845714nF commented 2 years ago

Does anyone know how to get all URLs, as with the -g flag, but as a Python list rather than on stdout? I know I need the job.UrlJob class. I also tried to monkey-patch some functions (run, dispatch, and handle_url), but it didn't work.

Any ideas?

rachmadaniHaryono commented 2 years ago

Have you tried the last code snippet?

After job.run() you can get the output from job.data.

mikf commented 2 years ago

@53845714nF Just copy the job.UrlJob code, remove anything you don't need, and store any URLs in a list. You should end up with something like:

from gallery_dl.job import Job


class UrlJob(Job):
    """Collect extracted URLs in a list instead of printing them."""

    def __init__(self, url, parent=None):
        Job.__init__(self, url, parent)
        self.urls = []

    def handle_url(self, url, _):
        self.urls.append(url)

Accessing URLs afterwards is then just

>>> j = UrlJob("imgur.com/asdqwe")
>>> j.run()
0
>>> j.urls
['https://i.imgur.com/asdqw.jpg']

53845714nF commented 2 years ago

@mikf Awesome, works for me. 😘 And thanks for your work; it is a great program.

Hint: for those who also need it, the import must look like this: from gallery_dl.job import Job

pink-red commented 2 years ago

I've managed to create a job which acts like a Python generator. It is useful when you need to extract a large number of posts, especially from Pixiv or E-Hentai, because the job yields each post right after it is extracted and groups images by post.

Essentially, you can just iterate over posts and their URLs.

Implementation

from itertools import groupby
from operator import itemgetter

from gallery_dl.extractor.message import Message
from gallery_dl.job import Job
from gallery_dl.util import build_duration_func
from gallery_dl.exception import StopExtraction

# https://stackoverflow.com/questions/12775449/group-an-iterable-by-a-predicate-in-python
def igroup(iterable, isstart):
    """
    Turn [header, data1, data2, header, data3, data4, data5, header, header, ...]
    into [
        (header, [data1, data2]),
        (header, [data3, data4, data5]),
        (header, []),
        (header, []),
        ...
    ]
    """
    def key(item, count=[False]):
        if isstart(item):
            count[0] = not count[0]  # start new group
        return count[0]

    for xs in map(itemgetter(1), groupby(iterable, key)):
        header = next(xs)
        yield header, xs

class GeneratorJob(Job):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.dispatched = False

    def _run(self):
        extractor = self.extractor
        sleep = build_duration_func(extractor.config("sleep-extractor"))
        if sleep:
            extractor.sleep(sleep(), "extractor")

        try:
            for msg in extractor:
                self.dispatch(msg)
                if self.dispatched:
                    yield msg
                    self.dispatched = False
        except StopExtraction:
            pass

    def run(self):
        message_generator = self._run()
        for post_mes, url_mess in igroup(
            message_generator, lambda msg: msg[0] == Message.Directory
        ):
            post = post_mes[1]
            urls = map(lambda mes: (mes[1], mes[2]), url_mess)
            yield (post, urls)

    def handle_url(self, url, kwdict):
        self.dispatched = True

Example usage

for post_dict, image_infos in GeneratorJob("https://www.pixiv.net/en/users/3143520/illustrations").run():
    print(post_dict)
    # Note: you must completely consume image_infos each time.
    for image_url, image_dict in image_infos:
        print(image_url)
        print(image_dict)
    print()

The example URL is SFW.
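
The grouping step can be tried in isolation, without gallery-dl or network access. The sketch below is a variant of igroup() that keeps its toggle in a nonlocal variable instead of a mutable default argument; the string messages are made-up stand-ins for the Directory and Url messages an extractor yields.

```python
from itertools import groupby


def igroup(iterable, isstart):
    """Yield (header, items) pairs, starting a new group at every header."""
    flag = False

    def key(item):
        # Flip the flag on each header so groupby() sees a new key value.
        nonlocal flag
        if isstart(item):
            flag = not flag
        return flag

    for _, group in groupby(iterable, key):
        header = next(group)
        yield header, group


# "H*" entries play the role of post headers, the rest are their items.
messages = ["H1", "a", "b", "H2", "c", "H3", "H4", "d"]
grouped = [
    (header, list(items))  # consume each group before moving on
    for header, items in igroup(messages, lambda m: m.startswith("H"))
]
print(grouped)
```

Each items iterator shares state with groupby(), which is why it should be consumed before the next group is requested.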

Example output (first 3 posts)

{'id': 94767140, 'title': 'Envar', 'type': 'illust', 'caption': 'Its been a long and hard year but I am excited to share update on our studio and the work we have created&#44; feel free to view the video or our new website&#44; thank you! 💖<br /><br /><a href="https://youtu.be/Wp4SD0Yyfds" target=\'_blank\' rel=\'noopener noreferrer\'>https://youtu.be/Wp4SD0Yyfds</a><br /><br /><a href="https://www.envarstudio.com/" target=\'_blank\' rel=\'noopener noreferrer\'>https://www.envarstudio.com/</a>', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['illustration', 'concept', 'painting', 'sketch', 'art', 'Original', 'arcane', 'Leagueoflegends', 'jinx', 'VALORANT'], 'tools': ['Photoshop'], 'create_date': '2021-12-14T04:13:20+09:00', 'page_count': 6, 'width': 10000, 'height': 5625, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 19786, 'total_bookmarks': 1456, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 10, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 0, 'date': datetime.datetime(2021, 12, 13, 19, 13, 20), 'rating': 'General', 'suffix': '', 'category': 'pixiv', 'subcategory': 'artworks'}
https://i.pximg.net/img-original/img/2021/12/14/04/13/20/94767140_p0.jpg
{'id': 94767140, 'title': 'Envar', 'type': 'illust', 'caption': 'Its been a long and hard year but I am excited to share update on our studio and the work we have created&#44; feel free to view the video or our new website&#44; thank you! 💖<br /><br /><a href="https://youtu.be/Wp4SD0Yyfds" target=\'_blank\' rel=\'noopener noreferrer\'>https://youtu.be/Wp4SD0Yyfds</a><br /><br /><a href="https://www.envarstudio.com/" target=\'_blank\' rel=\'noopener noreferrer\'>https://www.envarstudio.com/</a>', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['illustration', 'concept', 'painting', 'sketch', 'art', 'Original', 'arcane', 'Leagueoflegends', 'jinx', 'VALORANT'], 'tools': ['Photoshop'], 'create_date': '2021-12-14T04:13:20+09:00', 'page_count': 6, 'width': 10000, 'height': 5625, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 19786, 'total_bookmarks': 1456, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 10, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 0, 'date': datetime.datetime(2021, 12, 13, 19, 13, 20), 'rating': 'General', 'suffix': '_p00', 'category': 'pixiv', 'subcategory': 'artworks', 'filename': '94767140_p0', 'extension': 'jpg'}
https://i.pximg.net/img-original/img/2021/12/14/04/13/20/94767140_p1.jpg
{'id': 94767140, 'title': 'Envar', 'type': 'illust', 'caption': 'Its been a long and hard year but I am excited to share update on our studio and the work we have created&#44; feel free to view the video or our new website&#44; thank you! 💖<br /><br /><a href="https://youtu.be/Wp4SD0Yyfds" target=\'_blank\' rel=\'noopener noreferrer\'>https://youtu.be/Wp4SD0Yyfds</a><br /><br /><a href="https://www.envarstudio.com/" target=\'_blank\' rel=\'noopener noreferrer\'>https://www.envarstudio.com/</a>', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['illustration', 'concept', 'painting', 'sketch', 'art', 'Original', 'arcane', 'Leagueoflegends', 'jinx', 'VALORANT'], 'tools': ['Photoshop'], 'create_date': '2021-12-14T04:13:20+09:00', 'page_count': 6, 'width': 10000, 'height': 5625, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 19786, 'total_bookmarks': 1456, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 10, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 1, 'date': datetime.datetime(2021, 12, 13, 19, 13, 20), 'rating': 'General', 'suffix': '_p01', 'category': 'pixiv', 'subcategory': 'artworks', 'filename': '94767140_p1', 'extension': 'jpg'}
https://i.pximg.net/img-original/img/2021/12/14/04/13/20/94767140_p2.jpg
{'id': 94767140, 'title': 'Envar', 'type': 'illust', 'caption': 'Its been a long and hard year but I am excited to share update on our studio and the work we have created&#44; feel free to view the video or our new website&#44; thank you! 💖<br /><br /><a href="https://youtu.be/Wp4SD0Yyfds" target=\'_blank\' rel=\'noopener noreferrer\'>https://youtu.be/Wp4SD0Yyfds</a><br /><br /><a href="https://www.envarstudio.com/" target=\'_blank\' rel=\'noopener noreferrer\'>https://www.envarstudio.com/</a>', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['illustration', 'concept', 'painting', 'sketch', 'art', 'Original', 'arcane', 'Leagueoflegends', 'jinx', 'VALORANT'], 'tools': ['Photoshop'], 'create_date': '2021-12-14T04:13:20+09:00', 'page_count': 6, 'width': 10000, 'height': 5625, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 19786, 'total_bookmarks': 1456, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 10, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 2, 'date': datetime.datetime(2021, 12, 13, 19, 13, 20), 'rating': 'General', 'suffix': '_p02', 'category': 'pixiv', 'subcategory': 'artworks', 'filename': '94767140_p2', 'extension': 'jpg'}
https://i.pximg.net/img-original/img/2021/12/14/04/13/20/94767140_p3.jpg
{'id': 94767140, 'title': 'Envar', 'type': 'illust', 'caption': 'Its been a long and hard year but I am excited to share update on our studio and the work we have created&#44; feel free to view the video or our new website&#44; thank you! 💖<br /><br /><a href="https://youtu.be/Wp4SD0Yyfds" target=\'_blank\' rel=\'noopener noreferrer\'>https://youtu.be/Wp4SD0Yyfds</a><br /><br /><a href="https://www.envarstudio.com/" target=\'_blank\' rel=\'noopener noreferrer\'>https://www.envarstudio.com/</a>', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['illustration', 'concept', 'painting', 'sketch', 'art', 'Original', 'arcane', 'Leagueoflegends', 'jinx', 'VALORANT'], 'tools': ['Photoshop'], 'create_date': '2021-12-14T04:13:20+09:00', 'page_count': 6, 'width': 10000, 'height': 5625, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 19786, 'total_bookmarks': 1456, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 10, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 3, 'date': datetime.datetime(2021, 12, 13, 19, 13, 20), 'rating': 'General', 'suffix': '_p03', 'category': 'pixiv', 'subcategory': 'artworks', 'filename': '94767140_p3', 'extension': 'jpg'}
https://i.pximg.net/img-original/img/2021/12/14/04/13/20/94767140_p4.jpg
{'id': 94767140, 'title': 'Envar', 'type': 'illust', 'caption': 'Its been a long and hard year but I am excited to share update on our studio and the work we have created&#44; feel free to view the video or our new website&#44; thank you! 💖<br /><br /><a href="https://youtu.be/Wp4SD0Yyfds" target=\'_blank\' rel=\'noopener noreferrer\'>https://youtu.be/Wp4SD0Yyfds</a><br /><br /><a href="https://www.envarstudio.com/" target=\'_blank\' rel=\'noopener noreferrer\'>https://www.envarstudio.com/</a>', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['illustration', 'concept', 'painting', 'sketch', 'art', 'Original', 'arcane', 'Leagueoflegends', 'jinx', 'VALORANT'], 'tools': ['Photoshop'], 'create_date': '2021-12-14T04:13:20+09:00', 'page_count': 6, 'width': 10000, 'height': 5625, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 19786, 'total_bookmarks': 1456, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 10, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 4, 'date': datetime.datetime(2021, 12, 13, 19, 13, 20), 'rating': 'General', 'suffix': '_p04', 'category': 'pixiv', 'subcategory': 'artworks', 'filename': '94767140_p4', 'extension': 'jpg'}
https://i.pximg.net/img-original/img/2021/12/14/04/13/20/94767140_p5.jpg
{'id': 94767140, 'title': 'Envar', 'type': 'illust', 'caption': 'Its been a long and hard year but I am excited to share update on our studio and the work we have created&#44; feel free to view the video or our new website&#44; thank you! 💖<br /><br /><a href="https://youtu.be/Wp4SD0Yyfds" target=\'_blank\' rel=\'noopener noreferrer\'>https://youtu.be/Wp4SD0Yyfds</a><br /><br /><a href="https://www.envarstudio.com/" target=\'_blank\' rel=\'noopener noreferrer\'>https://www.envarstudio.com/</a>', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['illustration', 'concept', 'painting', 'sketch', 'art', 'Original', 'arcane', 'Leagueoflegends', 'jinx', 'VALORANT'], 'tools': ['Photoshop'], 'create_date': '2021-12-14T04:13:20+09:00', 'page_count': 6, 'width': 10000, 'height': 5625, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 19786, 'total_bookmarks': 1456, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 10, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 5, 'date': datetime.datetime(2021, 12, 13, 19, 13, 20), 'rating': 'General', 'suffix': '_p05', 'category': 'pixiv', 'subcategory': 'artworks', 'filename': '94767140_p5', 'extension': 'jpg'}

{'id': 93521069, 'title': 'Summer breeze', 'type': 'illust', 'caption': 'Just painting the last moment of summer&#44; really enjoyed this one!', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['風景', '背景', 'art', 'summer', 'girl', 'sweden', 'オリジナル1000users入り'], 'tools': ['Photoshop'], 'create_date': '2021-10-18T02:09:51+09:00', 'page_count': 1, 'width': 2000, 'height': 1668, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 19683, 'total_bookmarks': 1778, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 9, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 0, 'date': datetime.datetime(2021, 10, 17, 17, 9, 51), 'rating': 'General', 'suffix': '', 'category': 'pixiv', 'subcategory': 'artworks'}
https://i.pximg.net/img-original/img/2021/10/18/02/09/51/93521069_p0.jpg
{'id': 93521069, 'title': 'Summer breeze', 'type': 'illust', 'caption': 'Just painting the last moment of summer&#44; really enjoyed this one!', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['風景', '背景', 'art', 'summer', 'girl', 'sweden', 'オリジナル1000users入り'], 'tools': ['Photoshop'], 'create_date': '2021-10-18T02:09:51+09:00', 'page_count': 1, 'width': 2000, 'height': 1668, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 19683, 'total_bookmarks': 1778, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 9, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 0, 'date': datetime.datetime(2021, 10, 17, 17, 9, 51), 'rating': 'General', 'suffix': '', 'category': 'pixiv', 'subcategory': 'artworks', 'filename': '93521069_p0', 'extension': 'jpg'}

{'id': 91114071, 'title': 'Ruined King promo illustration', 'type': 'illust', 'caption': 'Super excited to share the promo illustration we created for the Ruined king event in League of Legends&#44; A big thanks to the publishing team on League&#44; especially Moe&#44; Anton&#44; Craig and Ellen for their feedback and support during the process! And a huge shoutout to our incredible team that pulled this one through. I am so incredibly proud to work with such talented and incredible people and artists.', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['illustration', 'concept', 'painting', 'leagueoflegends', 'ruinedking'], 'tools': ['Photoshop'], 'create_date': '2021-07-09T06:49:38+09:00', 'page_count': 3, 'width': 3000, 'height': 1545, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 20031, 'total_bookmarks': 1300, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 29, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 0, 'date': datetime.datetime(2021, 7, 8, 21, 49, 38), 'rating': 'General', 'suffix': '', 'category': 'pixiv', 'subcategory': 'artworks'}
https://i.pximg.net/img-original/img/2021/07/09/06/49/38/91114071_p0.jpg
{'id': 91114071, 'title': 'Ruined King promo illustration', 'type': 'illust', 'caption': 'Super excited to share the promo illustration we created for the Ruined king event in League of Legends&#44; A big thanks to the publishing team on League&#44; especially Moe&#44; Anton&#44; Craig and Ellen for their feedback and support during the process! And a huge shoutout to our incredible team that pulled this one through. I am so incredibly proud to work with such talented and incredible people and artists.', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['illustration', 'concept', 'painting', 'leagueoflegends', 'ruinedking'], 'tools': ['Photoshop'], 'create_date': '2021-07-09T06:49:38+09:00', 'page_count': 3, 'width': 3000, 'height': 1545, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 20031, 'total_bookmarks': 1300, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 29, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 0, 'date': datetime.datetime(2021, 7, 8, 21, 49, 38), 'rating': 'General', 'suffix': '_p00', 'category': 'pixiv', 'subcategory': 'artworks', 'filename': '91114071_p0', 'extension': 'jpg'}
https://i.pximg.net/img-original/img/2021/07/09/06/49/38/91114071_p1.jpg
{'id': 91114071, 'title': 'Ruined King promo illustration', 'type': 'illust', 'caption': 'Super excited to share the promo illustration we created for the Ruined king event in League of Legends&#44; A big thanks to the publishing team on League&#44; especially Moe&#44; Anton&#44; Craig and Ellen for their feedback and support during the process! And a huge shoutout to our incredible team that pulled this one through. I am so incredibly proud to work with such talented and incredible people and artists.', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['illustration', 'concept', 'painting', 'leagueoflegends', 'ruinedking'], 'tools': ['Photoshop'], 'create_date': '2021-07-09T06:49:38+09:00', 'page_count': 3, 'width': 3000, 'height': 1545, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 20031, 'total_bookmarks': 1300, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 29, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 1, 'date': datetime.datetime(2021, 7, 8, 21, 49, 38), 'rating': 'General', 'suffix': '_p01', 'category': 'pixiv', 'subcategory': 'artworks', 'filename': '91114071_p1', 'extension': 'jpg'}
https://i.pximg.net/img-original/img/2021/07/09/06/49/38/91114071_p2.jpg
{'id': 91114071, 'title': 'Ruined King promo illustration', 'type': 'illust', 'caption': 'Super excited to share the promo illustration we created for the Ruined king event in League of Legends&#44; A big thanks to the publishing team on League&#44; especially Moe&#44; Anton&#44; Craig and Ellen for their feedback and support during the process! And a huge shoutout to our incredible team that pulled this one through. I am so incredibly proud to work with such talented and incredible people and artists.', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['illustration', 'concept', 'painting', 'leagueoflegends', 'ruinedking'], 'tools': ['Photoshop'], 'create_date': '2021-07-09T06:49:38+09:00', 'page_count': 3, 'width': 3000, 'height': 1545, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 20031, 'total_bookmarks': 1300, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 29, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 2, 'date': datetime.datetime(2021, 7, 8, 21, 49, 38), 'rating': 'General', 'suffix': '_p02', 'category': 'pixiv', 'subcategory': 'artworks', 'filename': '91114071_p2', 'extension': 'jpg'}

ZizzyDizzyMC commented 1 year ago

I've been playing with gallery_dl a lot over the past few days, but I have hit a snag.

I'm trying to set up a custom function to run if there's an error / log message I don't like.

yt_dlp lets me do this by adding a custom logger. With gallery_dl I've tried a couple dozen ways of reading the log output.

I don't want to write this to a file, which makes the current output options irrelevant.

If anyone has experience with this, I'd love to see an example of how to do this properly. I appreciate it.

I'll post the code I have later, but I realized it's quite impolite to ask for help without contributing something. I've written a series of config changes that I apply at the beginning of my code.


import gallery_dl

gallery_dl.config.load()

# Set global config settings for GalleryDL Temporarily
gallery_dl.config.set(('extractor',), "archive", '/imgur/archive/imgur.sql')
gallery_dl.config.set(('extractor',), "base-directory", '/imgur/archive')
gallery_dl.config.set(('extractor',), "sleep", 1 )
gallery_dl.config.set(('extractor',), "http-timeout", 5 )

# Set Direct link extractor settings
gallery_dl.config.set(('extractor', 'directlink'), "archive", '/imgur/archive/imgur.sql')
gallery_dl.config.set(('extractor', 'directlink'), "archive-prefix", 'imgur.com, ')
gallery_dl.config.set(('extractor', 'directlink'), "archive-format", 'image, {filename}')
gallery_dl.config.set(('extractor', 'directlink'), "archive-pragma", [ "journal_mode=WAL", "synchronous=NORMAL" ] )
gallery_dl.config.set(('extractor', 'directlink'), "base-directory", '/imgur/archive/')
gallery_dl.config.set(('extractor', 'directlink'), "directory", [ "image" ] )
gallery_dl.config.set(('extractor', 'directlink'), "filename", '{filename}.{extension!l}')
gallery_dl.config.set(('extractor', 'directlink'), "sleep", 1 )

# Set Imgur Extractor Settings

gallery_dl.config.set(('extractor', 'imgur'), "archive", '/imgur/archive/imgur.sql')
gallery_dl.config.set(('extractor', 'imgur'), "archive-prefix", 'imgur.com, ')
gallery_dl.config.set(('extractor', 'imgur'), "archive-format", '{subcategory}, {id}')
gallery_dl.config.set(('extractor', 'imgur'), "archive-pragma", [ "journal_mode=WAL", "synchronous=NORMAL" ] )
gallery_dl.config.set(('extractor', 'imgur'), "base-directory", '/imgur/archive/')
gallery_dl.config.set(('extractor', 'imgur'), "filename", '{id|filename}.{extension!l}')

# Set other Imgur Extractor Settings
gallery_dl.config.set(('extractor', 'imgur'), "image", { "directory" : [ "image" ] })
gallery_dl.config.set(('extractor', 'imgur'), "album", { "directory" : [ "album", "{album['id']}" ] })
gallery_dl.config.set(('extractor', 'imgur'), "favorite", { "directory" : [ "favorite" ] })
gallery_dl.config.set(('extractor', 'imgur'), "gallery", { "directory" : [ "gallery" ] })
gallery_dl.config.set(('extractor', 'imgur'), "search", { "directory" : [ "search" ] })
gallery_dl.config.set(('extractor', 'imgur'), "subreddit", { "directory" : [ "subreddit" ] })
gallery_dl.config.set(('extractor', 'imgur'), "tag", { "directory" : [ "tag" ] })
gallery_dl.config.set(('extractor', 'imgur'), "user", { "directory" : [ "user" ] })

# Set downloader settings

gallery_dl.config.set(('downloader',), 'mtime', True)

# Set postprocessor settings globally
gallery_dl.config.set(('extractor',),
        'postprocessors',
        [
            {
                "name": "metadata",
                "mode": "json",
                "extension": "json",
                "extension-format": "{extension!l}.json",
                "event": "file",
                "mtime": True
            }
        ])

ZizzyDizzyMC commented 1 year ago

I've kept hammering at it. This started from a short snippet on Stack Overflow that does in fact work in its entirety with a logger named 'test' and logging.debug("text").

I was looking at output.py and realized gallery-dl creates a basic logger object named gallery-dl.

I may be mistaken in how that actually returns or creates this, in that case I apologize.

import logging
import gallery_dl

class MyLogger(logging.Handler):

    #def handle(*args):
    def emit(*args):
        print('Custom Handler')
        for item in args:
            print(item)

gallery_dl.output.initialize_logging(logging.INFO)
gallery_dl.output.configure_logging(logging.INFO)
logging.getLogger('gallery-dl').addHandler(MyLogger(logging.INFO))
gallery_dl.job.DownloadJob("imgur.com/8372ne").run()

I've included a junk URL at the end that will fail. It's an example of the kind of error I want a custom handler to react to when using gallery-dl as an embedded library.

ZizzyDizzyMC commented 1 year ago

I've tried grabbing stderr with StringIO, with no success.

I came up with this after digging around the code.

class MyLogger(logging.Handler):

    #def handle(*args):
    def emit(*args, **kwargs):
        print('--- args ---')
        for item in args:
            print(item)
        print('---')

# --- main ---

logger = logging.getLogger('imgur')
logger.setLevel(logging.ERROR) 
my_logger = MyLogger(logging.ERROR)
logger.addHandler(my_logger)
gallery_dl.job.DownloadJob("imgur.com/8372ne").run()

However, what my handler receives is far less useful than what gets printed to stdout/stderr.

Results

--- args ---
<MyLogger (error)>
<LogRecord: imgur, 40, /home/ubuntu/bot-venv/lib/python3.10/site-packages/gallery_dl/job.py, 105, "%s: %s">
---
[imgur][error] HttpError: '404 Not Found' for 'https://api.imgur.com/post/v1/media/8372n'
4

There's no info as to the type of error, just that there's an error.
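The LogRecord shown in the results carries only the message template ("%s: %s"); the error type and text are in the record's args, and record.getMessage() merges the two. A minimal sketch using only the standard logging module follows. CapturingHandler is a hypothetical name, and the logger.error() call is a stand-in for what gallery-dl does internally, assumed from the record printed above.

```python
import logging


class CapturingHandler(logging.Handler):
    """Collect formatted log messages instead of printing raw LogRecord objects."""

    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.messages = []

    def emit(self, record):
        # record.msg is only the template; getMessage() applies record.args to it
        self.messages.append((record.name, record.levelname, record.getMessage()))


handler = CapturingHandler(logging.ERROR)
logger = logging.getLogger("imgur")  # logger name taken from the LogRecord above
logger.addHandler(handler)

# Stand-in for the internal call (assumption based on the "%s: %s" template)
logger.error("%s: %s", "HttpError",
             "'404 Not Found' for 'https://api.imgur.com/post/v1/media/8372n'")

for name, level, text in handler.messages:
    print(name, level, text)
```

With a handler like this attached before DownloadJob(...).run(), the error type would be available as the first part of each captured message.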

chapmanjacobd commented 1 year ago

@pink-red's code wasn't working for me (I only tried version 1.25.5 - Git HEAD: 915d868)

Modifying it like this seems to be working for my purposes:

from gallery_dl import extractor as extractor_module
from gallery_dl.extractor.message import Message
from gallery_dl.job import Job
from gallery_dl.util import build_duration_func
from gallery_dl.exception import StopExtraction


class GeneratorJob(Job):
    """Job that yields (url, metadata) pairs instead of downloading files."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.dispatched = False
        self.visited = set()
        self.status = 0

    def message_generator(self):
        extractor = self.extractor
        sleep = build_duration_func(extractor.config("sleep-extractor"))
        if sleep:
            extractor.sleep(sleep(), "extractor")

        try:
            for msg in extractor:
                self.dispatch(msg)
                # Only Url and Queue messages set the flag (see handlers below)
                if self.dispatched:
                    yield msg
                    self.dispatched = False
        except StopExtraction:
            pass

    def run(self):
        for msg in self.message_generator():
            ident, url, kwdict = msg
            if ident == Message.Url:
                yield (url, kwdict)

            elif ident == Message.Queue:
                # Skip child URLs that were already followed
                if url in self.visited:
                    continue
                self.visited.add(url)

                cls = kwdict.get("_extractor")
                if cls:
                    extr = cls.from_url(url)
                else:
                    # find() is a function of the extractor module,
                    # not a method of the extractor instance
                    extr = extractor_module.find(url)

                if extr:
                    job = self.__class__(extr, self)
                    yield from job.run()
            else:
                raise TypeError

    def handle_url(self, url, kwdict):
        self.dispatched = True

    def handle_queue(self, url, kwdict):
        self.dispatched = True

regunakyle commented 1 year ago

Is it just me or is the library not usable in Jupyter Notebook?

Importing gallery_dl in a notebook gives the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[2], line 1
----> 1 import gallery_dl

File ~/Downloads/New Folder/.venv/lib/python3.11/site-packages/gallery_dl/__init__.py:11
      9 import sys
     10 import logging
---> 11 from . import version, config, option, output, extractor, job, util, exception
     13 __author__ = "Mike Fährmann"
     14 __copyright__ = "Copyright 2014-2023 Mike Fährmann"

File ~/Downloads/New Folder/.venv/lib/python3.11/site-packages/gallery_dl/option.py:14
     12 import logging
     13 import sys
---> 14 from . import job, util, version
     17 class ConfigAction(argparse.Action):
     18     """Set argparse results as config values"""

File ~/Downloads/New Folder/.venv/lib/python3.11/site-packages/gallery_dl/job.py:15
     13 import collections
     14 from . import extractor, downloader, postprocessor
---> 15 from . import config, text, util, path, formatter, output, exception, version
     16 from .extractor.message import Message
     17 from .output import stdout_write

File ~/Downloads/New Folder/.venv/lib/python3.11/site-packages/gallery_dl/output.py:257
    253     sys.stderr.write(s)
    254     sys.stderr.flush()
--> 257 if sys.stdout.line_buffering:
    258     def stdout_write(s):
    259         sys.stdout.write(s)

AttributeError: 'OutStream' object has no attribute 'line_buffering'

Importing in a normal .py file seems to work though.

(Note: I am using Python 3.11.6, ipykernel 6.26.0 and gallery-dl 1.26.1)
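The last frame of the traceback shows output.py reading sys.stdout.line_buffering at import time; the stream object that ipykernel installs has no such attribute. Below is a minimal sketch of a defensive check using getattr with a default. FakeOutStream and is_line_buffered are hypothetical names for illustration; this is not gallery-dl's code and not necessarily how the project addresses the problem.

```python
import io


class FakeOutStream:
    """Hypothetical stand-in for a notebook stream that lacks `line_buffering`."""

    def write(self, s):
        return len(s)

    def flush(self):
        pass


def is_line_buffered(stream):
    # getattr with a default avoids the AttributeError seen in the traceback
    return getattr(stream, "line_buffering", False)


notebook_like = is_line_buffered(FakeOutStream())
terminal_like = is_line_buffered(
    io.TextIOWrapper(io.BytesIO(), line_buffering=True))
print(notebook_like, terminal_like)
```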

pafke2 commented 1 year ago

I'm using gallery-dl for downloading Instagram reels. Thanks to @mikf I managed to get the list of links, but how can I also get the shortcode of each reel? Edit: I figured it out. I just pasted this into pink-red's script:

for post_dict, image_infos in GeneratorJob("https://www.instagram.com/user/reels/").run():
    for image_url, image_dict in image_infos:
        print(image_dict['shortcode'])

But for some reason it doesn't get the first reel's shortcode. The first reel (its URL, in fact, not a shortcode?) is stored in post_dict. Can anyone please help?

mikf commented 1 year ago

File shortcodes (shortcode) and post shortcodes (post_shortcode) aren't necessarily the same for multi-file posts.

pafke2 commented 1 year ago

What do you mean? The code above does get me all the necessary information, except for the first reel.

hyugasyaoran commented 1 year ago

I am using the script and am having a problem with checking if the file has been downloaded or not. Here is my code so far.

from gallery_dl import config, job

config.set((), 'base-directory', './')
config.set(('extractor', 'instagram'), 'filename', '{date:%Y_%m_%d_%H_%M_%S}_{post_id}_{num}.{extension}')
config.set(('extractor', 'instagram'), 'archive', '.archives/{category}.sqlte3')
config.set(('extractor', 'instagram'), 'archive-format', '{date:%Y_%m_%d_%H_%M_%S}_{post_id}_{num}')     
config.set(('extractor', 'instagram', 'posts'), 'directory', ['instagram', '{username}', 'Posts'])

job.DownloadJob('https://www.instagram.com/username').run()

JSouthGB commented 1 year ago

Did you check the directory you set? What are you expecting to happen?

If you're looking for logging output, check mikf's post and rachmadaniHaryono's post about configuring logging further up, if you haven't already.

hyugasyaoran commented 1 year ago

I expect that on subsequent runs it will not overwrite files it has already downloaded. As far as I understand, extractor.*.skip = true makes it skip existing files, and extractor.*.archive records which files have been downloaded.

I will refer to the source you suggested. Thank you for spending your time

Hrxn commented 1 year ago

Yes, your assumptions should be correct so far. "skip" is true by default, and this should not deviate when using DownloadJob, so I think it should work as expected?

hyugasyaoran commented 12 months ago

Yeah, it worked. I tried deleting all the downloaded files, and it won't download them again. But it still printed the output paths as .\instagram\[username]\Posts\[date]_[post_id]_[num].jpg, so I thought it was redownloading the old files. Silly me.
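Files that were deleted are still skipped because the decision comes from the archive database, not from the files on disk. One way to check this from Python is to query the archive directly. The sketch below assumes the archive is an SQLite file with a single table `archive (entry TEXT PRIMARY KEY)`; that schema, the helper name archive_contains, and the sample entries are assumptions for illustration, so inspect a real archive file before relying on it.

```python
import os
import sqlite3
import tempfile

# Assumed schema of a download archive (verify against a real archive file)
ARCHIVE_SCHEMA = (
    "CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY) WITHOUT ROWID"
)


def archive_contains(db_path, entry):
    """Return True if `entry` is recorded in the archive database."""
    con = sqlite3.connect(db_path)
    try:
        row = con.execute(
            "SELECT 1 FROM archive WHERE entry = ? LIMIT 1", (entry,)
        ).fetchone()
        return row is not None
    finally:
        con.close()


# Demo on a throwaway database with a made-up entry
with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "instagram.sqlite3")
    con = sqlite3.connect(db_path)
    con.execute(ARCHIVE_SCHEMA)
    con.execute("INSERT INTO archive VALUES (?)", ("2023_01_02_03_04_05_123_1",))
    con.commit()
    con.close()

    seen = archive_contains(db_path, "2023_01_02_03_04_05_123_1")
    unseen = archive_contains(db_path, "2023_01_02_03_04_05_123_2")

print(seen, unseen)
```

Real entries are built from the archive-format setting (plus any archive prefix), so the exact strings will differ from the made-up ones here.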

hyugasyaoran commented 12 months ago

I want to filter by specific dates in my code. I know there is a way to do it on the command line with --filter, but I still haven't found a way to use it in Python. Any help would be appreciated.

tmzg0000 commented 11 months ago

I also want to filter by specific dates in my code. I know there is a way to do it on the command line with --filter, but I still haven't found a way to use it in Python. Any help would be appreciated. Thanks.

tmzg0000 commented 11 months ago

Use this setting:

tweet_id_old = 123456789012345678
tweet_id_str = "int(tweet_id) > " + str(tweet_id_old)
config.set(("extractor",), "image-filter", tweet_id_str)

53845714nF commented 10 months ago

Are only the images filtered here? Or does this also reduce the number of requests to Twitter, for example?

mikf commented 10 months ago

image-filter applies to all files, despite its name. All (rate-limited) API requests to get those files in the first place still happen before they can get filtered.
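For the date question asked earlier, the same image-filter option can carry a date expression. The sketch below only builds the expression string and checks it locally with eval() against made-up metadata. It assumes that the metadata field is called `date`, that it holds a datetime object, and that `datetime` is available inside filter expressions; build_date_filter and matches are hypothetical helper names.

```python
from datetime import datetime


def build_date_filter(start, end):
    """Build an expression that keeps files with start <= date < end."""
    return "date >= datetime({}, {}, {}) and date < datetime({}, {}, {})".format(
        start.year, start.month, start.day, end.year, end.month, end.day)


def matches(expr, metadata):
    # Local emulation only: evaluate the expression with the metadata as names
    return bool(eval(expr, {"datetime": datetime}, metadata))


expr = build_date_filter(datetime(2023, 1, 1), datetime(2023, 2, 1))

inside = matches(expr, {"date": datetime(2023, 1, 15, 12, 0)})
outside = matches(expr, {"date": datetime(2023, 3, 1)})
print(expr, inside, outside)
```

The resulting string would then be passed like the tweet_id example: config.set(("extractor",), "image-filter", expr). As noted above, this only filters files after the API requests have been made; it does not reduce the number of requests.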

SpiffyChatterbox commented 3 months ago

Created draft of documentation in the Wiki: Embedding gallery‐dl from another Python script

SpiffyChatterbox commented 3 months ago

Hmm, looks like I moved it and the wiki doesn't update. New link: https://github.com/mikf/gallery-dl/wiki/Developer-Instructions#embedding

Oyami-Srk commented 3 months ago

Thanks for the document, but the first thing I saw in this doc was "youtube-dl". Kinda funny haha.

SpiffyChatterbox commented 2 months ago

Thanks for catching that; it's now been corrected. Let me know if you see anything else that can be improved!