Open lyz-code opened 4 years ago
Use a DownloadJob instance to actually download stuff. A DataJob object will only collect the data returned by its Extractor and not do much else with it.
Setting config options should be done via the functions in config.py, like config.set(), or by directly manipulating the _config dict in there. You can load config files with config.load().
For example:
from gallery_dl import config, job
config.load() # load default config files
config.set(("extractor",), "base-directory", "/tmp/")
config.set(("extractor", "imgur"), "filename", "{id}{title:?_//}.{extension}")
for url in urls:
    job.DownloadJob(url).run()
Thank you @mikf, it helped a lot.
For others reading this issue: to find out which options you need to set, use the two config examples (1 and 2) together with the options description. Here are some options I've set:
config.set(('extractor',), "archive", '~/.gallery-dl/archive.sql')
config.set(('extractor',), "base-directory", '~/downloads')
config.set(('extractor', 'deviantart'), "image-range", '1-10')
config.set(('extractor', 'deviantart'), "flat", False)
config.set(('extractor', 'deviantart'), "metadata", True)
config.set(
    ('extractor',),
    'postprocessors',
    [
        {
            "name": "metadata",
            "mode": "json",
        }
    ]
)
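The path tuple passed to config.set() names the nested sections of the configuration, and the key is stored in the innermost section. The following is a minimal stand-in that illustrates this mapping; set_option and the conf dict are hypothetical names for this sketch, not gallery-dl's own code.

```python
def set_option(conf, path, key, value):
    """Store value under conf[path[0]][path[1]]...[key], creating sections as needed."""
    for section in path:
        conf = conf.setdefault(section, {})
    conf[key] = value


conf = {}
set_option(conf, ("extractor",), "base-directory", "~/downloads")
set_option(conf, ("extractor", "deviantart"), "metadata", True)
print(conf)
```

The resulting dict has the same nesting as the "extractor" block of a JSON config file, which can help when translating options from the config examples into config.set() calls.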
I'm still unable to configure the output, what am I doing wrong?
config.set(('output',), 'mode', 'terminal')
config.set(
    ('output',),
    'log',
    {
        "level": "info",
        "format": {
            "debug": "\u001b[0;37m{name}: {message}\u001b[0m",
            "info": "\u001b[1;37m{name}: {message}\u001b[0m",
            "warning": "\u001b[1;33m{name}: {message}\u001b[0m",
            "error": "\u001b[1;31m{name}: {message}\u001b[0m"
        }
    },
)
config.set(
    ('output',),
    'logfile',
    {
        "path": "log.txt",
        "mode": "w",
        "level": "debug"
    },
)
config.set(
    ('output',),
    "unsupportedfile",
    {
        "path": "unsupported.txt",
        "mode": "a",
        "format": "{asctime} {message}",
        "format-date": "%Y-%m-%d-%H-%M-%S"
    },
)
It produces the following config._config
'output': {'log': {'format': {'debug': '\x1b[0;37m{name}: {message}\x1b[0m',
'error': '\x1b[1;31m{name}: {message}\x1b[0m',
'info': '\x1b[1;37m{name}: {message}\x1b[0m',
'warning': '\x1b[1;33m{name}: {message}\x1b[0m'},
'level': 'info'},
'logfile': {'level': 'debug', 'mode': 'w', 'path': 'log.txt'},
'mode': 'auto',
'unsupportedfile': {'format': '{asctime} {message}',
'format-date': '%Y-%m-%d-%H-%M-%S',
'mode': 'a',
'path': 'unsupported.txt'}}}
Which is similar to the config example, but neither unsupported.txt nor log.txt is being created.
Thanks
All logging output is done via Python's logging module.
You can use that to configure and attach your own handlers to the root logger, or you can call initialize_logging(), configure_logging(), and setup_logging_handler() from output.py after setting your output options.
Take a look at main() and search for "output." to see how this is "normally" done.
For example
import logging
from gallery_dl import output
# initialize logging and setup logging handler to stderr
output.initialize_logging(logging.INFO)
# apply config options to stderr handler and create file handler
output.configure_logging(logging.INFO)
# create unsupported-file handler
output.setup_logging_handler("unsupportedfile", fmt="{message}")
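For the first option mentioned above (attaching your own handler to the root logger), here is a minimal sketch that uses only Python's standard logging module. ListHandler, the format string, and the "example" logger are made up for illustration; nothing here calls gallery-dl.

```python
import logging


class ListHandler(logging.Handler):
    """Collect formatted log messages in a list instead of printing them."""

    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


handler = ListHandler(logging.INFO)
handler.setFormatter(logging.Formatter("[{name}][{levelname}] {message}", style="{"))

root = logging.getLogger()
root.setLevel(logging.INFO)
root.addHandler(handler)

# Stand-in for a message that a library logger would emit.
logging.getLogger("example").warning("something %s", "happened")
print(handler.messages)
```

Because records propagate from named loggers up to the root logger, a handler attached there should see messages from any logger that does not disable propagation.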
@mikf would you accept a PR documenting how to do this?
@rpdelaney Sure. I'd be happy about any sort of contribution, especially documentation. Let me know if you need anything or if I should explain how certain things (are supposed to) work.
😅 I tried to understand this without success. For now, I simply use StringIO for stdout and stderr to capture the output and match it with a regex. Thankfully this small Discord bot won't mind the hacky method.
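A rough sketch of the StringIO approach described above, using contextlib.redirect_stdout from the standard library. The noisy() function and its sample line are placeholders standing in for whatever call prints the output; the regex is only an example pattern.

```python
import contextlib
import io
import re


def noisy():
    # Stand-in for a call that prints its results.
    print("[imgur][error] HttpError: '404 Not Found'")


buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    noisy()

match = re.search(r"\[(\w+)\]\[(\w+)\] (.*)", buffer.getvalue())
if match:
    category, level, message = match.groups()
    print(category, level, message)
```

One caveat: a logging.StreamHandler keeps a reference to the stream it was created with, so redirecting sys.stderr after logging has been set up will not capture its output. In that case a custom logging handler is likely the more reliable route.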
config example
{ "output": {
"log": { "level": "debug" },
"#": "write logging messages to a separate file",
"logfile": { "path": "/home/user/log.log", "mode": "a", "level": "debug" },
"#": "write unrecognized URLs to a separate file",
"unsupportedfile": { "path": "/home/user/unsupported.log", "mode": "a" }
}}
import logging
from gallery_dl import config, output
from gallery_dl.exception import NoExtractorError
from gallery_dl.extractor.common import get_soup
from gallery_dl.job import DataJob
# load config before setting up logging
config.load()
# initialize logging and setup logging handler to stderr
output.initialize_logging(logging.DEBUG)
# apply config options to stderr handler and create file handler
output.configure_logging(logging.DEBUG)
# create unsupported-file handler
output.setup_logging_handler("unsupportedfile", fmt="{message}")
url = 'https://www.reddit.com/r/Hololive/comments/rcqpgr/'
job = DataJob(url)
job.run()
# process `job.data`
If you want to suppress the output of job.run():
import os
with open(os.devnull, "w") as f:
    job.file = f
    job.run()
Does anyone know how to output all URLs, like with the -g flag, but as a Python list and not on stdout?
I know I need the job.UrlJob class. I also tried to monkey-patch some functions (run, dispatch and handle_url) but it didn't work.
Any ideas?
Have you tried the last code example?
After job.run() you can get the output from job.data.
@53845714nF just copy the job.UrlJob code, remove anything you don't need, and store any URLs in a list. You should end up with something like
class UrlJob(Job):

    def __init__(self, url, parent=None):
        Job.__init__(self, url, parent)
        self.urls = []

    def handle_url(self, url, _):
        self.urls.append(url)
Accessing URLs afterwards is then just
>>> j = UrlJob("imgur.com/asdqwe")
>>> j.run()
0
>>> j.urls
['https://i.imgur.com/asdqw.jpg']
@mikf Awesome, works for me. 😘 And thanks for the work, it's a great program.
Hint:
For those who also need it the import must then look like this: from gallery_dl.job import Job
I've managed to create a job which acts like a Python generator. It is useful when you need to extract a large number of posts, especially from Pixiv or E-Hentai, because the job yields each post right after it was extracted and groups images by post.
Essentially, you can just iterate over posts and their URLs.
from itertools import groupby
from operator import itemgetter
from gallery_dl.extractor.message import Message
from gallery_dl.job import Job
from gallery_dl.util import build_duration_func
from gallery_dl.exception import StopExtraction
# https://stackoverflow.com/questions/12775449/group-an-iterable-by-a-predicate-in-python
def igroup(iterable, isstart):
    """
    Turn [header, data1, data2, header, data3, data4, data5, header, header, ...]
    into [
        (header, [data1, data2]),
        (header, [data3, data4, data5]),
        (header, []),
        (header, []),
        ...
    ]
    """
    def key(item, count=[False]):
        if isstart(item):
            count[0] = not count[0]  # start new group
        return count[0]

    for xs in map(itemgetter(1), groupby(iterable, key)):
        header = next(xs)
        yield header, xs
class GeneratorJob(Job):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.dispatched = False

    def _run(self):
        extractor = self.extractor
        sleep = build_duration_func(extractor.config("sleep-extractor"))
        if sleep:
            extractor.sleep(sleep(), "extractor")
        try:
            for msg in extractor:
                self.dispatch(msg)
                if self.dispatched:
                    yield msg
                    self.dispatched = False
        except StopExtraction:
            pass

    def run(self):
        message_generator = self._run()
        for post_mes, url_mess in igroup(
            message_generator, lambda msg: msg[0] == Message.Directory
        ):
            post = post_mes[1]
            urls = map(lambda mes: (mes[1], mes[2]), url_mess)
            yield (post, urls)

    def handle_url(self, url, kwdict):
        self.dispatched = True
for post_dict, image_infos in GeneratorJob("https://www.pixiv.net/en/users/3143520/illustrations").run():
    print(post_dict)
    # Note: you must completely consume image_infos each time.
    for image_url, image_dict in image_infos:
        print(image_url)
        print(image_dict)
    print()
The example URL is SFW.
{'id': 94767140, 'title': 'Envar', 'type': 'illust', 'caption': 'Its been a long and hard year but I am excited to share update on our studio and the work we have created, feel free to view the video or our new website, thank you! 💖<br /><br /><a href="https://youtu.be/Wp4SD0Yyfds" target=\'_blank\' rel=\'noopener noreferrer\'>https://youtu.be/Wp4SD0Yyfds</a><br /><br /><a href="https://www.envarstudio.com/" target=\'_blank\' rel=\'noopener noreferrer\'>https://www.envarstudio.com/</a>', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['illustration', 'concept', 'painting', 'sketch', 'art', 'Original', 'arcane', 'Leagueoflegends', 'jinx', 'VALORANT'], 'tools': ['Photoshop'], 'create_date': '2021-12-14T04:13:20+09:00', 'page_count': 6, 'width': 10000, 'height': 5625, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 19786, 'total_bookmarks': 1456, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 10, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 0, 'date': datetime.datetime(2021, 12, 13, 19, 13, 20), 'rating': 'General', 'suffix': '', 'category': 'pixiv', 'subcategory': 'artworks'}
https://i.pximg.net/img-original/img/2021/12/14/04/13/20/94767140_p0.jpg
{'id': 94767140, 'title': 'Envar', 'type': 'illust', 'caption': 'Its been a long and hard year but I am excited to share update on our studio and the work we have created, feel free to view the video or our new website, thank you! 💖<br /><br /><a href="https://youtu.be/Wp4SD0Yyfds" target=\'_blank\' rel=\'noopener noreferrer\'>https://youtu.be/Wp4SD0Yyfds</a><br /><br /><a href="https://www.envarstudio.com/" target=\'_blank\' rel=\'noopener noreferrer\'>https://www.envarstudio.com/</a>', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['illustration', 'concept', 'painting', 'sketch', 'art', 'Original', 'arcane', 'Leagueoflegends', 'jinx', 'VALORANT'], 'tools': ['Photoshop'], 'create_date': '2021-12-14T04:13:20+09:00', 'page_count': 6, 'width': 10000, 'height': 5625, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 19786, 'total_bookmarks': 1456, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 10, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 0, 'date': datetime.datetime(2021, 12, 13, 19, 13, 20), 'rating': 'General', 'suffix': '_p00', 'category': 'pixiv', 'subcategory': 'artworks', 'filename': '94767140_p0', 'extension': 'jpg'}
https://i.pximg.net/img-original/img/2021/12/14/04/13/20/94767140_p1.jpg
{'id': 94767140, 'title': 'Envar', 'type': 'illust', 'caption': 'Its been a long and hard year but I am excited to share update on our studio and the work we have created, feel free to view the video or our new website, thank you! 💖<br /><br /><a href="https://youtu.be/Wp4SD0Yyfds" target=\'_blank\' rel=\'noopener noreferrer\'>https://youtu.be/Wp4SD0Yyfds</a><br /><br /><a href="https://www.envarstudio.com/" target=\'_blank\' rel=\'noopener noreferrer\'>https://www.envarstudio.com/</a>', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['illustration', 'concept', 'painting', 'sketch', 'art', 'Original', 'arcane', 'Leagueoflegends', 'jinx', 'VALORANT'], 'tools': ['Photoshop'], 'create_date': '2021-12-14T04:13:20+09:00', 'page_count': 6, 'width': 10000, 'height': 5625, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 19786, 'total_bookmarks': 1456, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 10, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 1, 'date': datetime.datetime(2021, 12, 13, 19, 13, 20), 'rating': 'General', 'suffix': '_p01', 'category': 'pixiv', 'subcategory': 'artworks', 'filename': '94767140_p1', 'extension': 'jpg'}
https://i.pximg.net/img-original/img/2021/12/14/04/13/20/94767140_p2.jpg
{'id': 94767140, 'title': 'Envar', 'type': 'illust', 'caption': 'Its been a long and hard year but I am excited to share update on our studio and the work we have created, feel free to view the video or our new website, thank you! 💖<br /><br /><a href="https://youtu.be/Wp4SD0Yyfds" target=\'_blank\' rel=\'noopener noreferrer\'>https://youtu.be/Wp4SD0Yyfds</a><br /><br /><a href="https://www.envarstudio.com/" target=\'_blank\' rel=\'noopener noreferrer\'>https://www.envarstudio.com/</a>', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['illustration', 'concept', 'painting', 'sketch', 'art', 'Original', 'arcane', 'Leagueoflegends', 'jinx', 'VALORANT'], 'tools': ['Photoshop'], 'create_date': '2021-12-14T04:13:20+09:00', 'page_count': 6, 'width': 10000, 'height': 5625, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 19786, 'total_bookmarks': 1456, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 10, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 2, 'date': datetime.datetime(2021, 12, 13, 19, 13, 20), 'rating': 'General', 'suffix': '_p02', 'category': 'pixiv', 'subcategory': 'artworks', 'filename': '94767140_p2', 'extension': 'jpg'}
https://i.pximg.net/img-original/img/2021/12/14/04/13/20/94767140_p3.jpg
{'id': 94767140, 'title': 'Envar', 'type': 'illust', 'caption': 'Its been a long and hard year but I am excited to share update on our studio and the work we have created, feel free to view the video or our new website, thank you! 💖<br /><br /><a href="https://youtu.be/Wp4SD0Yyfds" target=\'_blank\' rel=\'noopener noreferrer\'>https://youtu.be/Wp4SD0Yyfds</a><br /><br /><a href="https://www.envarstudio.com/" target=\'_blank\' rel=\'noopener noreferrer\'>https://www.envarstudio.com/</a>', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['illustration', 'concept', 'painting', 'sketch', 'art', 'Original', 'arcane', 'Leagueoflegends', 'jinx', 'VALORANT'], 'tools': ['Photoshop'], 'create_date': '2021-12-14T04:13:20+09:00', 'page_count': 6, 'width': 10000, 'height': 5625, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 19786, 'total_bookmarks': 1456, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 10, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 3, 'date': datetime.datetime(2021, 12, 13, 19, 13, 20), 'rating': 'General', 'suffix': '_p03', 'category': 'pixiv', 'subcategory': 'artworks', 'filename': '94767140_p3', 'extension': 'jpg'}
https://i.pximg.net/img-original/img/2021/12/14/04/13/20/94767140_p4.jpg
{'id': 94767140, 'title': 'Envar', 'type': 'illust', 'caption': 'Its been a long and hard year but I am excited to share update on our studio and the work we have created, feel free to view the video or our new website, thank you! 💖<br /><br /><a href="https://youtu.be/Wp4SD0Yyfds" target=\'_blank\' rel=\'noopener noreferrer\'>https://youtu.be/Wp4SD0Yyfds</a><br /><br /><a href="https://www.envarstudio.com/" target=\'_blank\' rel=\'noopener noreferrer\'>https://www.envarstudio.com/</a>', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['illustration', 'concept', 'painting', 'sketch', 'art', 'Original', 'arcane', 'Leagueoflegends', 'jinx', 'VALORANT'], 'tools': ['Photoshop'], 'create_date': '2021-12-14T04:13:20+09:00', 'page_count': 6, 'width': 10000, 'height': 5625, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 19786, 'total_bookmarks': 1456, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 10, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 4, 'date': datetime.datetime(2021, 12, 13, 19, 13, 20), 'rating': 'General', 'suffix': '_p04', 'category': 'pixiv', 'subcategory': 'artworks', 'filename': '94767140_p4', 'extension': 'jpg'}
https://i.pximg.net/img-original/img/2021/12/14/04/13/20/94767140_p5.jpg
{'id': 94767140, 'title': 'Envar', 'type': 'illust', 'caption': 'Its been a long and hard year but I am excited to share update on our studio and the work we have created, feel free to view the video or our new website, thank you! 💖<br /><br /><a href="https://youtu.be/Wp4SD0Yyfds" target=\'_blank\' rel=\'noopener noreferrer\'>https://youtu.be/Wp4SD0Yyfds</a><br /><br /><a href="https://www.envarstudio.com/" target=\'_blank\' rel=\'noopener noreferrer\'>https://www.envarstudio.com/</a>', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['illustration', 'concept', 'painting', 'sketch', 'art', 'Original', 'arcane', 'Leagueoflegends', 'jinx', 'VALORANT'], 'tools': ['Photoshop'], 'create_date': '2021-12-14T04:13:20+09:00', 'page_count': 6, 'width': 10000, 'height': 5625, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 19786, 'total_bookmarks': 1456, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 10, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 5, 'date': datetime.datetime(2021, 12, 13, 19, 13, 20), 'rating': 'General', 'suffix': '_p05', 'category': 'pixiv', 'subcategory': 'artworks', 'filename': '94767140_p5', 'extension': 'jpg'}
{'id': 93521069, 'title': 'Summer breeze', 'type': 'illust', 'caption': 'Just painting the last moment of summer, really enjoyed this one!', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['風景', '背景', 'art', 'summer', 'girl', 'sweden', 'オリジナル1000users入り'], 'tools': ['Photoshop'], 'create_date': '2021-10-18T02:09:51+09:00', 'page_count': 1, 'width': 2000, 'height': 1668, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 19683, 'total_bookmarks': 1778, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 9, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 0, 'date': datetime.datetime(2021, 10, 17, 17, 9, 51), 'rating': 'General', 'suffix': '', 'category': 'pixiv', 'subcategory': 'artworks'}
https://i.pximg.net/img-original/img/2021/10/18/02/09/51/93521069_p0.jpg
{'id': 93521069, 'title': 'Summer breeze', 'type': 'illust', 'caption': 'Just painting the last moment of summer, really enjoyed this one!', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['風景', '背景', 'art', 'summer', 'girl', 'sweden', 'オリジナル1000users入り'], 'tools': ['Photoshop'], 'create_date': '2021-10-18T02:09:51+09:00', 'page_count': 1, 'width': 2000, 'height': 1668, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 19683, 'total_bookmarks': 1778, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 9, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 0, 'date': datetime.datetime(2021, 10, 17, 17, 9, 51), 'rating': 'General', 'suffix': '', 'category': 'pixiv', 'subcategory': 'artworks', 'filename': '93521069_p0', 'extension': 'jpg'}
{'id': 91114071, 'title': 'Ruined King promo illustration', 'type': 'illust', 'caption': 'Super excited to share the promo illustration we created for the Ruined king event in League of Legends, A big thanks to the publishing team on League, especially Moe, Anton, Craig and Ellen for their feedback and support during the process! And a huge shoutout to our incredible team that pulled this one through. I am so incredibly proud to work with such talented and incredible people and artists.', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['illustration', 'concept', 'painting', 'leagueoflegends', 'ruinedking'], 'tools': ['Photoshop'], 'create_date': '2021-07-09T06:49:38+09:00', 'page_count': 3, 'width': 3000, 'height': 1545, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 20031, 'total_bookmarks': 1300, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 29, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 0, 'date': datetime.datetime(2021, 7, 8, 21, 49, 38), 'rating': 'General', 'suffix': '', 'category': 'pixiv', 'subcategory': 'artworks'}
https://i.pximg.net/img-original/img/2021/07/09/06/49/38/91114071_p0.jpg
{'id': 91114071, 'title': 'Ruined King promo illustration', 'type': 'illust', 'caption': 'Super excited to share the promo illustration we created for the Ruined king event in League of Legends, A big thanks to the publishing team on League, especially Moe, Anton, Craig and Ellen for their feedback and support during the process! And a huge shoutout to our incredible team that pulled this one through. I am so incredibly proud to work with such talented and incredible people and artists.', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['illustration', 'concept', 'painting', 'leagueoflegends', 'ruinedking'], 'tools': ['Photoshop'], 'create_date': '2021-07-09T06:49:38+09:00', 'page_count': 3, 'width': 3000, 'height': 1545, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 20031, 'total_bookmarks': 1300, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 29, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 0, 'date': datetime.datetime(2021, 7, 8, 21, 49, 38), 'rating': 'General', 'suffix': '_p00', 'category': 'pixiv', 'subcategory': 'artworks', 'filename': '91114071_p0', 'extension': 'jpg'}
https://i.pximg.net/img-original/img/2021/07/09/06/49/38/91114071_p1.jpg
{'id': 91114071, 'title': 'Ruined King promo illustration', 'type': 'illust', 'caption': 'Super excited to share the promo illustration we created for the Ruined king event in League of Legends, A big thanks to the publishing team on League, especially Moe, Anton, Craig and Ellen for their feedback and support during the process! And a huge shoutout to our incredible team that pulled this one through. I am so incredibly proud to work with such talented and incredible people and artists.', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['illustration', 'concept', 'painting', 'leagueoflegends', 'ruinedking'], 'tools': ['Photoshop'], 'create_date': '2021-07-09T06:49:38+09:00', 'page_count': 3, 'width': 3000, 'height': 1545, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 20031, 'total_bookmarks': 1300, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 29, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 1, 'date': datetime.datetime(2021, 7, 8, 21, 49, 38), 'rating': 'General', 'suffix': '_p01', 'category': 'pixiv', 'subcategory': 'artworks', 'filename': '91114071_p1', 'extension': 'jpg'}
https://i.pximg.net/img-original/img/2021/07/09/06/49/38/91114071_p2.jpg
{'id': 91114071, 'title': 'Ruined King promo illustration', 'type': 'illust', 'caption': 'Super excited to share the promo illustration we created for the Ruined king event in League of Legends, A big thanks to the publishing team on League, especially Moe, Anton, Craig and Ellen for their feedback and support during the process! And a huge shoutout to our incredible team that pulled this one through. I am so incredibly proud to work with such talented and incredible people and artists.', 'restrict': 0, 'user': {'id': 3143520, 'name': 'snatti', 'account': 'snatti', 'profile_image_urls': {'medium': 'https://i.pximg.net/user-profile/img/2013/10/24/14/59/07/6974808_32cfae5c5558599ade59663e6b1452e6_170.jpg'}, 'is_followed': False, 'is_access_blocking_user': False}, 'tags': ['illustration', 'concept', 'painting', 'leagueoflegends', 'ruinedking'], 'tools': ['Photoshop'], 'create_date': '2021-07-09T06:49:38+09:00', 'page_count': 3, 'width': 3000, 'height': 1545, 'sanity_level': 2, 'x_restrict': 0, 'series': None, 'total_view': 20031, 'total_bookmarks': 1300, 'is_bookmarked': False, 'visible': True, 'is_muted': False, 'total_comments': 29, 'illust_ai_type': 0, 'illust_book_style': 0, 'num': 2, 'date': datetime.datetime(2021, 7, 8, 21, 49, 38), 'rating': 'General', 'suffix': '_p02', 'category': 'pixiv', 'subcategory': 'artworks', 'filename': '91114071_p2', 'extension': 'jpg'}
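The note about completely consuming image_infos in the generator example follows from itertools.groupby, whose groups all share one underlying iterator. As a sketch of an alternative, the function below collects each group into a list before yielding it, trading laziness within a post for safety. group_by_header and the sample tuples are made up for illustration and do not use gallery-dl.

```python
def group_by_header(items, is_header):
    """Yield (header, [data, ...]) pairs from a flat stream of items."""
    header, data = None, []
    started = False
    for item in items:
        if is_header(item):
            if started:
                yield header, data
            # Begin a fresh group; items before the first header are dropped.
            header, data, started = item, [], True
        elif started:
            data.append(item)
    if started:
        yield header, data


stream = [("dir", "post1"), ("url", "a.jpg"), ("url", "b.jpg"),
          ("dir", "post2"), ("dir", "post3"), ("url", "c.jpg")]
groups = list(group_by_header(stream, lambda item: item[0] == "dir"))
for header, data in groups:
    print(header, data)
```

With this variant each post is still produced as soon as the next header arrives, and a caller may skip or partially read a group without affecting the following ones.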
I've done a lot of playing with gallery_dl for the past few days but I have hit a snag.
I'm trying to set up a custom function to run if there's an error / log message I don't like.
yt_dlp lets me do this by adding a custom logger, and I've tried a couple dozen ways of reading the log output from gallery_dl in the same manner.
I don't want to write this to a file, which makes the current output options irrelevant.
If anyone has experience with this, I'd love to see an example of how to do this properly. I appreciate it.
I'll post the code I have later, but I realized it's quite impolite to ask for help without contributing something. I've written a series of config changes that I set at the beginning of my code.
import gallery_dl
gallery_dl.config.load()
# Set global config settings for GalleryDL Temporarily
gallery_dl.config.set(('extractor',), "archive", '/imgur/archive/imgur.sql')
gallery_dl.config.set(('extractor',), "base-directory", '/imgur/archive')
gallery_dl.config.set(('extractor',), "sleep", 1 )
gallery_dl.config.set(('extractor',), "http-timeout", 5 )
# Set Direct link extractor settings
gallery_dl.config.set(('extractor', 'directlink'), "archive", '/imgur/archive/imgur.sql')
gallery_dl.config.set(('extractor', 'directlink'), "archive-prefix", 'imgur.com, ')
gallery_dl.config.set(('extractor', 'directlink'), "archive-format", 'image, {filename}')
gallery_dl.config.set(('extractor', 'directlink'), "archive-pragma", [ "journal_mode=WAL", "synchronous=NORMAL" ] )
gallery_dl.config.set(('extractor', 'directlink'), "base-directory", '/imgur/archive/')
gallery_dl.config.set(('extractor', 'directlink'), "directory", [ "image" ] )
gallery_dl.config.set(('extractor', 'directlink'), "filename", '{filename}.{extension!l}')
gallery_dl.config.set(('extractor', 'directlink'), "sleep", 1 )
# Set Imgur Extractor Settings
gallery_dl.config.set(('extractor', 'imgur'), "archive", '/imgur/archive/imgur.sql')
gallery_dl.config.set(('extractor', 'imgur'), "archive-prefix", 'imgur.com, ')
gallery_dl.config.set(('extractor', 'imgur'), "archive-format", '{subcategory}, {id}')
gallery_dl.config.set(('extractor', 'imgur'), "archive-pragma", [ "journal_mode=WAL", "synchronous=NORMAL" ] )
gallery_dl.config.set(('extractor', 'imgur'), "base-directory", '/imgur/archive/')
gallery_dl.config.set(('extractor', 'imgur'), "filename", '{id|filename}.{extension!l}')
# Set other Imgur Extractor Settings
gallery_dl.config.set(('extractor', 'imgur'), "image", { "directory" : [ "image" ] })
gallery_dl.config.set(('extractor', 'imgur'), "album", { "directory" : [ "album", "{album['id']}" ] })
gallery_dl.config.set(('extractor', 'imgur'), "favorite", { "directory" : [ "favorite" ] })
gallery_dl.config.set(('extractor', 'imgur'), "gallery", { "directory" : [ "gallery" ] })
gallery_dl.config.set(('extractor', 'imgur'), "search", { "directory" : [ "search" ] })
gallery_dl.config.set(('extractor', 'imgur'), "subreddit", { "directory" : [ "subreddit" ] })
gallery_dl.config.set(('extractor', 'imgur'), "tag", { "directory" : [ "tag" ] })
gallery_dl.config.set(('extractor', 'imgur'), "user", { "directory" : [ "user" ] })
# Set downloader settings
gallery_dl.config.set(('downloader',), 'mtime', True)
# Set postprocessor settings globally
gallery_dl.config.set(
    ('extractor',),
    'postprocessors',
    [
        {
            "name": "metadata",
            "mode": "json",
            "extension": "json",
            "extension-format": "{extension!l}.json",
            "event": "file",
            "mtime": True
        }
    ]
)
I've kept hammering at it. This started from a short snippet on Stack Overflow that does in fact work in its entirety with a logger named 'test' and logging.debug("text").
I was looking at output.py and realized gallery-dl creates a basic logger object named gallery-dl.
I may be mistaken about how that is actually returned or created; in that case I apologize.
import logging
import gallery_dl

class MyLogger(logging.Handler):
    #def handle(*args):
    def emit(*args):
        print('Custom Handler')
        for item in args:
            print(item)

gallery_dl.output.initialize_logging(logging.INFO)
gallery_dl.output.configure_logging(logging.INFO)
logging.getLogger('gallery-dl').addHandler(MyLogger(logging.INFO))
gallery_dl.job.DownloadJob("imgur.com/8372ne").run()
I've included a junk URL at the end that will fail, as an example of something I want my custom handler to react to when using gallery_dl as an embedded library.
I've tried grabbing stderr with StringIO, with no success.
I came up with this after digging around the code.
class MyLogger(logging.Handler):
    #def handle(*args):
    def emit(*args, **kwargs):
        print('--- args ---')
        for item in args:
            print(item)
        print('---')

# --- main ---
logger = logging.getLogger('imgur')
logger.setLevel(logging.ERROR)
my_logger = MyLogger(logging.ERROR)
logger.addHandler(my_logger)
gallery_dl.job.DownloadJob("imgur.com/8372ne").run()
However, the results I get are useless compared to what's printed to stdout/stderr.
Results
--- args ---
<MyLogger (error)>
<LogRecord: imgur, 40, /home/ubuntu/bot-venv/lib/python3.10/site-packages/gallery_dl/job.py, 105, "%s: %s">
---
[imgur][error] HttpError: '404 Not Found' for 'https://api.imgur.com/post/v1/media/8372n'
4
There's no info as to the type of error, just that there's an error.
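The LogRecord printed above shows only the unformatted template ("%s: %s") because the arguments are stored separately in record.args. A minimal sketch, using only the standard logging module, of a handler that obtains the merged text through record.getMessage(): the logger name 'imgur' matches the output above, while CallbackHandler and the sample error text are placeholders.

```python
import logging


class CallbackHandler(logging.Handler):
    """Pass each record's fully formatted message to a callback."""

    def __init__(self, callback, level=logging.NOTSET):
        super().__init__(level)
        self.callback = callback

    def emit(self, record):
        # getMessage() merges record.msg with record.args.
        self.callback(record.name, record.levelname, record.getMessage())


seen = []
logger = logging.getLogger("imgur")
logger.addHandler(CallbackHandler(lambda *info: seen.append(info), logging.ERROR))

# Stand-in for the kind of call that produced the record above.
logger.error("%s: %s", "HttpError", "'404 Not Found'")
print(seen)
```

The callback can then inspect the message text (for example, check for "404") and trigger whatever custom behavior is needed, without writing anything to a file.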
@pink-red's code wasn't working for me (I only tried version 1.25.5 - Git HEAD: 915d868)
Modifying it like this seems to be working for my purposes:
from gallery_dl.extractor.message import Message
from gallery_dl.job import Job
from gallery_dl.util import build_duration_func
from gallery_dl.exception import StopExtraction
class GeneratorJob(Job):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.dispatched = False
        self.visited = set()
        self.status = 0
    def message_generator(self):
        extractor = self.extractor
        sleep = build_duration_func(extractor.config("sleep-extractor"))
        if sleep:
            extractor.sleep(sleep(), "extractor")
        try:
            for msg in extractor:
                self.dispatch(msg)
                if self.dispatched:
                    yield msg
                    self.dispatched = False
        except StopExtraction:
            pass
    def run(self):
        for msg in self.message_generator():
            ident, url, kwdict = msg
            if ident == Message.Url:
                yield (msg[1], msg[2])
            elif ident == Message.Queue:
                if url in self.visited:
                    continue
                self.visited.add(url)
                cls = kwdict.get("_extractor")
                if cls:
                    extr = cls.from_url(url)
                else:
                    extr = self.extractor.find(url)
                if extr:
                    job = self.__class__(extr, self)
                    for webpath, info in job.run():
                        yield (webpath, info)
            else:
                raise TypeError
    def handle_url(self, url, kwdict):
        self.dispatched = True
    def handle_queue(self, url, kwdict):
        self.dispatched = True
Is it just me or is the library not usable in Jupyter Notebook?
Importing gallery_dl in a notebook gives the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[2], line 1
----> 1 import gallery_dl
File ~/Downloads/New Folder/.venv/lib/python3.11/site-packages/gallery_dl/__init__.py:11
9 import sys
10 import logging
---> 11 from . import version, config, option, output, extractor, job, util, exception
13 __author__ = "Mike Fährmann"
14 __copyright__ = "Copyright 2014-2023 Mike Fährmann"
File ~/Downloads/New Folder/.venv/lib/python3.11/site-packages/gallery_dl/option.py:14
12 import logging
13 import sys
---> 14 from . import job, util, version
17 class ConfigAction(argparse.Action):
18 """Set argparse results as config values"""
File ~/Downloads/New Folder/.venv/lib/python3.11/site-packages/gallery_dl/job.py:15
13 import collections
14 from . import extractor, downloader, postprocessor
---> 15 from . import config, text, util, path, formatter, output, exception, version
16 from .extractor.message import Message
17 from .output import stdout_write
File ~/Downloads/New Folder/.venv/lib/python3.11/site-packages/gallery_dl/output.py:257
253 sys.stderr.write(s)
254 sys.stderr.flush()
--> 257 if sys.stdout.line_buffering:
258 def stdout_write(s):
259 sys.stdout.write(s)
AttributeError: 'OutStream' object has no attribute 'line_buffering'
Importing in a normal .py file seems to work though.
(Note: I am using Python 3.11.6, ipykernel 6.26.0 and gallery-dl 1.26.1)
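The traceback points at an unguarded sys.stdout.line_buffering check, and the notebook's replacement stream evidently lacks that attribute. Below is a sketch of a more tolerant check using getattr with a default. It only illustrates the idea and is not necessarily how gallery-dl handles it; FakeNotebookStream and is_line_buffered are hypothetical names.

```python
import io


class FakeNotebookStream:
    """Hypothetical stand-in for a stream object without 'line_buffering'."""

    def write(self, s):
        return len(s)

    def flush(self):
        pass


def is_line_buffered(stream):
    # Treat a missing attribute as "not line buffered" instead of raising
    # AttributeError.
    return bool(getattr(stream, "line_buffering", False))


regular = io.TextIOWrapper(io.BytesIO(), line_buffering=True)
print(is_line_buffered(FakeNotebookStream()))
print(is_line_buffered(regular))
```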
I'm using gallery-dl for downloading Instagram reels. Thanks to @mikf I managed to get the list of links, but how could I also get the shortcode from each reel? Edit: I figured it out. I just pasted this into pink-red's script:
for post_dict, image_infos in GeneratorJob("https://www.instagram.com/user/reels/").run():
    for image_url, image_dict in image_infos:
        print(image_dict['shortcode'])
But for some reason it doesn't get the first reel's shortcode; the first reel (its URL, in fact, not a shortcode?) is stored in post_dict. Can anyone please help?
File shortcodes (shortcode) and post shortcodes (post_shortcode) aren't necessarily the same for multi-file posts.
What do you mean? The code above does get me all the necessary information except for the first reel.
I am using the script and am having a problem with checking if the file has been downloaded or not. Here is my code so far.
from gallery_dl import config, job
config.set((), 'base-directory', './')
config.set(('extractor', 'instagram'), 'filename', '{date:%Y_%m_%d_%H_%M_%S}_{post_id}_{num}.{extension}')
config.set(('extractor', 'instagram'), 'archive', '.archives/{category}.sqlte3')
config.set(('extractor', 'instagram'), 'archive-format', '{date:%Y_%m_%d_%H_%M_%S}_{post_id}_{num}')
config.set(('extractor', 'instagram', 'posts'), 'directory', ['instagram', '{username}', 'Posts'])
job.DownloadJob('https://www.instagram.com/username').run()
Did you check the directory you set? What are you expecting to happen?
If you're looking for logging output, check mikf's post and rachmadaniHaryono's post about configuring logging further up, if you haven't already.
I expect that on subsequent runs it will not overwrite the files it has previously downloaded. As far as I understand, extractor.*.skip = true makes it skip existing files, and extractor.*.archive records which files have been downloaded.
I will refer to the posts you suggested. Thank you for your time.
Yes, your assumptions should be correct so far.
"skip" is true by default, and this should not deviate when using DownloadJob, so I think it should work as expected?
Yeah, it worked. I tried deleting all the downloaded files and it won't download them again. But it printed the output as .\instagram\[username]\Posts\[date]_[post_id]_[num].jpg, so I thought it was still redownloading old files. Silly me.
I want to filter by specific dates in my code. I know there is a way to do this on the command line with --filter, but I still haven't found a way to do it in Python. Any help would be appreciated.
I also want to filter by specific dates in my code. I know there is a way to do this on the command line with --filter, but I still haven't found a way to do it in Python. Any help would be appreciated. Thanks.
Use this:
tweet_id_old = 1234567890123456789  # placeholder tweet ID
tweet_id_srt = "int(tweet_id) > " + str(tweet_id_old)
config.set(("extractor",), "image-filter", tweet_id_srt)
Are only the images filtered here? Or does this also reduce the number of requests to Twitter, for example?
image-filter applies to all files, despite its name.
All (rate-limited) API requests to get those files in the first place still happen before they can get filtered.
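For the date question, the same option can presumably carry a date comparison. The sketch below rests on several assumptions: that the filter string is evaluated as a Python expression against each file's metadata (as the tweet_id example suggests), that the metadata contains a date field holding a datetime object, and that datetime is available inside the expression, as in the command-line --filter examples. build_date_filter is a hypothetical helper, and the eval calls at the end only imitate the evaluation on hand-written sample metadata.

```python
from datetime import datetime


def build_date_filter(start, end):
    """Return a filter expression keeping files with start <= date < end."""
    fmt = "datetime({0.year}, {0.month}, {0.day})"
    return "{} <= date < {}".format(fmt.format(start), fmt.format(end))


expr = build_date_filter(datetime(2023, 1, 1), datetime(2023, 2, 1))
print(expr)

# With gallery-dl, the string would be passed on like in the post above:
#   config.set(("extractor",), "image-filter", expr)

# Imitate the evaluation on hand-written sample metadata (assumption).
namespace = {"datetime": datetime}
inside = eval(expr, namespace, {"date": datetime(2023, 1, 15)})
outside = eval(expr, namespace, {"date": datetime(2023, 3, 1)})
print(inside, outside)
```

As noted above, filtering happens after the metadata has been fetched, so a date filter like this would not by itself reduce the number of API requests.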
Created draft of documentation in the Wiki: Embedding gallery‐dl from another Python script
Hmm, looks like I moved it and the wiki doesn't update. New link: https://github.com/mikf/gallery-dl/wiki/Developer-Instructions#embedding
Thanks for the document, but the first thing I saw in this doc was "youtube-dl". Kinda funny haha.
Thanks for catching that; it's now been corrected. Let me know if you see anything else that can be improved!
Hi, I intend to use gallery-dl as a library for a program to periodically fetch the selected sources.
I already do it with youtube-dl as it's documented in their docs.
Is there a simple way to do the following with gallery-dl?
I've seen in this issue that you can process one url:
But it doesn't download the file, nor does it work with gallery links such as https://www.deviantart.com/{{ user }}. Thank you.