Feature Request: Embed cover art into concat audio file

leoncvlt / blinkist-scraper

📚 Python tool to download book summaries and audio from Blinkist.com, and generate some pretty output

191 stars 35 forks source link

Feature Request: Embed cover art into concat audio file #16

Closed itsbrex closed 4 years ago

itsbrex commented 4 years ago

First, thanks for this! This is great.

Wondering tho if it'd be possible to embed the album art into the .m4a file, or even (maybe both?) download the linked jpeg file in the HTML output file created into the containing folder?

rocketinventor commented 4 years ago

Hi @itsBREX!

I had in mind from before to add some code to add "Album" and "Genre" tags, so I could probably take care of this as well.

How does this look for you: Blinkist Cover: 1_1 aspect ratio @ 470px

This a 470² px image with a 1x1 aspect ratio. The full list of sizes is 130, 250, 470, 640, 1080, and 1400. This is the same image used by https://www.blinkist.com/en/nc/library and https://www.blinkist.com/en/books/[book-slug]

The one in the HTML file looks like this:

This is the 3x4 aspect ratio at 640px's.

iTunes uses 600 x 600 for its own files, but limits the display size for other files at 400 x 400, so somewhere around that size should be good. 3x4 aspect ratio is more common for MP3's, I think.

generally, the 'standard' for album art in iTunes is 600 x 600 pixels. The reason your scanned artwork doesn't look as good as downloaded artwork (in front row), is that iTunes downsamples all manually added artwork to 400x400 pixel images for use in coverflow, no matter what the original size of the picture was.

Reference: discussions.apple.com/thread/1433074

The image that is in the HTML file is stored on Blinkist's website - there should be an option to replace this with a local version that is downloaded (so it doesn't need internet).

I would propose the following command-line arguments:

--embed-coverart [--concat-audio only]
--coverart-size [130, 250, 470, 640, 1080, or 1400; default:470]
--coverart-aspect-ratio [1_1, 2-2_1, 3_4; default:(?)]
--keep-jpg-local [Same size and dimensions as the one linked to in the HTML file] (not sure what to call it)
--html-jpg-location [local, embed, and external; default:external]

@leoncvlt @itsBREX What do you think of these changes?

leoncvlt commented 4 years ago

Hey all, this a good feature: the reason it wasn't been implemented as of yet is that last time I checked, ffmpeg did not really play along well with embedding cover art in .m4v containers - all the solutions I found required external tools and I did not want to add any more dependencies to the script so I let it go. Worth doing some tests about whether it's possible with vanilla ffmpeg before putting too much time on this - I think most music players display the cover art anyway if they find a cover.jpg file in the same folder as the audio file?

Give that, I think we could just go with

--download-cover-art [if true, downloads the cover art file as cover.jpg in the output folder, 1_1 aspect ratio and 640px wide since that seems the most standard format / resolution for cover arts)]
--embed-cover-art [if true, embeds the file above in the audio file, then removes the file (unless download-cover-art is true). Requires --concat-audio]

What happens in the html file is mostly for styling. I think it grabs the default format and resolution from the book's .json file, which is 3_4 and 640px wide. If we want to add any options to customise aspect ratio and/or resolution (although I think that would make the command line arguments list too crowded) it should just impact this html image, as I can't imagine any scenario in which the user would want to download a non-square or lower-resolution cover art file.

Not linking to the image on the blinkist server is a good shout - maybe it should be like that by default. I imagine that any {image_url} instance in the html templates would then resolve to this local file instead of the link string. Or maybe, we could just inline the cover image as a base64 string in the html file directly?

rocketinventor commented 4 years ago

@leoncvlt I did some research on this topic, and it does seem that this function did not work with older versions of FFmpeg. However, it does work with newer versions. I tested it with version 4.0.2 of FFmpeg and I was able to embed the cover art without a problem. (See below)

Input:

ffmpeg -i input.m4a -i image.jpg -map 0 -map 1 -c copy -disposition:v:0 attached_pic output.m4a

(With image.jpg and input.m4a files in the same directory)

Output:

ffmpeg version 4.0.2 Copyright (c) 2000-2018 the FFmpeg developers
  built with gcc 7.3.1 (GCC) 20180722
  configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-bzlib --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth
  libavutil      56. 14.100 / 56. 14.100
  libavcodec     58. 18.100 / 58. 18.100
  libavformat    58. 12.100 / 58. 12.100
  libavdevice    58.  3.100 / 58.  3.100
  libavfilter     7. 16.100 /  7. 16.100
  libswscale      5.  1.100 /  5.  1.100
  libswresample   3.  1.100 /  3.  1.100
  libpostproc    55.  1.100 / 55.  1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.m4a':
  Metadata:
    major_brand     : M4A
    minor_version   : 512
    compatible_brands: isomiso2
    title           : Twas The Nightshift Before Christmas
    artist          : Adam Kay
    encoder         : Lavf58.12.100
  Duration: 00:17:43.86, start: 0.000000, bitrate: 133 kb/s
    Stream #0:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 131 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
Input #1, image2, from 'image.jpg':
  Duration: 00:00:00.04, start: 0.000000, bitrate: 2528 kb/s
    Stream #1:0: Video: mjpeg, yuvj420p(pc, bt470bg/unknown/unknown), 470x470 [SAR 1:1 DAR 1:1], 25 tbr, 25 tbn, 25 tbc
Output #0, ipod, to 'output.m4a':
  Metadata:
    major_brand     : M4A
    minor_version   : 512
    compatible_brands: isomiso2
    title           : Twas The Nightshift Before Christmas
    artist          : Adam Kay
    encoder         : Lavf58.12.100
    Stream #0:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 131 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
    Stream #0:1: Video: mjpeg, yuvj420p(pc, bt470bg/unknown/unknown), 470x470 [SAR 1:1 DAR 1:1], q=2-31, 25 tbr, 12800 tbn, 25 tbc
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #1:0 -> #0:1 (copy)
Press [q] to stop, [?] for help
frame=    1 fps=0.0 q=-1.0 Lsize=   17304kB time=00:17:43.84 bitrate= 133.2kbits/s speed=3.38e+03x
video:12kB audio:17111kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.052526%

(Based on the Stack Overflow answer FFmpeg, how to embed cover art (image) to .m4a by Lukas.

leoncvlt commented 4 years ago

Nice! Since that's the case, why not just enable album image embedding by default? It makes the output file nicer and shouldn't impact scraping time since it's just an added parameter to the ffmpeg concat command that already has to run.

rocketinventor commented 4 years ago

Well, it does add a little bit of time, for sure, because the image has to be downloaded (unless there is a way to copy it over from the browser cache). Adding the album artwork to the file does add a little bit more time to the processing. The way I have it implemented right now is that it works as its own operation (before adding in the other metadata) - If it can be done at the same time as the other tagging operation then the time that difference would probably be quite negligible.

Code right now looks more or less like this:

# choose the correct input file to use for the tagging operation
  if (cover_img_file):
    tag_audio_input_file = img_audio_file
  else:
    tag_audio_input_file = combined_audio_file

### (abridged code) ###

# concat, image embed, and tagging operations
  concat_command = f"ffmpeg {silent} -f concat -safe 0 -i \"{files_list}\" -c copy \"{combined_audio_file}\""
  os.system(concat_command)
  if (cover_img_file):
    embed_img_command = f"ffmpeg {silent} -i \"{combined_audio_file}\" -i {cover_img_file} -map 0 -map 1 -c copy -disposition:v:0 attached_pic \"{img_audio_file}\""
    os.system(embed_img_command)
  tag_command = f"ffmpeg {silent} -i \"{tag_audio_input_file}\" -c copy -metadata title=\"{book_json['title']}\" -metadata artist=\"{book_json['author']}\" \"{tagged_audio_file}\""
  os.system(tag_command)

I personally don't have a need for album artwork (I use an MP3 player without a color screen), but it does make the output look nicer (and more official-looking). Many people might prefer this. I would still like to make it at least opt-out able. Also, some people might prefer to just have a cover.jpg file in the same directory (rather than an embedded file). One image file ~10 audio files per book 1000 books makes quite a difference is used space.

itsbrex commented 4 years ago

@rocketinventor Sorry I am just getting to this now! All these proposals look great tho.

I like the idea of having it be optional to embed it in the file, but also the option to save it as cover.jpg in the folder.

While it might add some extra time FFmpeg could compress the images too prior to embed, but they're already small enough so not sure it's necessary.

https://stackoverflow.com/questions/10225403/how-can-i-extract-a-good-quality-jpeg-image-from-an-h264-video-file-with-ffmpeg/10234065#10234065

rocketinventor commented 4 years ago

@itsBREX Ok, thanks for your feedback.

I don't have any plans on messing around with the image compression - it doesn't seem worthwhile.

What is your opinion on what the default behavior should be for image download/embed?

rocketinventor commented 4 years ago

@leoncvlt I figured out how to do the tagging and image embed at the same time - it is quite simple, actually. And the amount of time that it takes is quite negligible, so I think that we can make it default.

I rewrote the tagging part, again, to look like so:

  if (cover_img_file):
    cover_embed = f'-i "{cover_img_file}" -map 0 -map 1 -disposition:v:0 attached_pic'
  else:
    cover_embed = ""

  title_metadata = f"-metadata title=\"{book_json['title']}\""
  author_metadata = f"-metadata artist=\"{book_json['author']}\""
  category_metadata = f"-metadata album=\"{book_json['category']}\""
  genre_metadata = '-metadata genre="Blinkist"'

  tag_command = f'ffmpeg {silent} -i "{combined_audio_file}" {cover_embed} -c copy '
  tag_command += f'{title_metadata} {author_metadata} {category_metadata} {genre_metadata}'
  tag_command += f' "{tagged_audio_file}"'
  os.system(tag_command)

I added category and genre tags in addition to the image tags. Tagging sub-commands for the various tags have been moved into their own variables and the tag command itself has been split up into multiple lines (to roughly follow PEP8 specifications). I also switched between double/single quotes for each variable depending on which one had simpler syntax. FFmpeg doesn't care about extra spaces, so this solution works quite well, no matter what information we want to embed.

I also implemented a flag for using the local cover.jpg file in the HTML page. Since we are using the 1_1 type image, the dimensions are different now. Should we make this the default (downloaded 1x1 image), download both images (1x1 and 3x4), change which online image we use (to 1x1), or just leave it as is (with inconsistent size/shape between HTML files)?

The question, though, for both the HTML change and the audio embed change is if we should override the old files or not. Should there be a --override flag for them? There is no way to tell, anymore, what is going on with them, except for by parsing the output files.

Also, if there is already a cover.jpg file in the directory, but no --save-cover flag, should we delete it? For example, say we have the --embed-cover-art flag enabled (without --save-cover). This would indicate to 1) download cover.jpg, 2) embed it in the concat audio file, and finally: 3) delete cover.jpg. However, if there is already a cover.jpg file from before, then the default behavior would be to delete it. (Unless we add a step "0": detect and store if we have cover.jpg and change step "3" to "delete cover.jpg if it did not exist at step "0".)

@leoncvlt @itsBREX How do you think that should these concerns shoul be handled?

leoncvlt commented 4 years ago

Hey, thanks for looking into this! New ffmpeg command looks ace. Regarding the rest:

I think it's fine to use the 1_1 image in the html file as well for consistency purposes and making it the default - with the current html template, switching from 3_4 to 1_1 does not visually change anything.
I don't think we should worry about having to overwrite the "old" html / audio files or adding check cases - in the end this is more of a visual / convenience update, it doesn't really impact the functionality. I think it's enough to add a changelog / section on the readme, and if people want the html files using locally cached images or audio files with the cover art, they could just run the script again.
Let's just give the temporary downloaded cover art for embedding its own unique filename, like _coverart.jpg or something like that. That means it would be technically downloaded twice in the scenario where the user choose to embed the cover image as well as downloading it to the folder, but it won't cause any conflict with the other cover.jpg file when downloading / deleting it.

leoncvlt commented 4 years ago

Hey @rocketinventor, did you make any progress on this? If not I can give this a shot this weekend based on code snippets you shared earlier 😃

rocketinventor commented 4 years ago

@leoncvlt Yes, I did make progress. I did something similar to what you suggested with the _temp non-conflicting filename. But instead of downloading again, it will detect if the file exists already and just copy it. I have everything working for the audio file, and the HTML/EPUB (theoretically)/PDF generation. Making this part work smoothly with clean code is the step that I am at right now. I need to do some refactoring. I don't have so much time per day, and I have other modifications that I made locally for other purposes that I need to sort out...

As far as re-generating the HTML goes, there is no way to do this right now except to manually delete all the files that you want to be redone - running the script again for a book/books that were done already won't accomplish anything.

1x1 and 3x4 images do look slightly different, although one could argue that the 1x1 looks better. See the case study below.

Before:

After:

Note: on most covers, the difference is less pronounced - the only difference is that the artwork doesn't seem to be cut off.

leoncvlt commented 4 years ago

Implemented in #20 thanks to @rocketinventor's amazing work!