lisamelton / video_transcoding

Tools to transcode, inspect and convert videos.
MIT License

Proposal to create "sparse" .log files #213

Closed: lisamelton closed this issue 6 years ago

lisamelton commented 6 years ago


My transcode-video tool creates a .log file by default. That file contains all the text sent to the console by HandBrakeCLI.

While you can mine a .log file for useful information using my query-handbrake-log tool, that .log file is way too large considering how little data it actually contains. Why is that?

While transcoding, HandBrakeCLI will overwrite lines sent to the console which contain progress information, usually those with a completion percentage and ETA. HandBrakeCLI overwrites those lines by sending a carriage return to the console in order to move the "cursor" back to the beginning of the line.

This is a Good Thing(TM) and it's the way all command line tools which print progress information to the console are supposed to behave. We do not want the HandBrake development team to change that behavior. Ever.

However, all that overwritten progress information is captured within the .log file created by transcode-video. So, I propose that we do change that behavior in transcode-video, since removing such information will not affect results from query-handbrake-log.

How much smaller would a .log file become without that overwritten progress information? Usually an order of magnitude (i.e. 10x) smaller. Often much smaller than that. And not only does a smaller .log file save storage space, it speeds up query-handbrake-log significantly.
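A minimal sketch of the idea, assuming each progress update is terminated by a carriage return so that only the text after the last one on a line stays visible. This is not the code from the script linked in this issue; `prune_progress` and the sample text are made up for illustration, and the sample is not real HandBrakeCLI output.

```ruby
# Hypothetical helper, not the code from the gist or from transcode-video.
# Assumption: overwritten progress updates are separated by "\r", so only the
# text after the last "\r" on each line would remain visible on the console.
def prune_progress(text)
  text.each_line.map do |line|
    body = line.chomp                      # strips a trailing "\n", "\r\n" or "\r"
    ending = line[body.length..-1]         # preserve the original line ending
    body.split("\r").last.to_s + ending    # keep only the final segment
  end.join
end

# Made-up sample text, for illustration only.
raw = "progress 10 %\rprogress 55 %\rprogress 100 %\nfinal line\n"
print prune_progress(raw)
# prints two lines: "progress 100 %" and "final line"
```

A real terminal would leave remnants of a longer earlier update when a later one is shorter; this sketch ignores that and simply keeps the last segment.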

Since this is still a proposal, you can measure the space savings yourself by converting an existing .log file now using this script:

https://gist.github.com/donmelton/f0c218be392bf8ad49b75737870c9473

Download the script and make sure it's executable and in your $PATH. The script doesn't modify the .log file used as input, so just redirect output from the script to a new file like this:

prune-handbrake-log.sh "/path/to/original.log" >"sparse.log"

While it should work on Windows as long as you have Bash and Ruby installed, I've only tested it on macOS. Hopefully @JMoVS and @samhutchins, our Windows experts, will let me know if I've done something stupid.

As for the implementation in transcode-video, it's possible I'll implement this as a post-processing step. In other words, I'll spool the console output to a temporary file (much like I do now) and then convert that to a sparse .log at the end. Doing it that way won't change the current processing loop as much.
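The post-processing idea could look something like the sketch below: spool everything untouched while the transcode runs, then prune once at the end. `write_sparse_log` and its arguments are hypothetical, not taken from transcode-video, and the pruning rule (keep the text after the last carriage return on each line) is an assumption.

```ruby
require 'tempfile'

# Hypothetical sketch of "spool first, prune afterwards".
# chunks:   any enumerable of strings standing in for captured console output
# log_path: where the sparse .log should be written
def write_sparse_log(chunks, log_path)
  Tempfile.create('console') do |spool|
    # 1. Spool the console output untouched while the transcode runs.
    chunks.each { |chunk| spool.write(chunk) }
    spool.rewind

    # 2. Afterwards, convert the spool into the sparse log in one pass.
    File.open(log_path, 'w') do |log|
      spool.each_line do |line|
        # Assumed rule: only the text after the last "\r" is worth keeping.
        log.puts(line.chomp.split("\r").last.to_s)
      end
    end
  end # the temporary spool file is removed when the block exits
end
```

Keeping the pruning out of the capture loop means the loop that reads console output stays as it is; the cost is one extra pass over the spooled file.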

Let me know what you think. Thanks.

samhutchins commented 6 years ago

Works through Bash on Ubuntu on Windows, log file went from 140kb to 71kb.

It doesn't remove the vbv underflow warnings or the sync: "Chapter 24" ... lines, and I was expecting it to. Which is not to say that what I was expecting is the correct behaviour of course, although I would think it would be nice to get rid of them if they're not needed by query-handbrake-log. Getting rid of all those in my "Prometheus" encode reduces the log size to 17kb.

Once again though, you're improving a feature I don't use; I've always had --no-log in my my-transcode-video alias

lisamelton commented 6 years ago

@samhutchins Thanks for testing that so quickly!

Did you try that on a .log file from a full transcode? The reason I ask is that you won't see as much space saving if you only do a partial transcode, e.g. a single chapter.

Removing the VBV underflow warnings cleanly is problematic because HandBrakeCLI inserts them in unexpected places. And you can remove them yourself with:

sed '/VBV underflow/d' "/path/to/original.log" >"edited.log"

But you'll notice that the edited.log file doesn't look quite right. And editing them away inline is even worse.

My goal was not to remove the VBV underflow warnings anyway. Those are useful to convince users that they should try ABR ratecontrol instead. :)

Instead, I want to make it less of a burden for users to keep their .log files around and not be tempted to use the --no-log option.

samhutchins commented 6 years ago

Yeah, that was a log from a full transcode of Prometheus with the default ratecontrol system: transcode-video --veryquick --crop 140:140:0:0 Prometheus_t00.mkv. It produces a lot of vbv underflow warnings :-P

I think keeping log files around is less a question of storage space, we are creating multi-gigabyte video files after all, and more a question of file management. I've never bothered to keep them because I don't feel like they provide much value to me and it's not worth the effort of organising them. But then again, I'm not constantly re-transcoding 200+ blu ray rips...

lisamelton commented 6 years ago

@samhutchins Hmmmm, then I'm concerned that the script isn't working correctly on Windows. Can you examine the file with a text editor to make sure all the overwritten progress information has been removed? Thanks!

And it's 700+ Blu-ray rips that I keep re-transcoding. :)

samhutchins commented 6 years ago

I can't think of a reason for it not to work on Windows. Given that it's Ruby doing all the heavy lifting I'd expect the behaviour to be identical.

My shrunk.log has just over 700 lines like x264 [warning]: VBV underflow (frame 106330, -24440 bits)

I've attached the before and after

samhutchins commented 6 years ago

Also, because I don't want to be overly negative, I think this would be a good change to incorporate. Making a log smaller without losing relevant information is absolutely worth doing!

lisamelton commented 6 years ago

@samhutchins Thanks for attaching your .log files! I'll look at them shortly.

So, I just finished transcoding my copy of "Prometheus (2012)" with your same settings on macOS. Here are my before and after sizes: 751002 bytes (or 736 KB) shrunk down to 89521 bytes (or 88 KB). That's ~12% of the original size. Not quite an order of magnitude but, as you say, it contains a lot of VBV underflow warnings.

But it's much more compression than what you're seeing. And I have no explanation for that.

Also, thanks for the nod of approval on the change!

samhutchins commented 6 years ago

I may have misread file sizes. Can’t check again now, and I’m probably not gonna be in front of my transcoding machine until Saturday now I’m afraid.

lisamelton commented 6 years ago

@samhutchins Thank god you attached your .log files then! :) You're right, you must have misread the file sizes. I'm getting 430797 bytes (or 424 KB) shrunk down to 73478 bytes (or 72 KB), which is ~17% of the original size. Whew!

And I just verified that all the overwritten progress information has been removed. So we're good! Enjoy your Friday, sir. :)
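For anyone who wants to check the arithmetic, the byte counts quoted in this thread work out to roughly 12% and 17% of the original size, or about 8x and 6x smaller. The helper name `shrink_stats` below is made up for illustration.

```ruby
# Hypothetical helper: returns [percent of original size, shrink factor].
def shrink_stats(before_bytes, after_bytes)
  percent = 100.0 * after_bytes / before_bytes
  factor = before_bytes.to_f / after_bytes
  [percent, factor]
end

# Byte counts as reported in this thread.
reported = {
  'macOS transcode' => [751_002, 89_521],
  'attached logs'   => [430_797, 73_478]
}

reported.each do |label, (before_bytes, after_bytes)|
  percent, factor = shrink_stats(before_bytes, after_bytes)
  puts format('%s: %.1f%% of original, %.1fx smaller', label, percent, factor)
end
```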

samhutchins commented 6 years ago

I am (sometimes) a forward thinking man. Let me know if there’s anything else to test

lisamelton commented 6 years ago

@samhutchins Will do, sir! Once I actually change the code in the source tree, that's when I'll really need testing by you and @JMoVS. :)

JMoVS commented 6 years ago

I've been following this thread loosely and am currently fighting a Windows VM, but I'm very glad that Sam is able to test things quickly. ;-)

I think the key is the reduction of redundant lines, not the fact that it saves disk space (seriously, when you have space for 600+ raw BD mkvs, half a megabyte per transcode is the least of your concerns, especially if you use a filesystem like ZFS with block-level lz4 compression). I think this is really helpful for people looking at the log themselves.

lisamelton commented 6 years ago

@JMoVS Yes, the reduction of redundant information really improves performance when you have to read those files.

But for me, the savings in space is also handy since I keep all my .log files on Dropbox, and that's a finite resource. Especially when you consider how many years I've been accumulating them. :)

elliotclowes commented 6 years ago

As someone who often quickly opens .log files in a text editor rather than running query-handbrake-log I highly approve of this idea.

lisamelton commented 6 years ago

@elliotclowes Thanks! Yeah, that's another use case I forgot to mention. You and me both, sir. :)

lisamelton commented 6 years ago

OK, I finally implemented this and checked it in. Sorry it took so long, folks!

But I don't want to release this (and other changes) until @JMoVS and @samhutchins, our Windows experts, once again let me know if I've done something stupid.

Here's the checkin:

https://github.com/donmelton/video_transcoding/commit/306e21136eaebc07f2f4af16aa31d1ccd0d45588

As you can see, I implemented it as a post-processing step. It turns out that's actually safer, albeit a bit slower.

samhutchins commented 6 years ago

@donmelton I'll try it out tonight and let you know

lisamelton commented 6 years ago

@samhutchins Thanks!

samhutchins commented 6 years ago

@donmelton Sorry this took so long to test

Works for me on Windows :-)

JMoVS commented 6 years ago

Hi @donmelton and @samhutchins, I can't test it right now, but as Sam tested it, I'd say it's as safe as it gets!

lisamelton commented 6 years ago

@samhutchins and @JMoVS Thanks! I'll check all this in once I'm back in front of a development machine. :)

lisamelton commented 6 years ago

D'oh! I completely forgot to close this issue when I released version 0.21.0 on Friday. :)

Just gem update video_transcoding, folks!