
CPU cycles are massively wasted on playlist-pos change in large playlists #6162

Closed · Arcitec closed this issue 4 years ago

Arcitec commented 6 years ago

mpv version and platform

Mac.

mpv 9efb027 Copyright © 2000-2018 mpv/MPlayer/mplayer2 projects
built on Sat Apr 14 00:14:40 CEST 2018
ffmpeg library versions:
   libavutil       56.13.100
   libavcodec      58.17.100
   libavformat     58.11.101
   libswscale      5.0.102
   libavfilter     7.15.100
   libswresample   3.0.101
ffmpeg version: git-2018-04-13-37d46dc

Reproduction steps

  1. Create this tiny Lua script (I hastily threw together this code for the purpose of this ticket, but if someone reading this happens to actually want a proper, full-fledged randomizer for something, use Leapfrog instead):
function random_jump()
    local max = mp.get_property_number("playlist-count");
    local target = math.random(0, max - 1);
    mp.osd_message("Target: " .. target+1 .. "/" .. max)
    mp.set_property("playlist-pos", target);
end

mp.register_script_message("RandyJumper", random_jump)
  2. Bind the command in input.conf:
c script-message RandyJumper
  3. Create a folder with 1,000 test images and open it in mpv:
mkdir test && cd test
wget https://upload.wikimedia.org/wikipedia/commons/d/db/Patern_test.jpg -O 1.jpg
for i in {2..1000}; do cp 1.jpg "$i.jpg"; done
mpv .
  4. Press c to randomly jump to different images. This should be pretty much instant every time you press c: the new image loads, and the terminal you launched mpv from immediately shows the new "Playing:" line for whatever file it jumped to.

  5. Create a bigger folder with 40,000 test images and open it in mpv:

mkdir bigtest && cd bigtest
wget https://upload.wikimedia.org/wikipedia/commons/d/db/Patern_test.jpg -O 1.jpg
for i in {2..40000}; do cp 1.jpg "$i.jpg"; done
mpv .
  6. Press c to randomly jump around in this larger playlist. If your result is like mine, it will take about a second between each jump. The terminal you launched mpv from will show very slow navigation between files, and the OSD will be equally sluggish at showing the new "Target: ..." script message.

The time it takes grows with the playlist size, so 80,000 images takes about two seconds, etc. (As a side note: if your computer is ultra-fast and has no problem at 40,000, simply run the test again with more images, such as 100k.)

These results are also reproducible with a mere v set playlist-pos 20000 binding, so it has nothing to do with the scripting engine. It therefore seems there's a problem with setting the playlist-pos property itself, which causes massive slowdowns when there are a lot of files in the playlist.

Likewise, the prev/next buttons in mpv's regular on-screen controls, and even "autoplay" (pressing spacebar to "play" through the images), are affected by the same playlist-size-dependent slowdown.

Expected behavior

Why should it matter if a playlist contains 1 file, 1000 files or 40,000 files? To mpv, there technically shouldn't need to be any difference. It should be able to just look up the "Xth" entry of the playlist and proceed instantly regardless of whether the list has 10 or 100,000 entries.

A playlist is basically just a small array of filenames, and 40,000 entries is "nothing" for a computer.

But somehow, larger playlists cause massive slowdowns ("1 second" is massive by CPU standards).

So mpv is clearly wasting tons of CPU cycles doing "something" when playlists are large. The question is: What is mpv wasting its cycles on, and is it possible to fix that behavior?

haasn commented 6 years ago

Why should it matter if a playlist contains 1 file, 1000 files or 40,000 files? To mpv, there technically shouldn't need to be any difference. It should be able to just look up the "Xth" entry of the playlist and proceed instantly regardless of whether the list has 10 or 100,000 entries.

Actually, playlists are stored as linked lists in mpv. Linked lists have other advantages, such as being able to add/remove entries or concatenate playlists in O(1), but come at the cost of slow seeking (O(n)). An array like you are suggesting would have the opposite issue - appending, inserting, or removing playlist entries would become much slower for large lists as well.
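To make that cost concrete, here is a minimal sketch (illustrative names, not mpv's actual code) of why indexed access into a doubly linked list is O(n): every lookup has to chase pointers all the way from the head.

struct playlist_entry {
    struct playlist_entry *prev, *next;
    char *filename;
};

/* O(n): reaching the index-th entry chases that many pointers,
 * each one a potential cache miss. */
struct playlist_entry *entry_at_index(struct playlist_entry *head, int index)
{
    struct playlist_entry *e = head;
    for (int i = 0; e && i < index; i++)
        e = e->next;
    return e;
}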

But we could get a best-of-both worlds tradeoff here by using some smarter structure like a finger tree, which would allow us O(log n) access with amortized O(1) insert/delete/append/concatenate.

Patches welcome

haasn commented 6 years ago

NB. Actually, the main reason linked lists are slow is not so much the number of operations, but that chasing down all those pointers (= indirect memory accesses = random access pattern = cache misses) is slow.

It's possible that even with O(n) insert/delete/concat we would be better off using an array here, since an array copy is mostly sequential/linear memory access, which probably performs much better for the same number of elements. That could also be simpler to try. The API in common/playlist.h is relatively straightforward, so a drop-in replacement based on arrays instead of linked lists shouldn't be too difficult to implement.
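As a rough illustration of that suggestion (a hypothetical sketch, not the actual common/playlist.h API): with an array of entry pointers, lookup is a single index, and a mid-list insert is one sequential memmove rather than n pointer chases.

#include <stdlib.h>
#include <string.h>

struct playlist_entry;

struct playlist {
    struct playlist_entry **entries;
    int num_entries;
    int capacity;
};

/* O(1) random access. */
static struct playlist_entry *pl_get(struct playlist *pl, int index)
{
    return (index >= 0 && index < pl->num_entries) ? pl->entries[index] : NULL;
}

/* O(n) insert, but the shift is one linear memmove, which is far more
 * cache-friendly than chasing n scattered pointers. (Error handling
 * omitted for brevity.) */
static void pl_insert_at(struct playlist *pl, int index, struct playlist_entry *e)
{
    if (pl->num_entries == pl->capacity) {
        pl->capacity = pl->capacity ? pl->capacity * 2 : 16;
        pl->entries = realloc(pl->entries, pl->capacity * sizeof(pl->entries[0]));
    }
    memmove(&pl->entries[index + 1], &pl->entries[index],
            (pl->num_entries - index) * sizeof(pl->entries[0]));
    pl->entries[index] = e;
    pl->num_entries++;
}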

Arcitec commented 6 years ago

@haasn Thank you for your detailed answer and thoughts! Aaaah, so it's a linked list where every "get item at X/prev/next" operation has to follow the whole chain until it finds the required item. Ouch.

On the other hand, arrays are super problematic unless we pre-allocate lots of leading/trailing space (such as 1000 slots at each end) to have room to easily prepend/append data (the most common operations) without re-allocating the whole array each time a single entry is added. But even with pre-allocated leading/trailing room, we'd still have to re-allocate (or do tricky work) whenever we insert or delete items at arbitrary places. And things like moving items aren't trivial either.

However... if you ran the test I showed above, you saw that playlists of up to ~1000 items are actually fast to navigate even with mpv's current linked-list approach. It's only when the playlist reaches larger sizes that things become extremely slow.

So, how feasible would it be to modify the existing linked-list code to chunk it into this simplified maxdepth = 1 finger-tree structure instead? I looked a bit at the existing code, and it SEEMS like my idea would be very easy to do with minimal changes to the existing code.

"master" linked list of linked lists (max-size per sublist: 1000)
->
  [list 1, size: 1000] --> first 1000 playlist entries
  [list 2, size: 733] --> next 733 entries
  [list 3, size: 940] --> next 940 entries
  [list 4, size: 834] --> final 834 entries

To look up a fixed position such as "entry 2500", mpv would just have to do this:

int target_pos = 2500;
int current_ceiling = 0;
for (struct list_chunk *chunk = master_list->first; chunk; chunk = chunk->next) {
    current_ceiling += chunk->size;
    if (current_ceiling < target_pos)
        continue;

    /* The playlist entry we want to look up exists within this chunk,
     * so determine the final offset within the current list chunk. */
    int chunk_target = target_pos - (current_ceiling - chunk->size);

    /* <loop through this chunk's linked playlist entries one by one,
     *  counting up, until we get to chunk_target, such as "entry 767 of this list"> */
    /* <do something with that playlist entry> */

    break;
}

So to look up entry 2500, it skips list 1 (ceiling 1000) and list 2 (ceiling 1733), stops at list 3 (ceiling 2673 >= 2500), computes the offset 2500 - 1733 = 767, and then walks 767 nodes within that list: 3 chunk hops plus 767 node traversals, instead of 2500 traversals from the head.

That optimization completely takes care of the slow "lookup position 49000" requests. They'd very rapidly skip through a few dozen small sub-lists until reaching the list where the desired item exists.

As for ensuring that each sublist ALWAYS stays properly chunked (containing max 1000 items per list), each playlist_entry would only need minimal modification: simply add a pointer to its parent list. So each playlist_entry would keep its current prev/next sibling pointers (it already has those), plus a new parent_list pointer.
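In C terms, the proposed change might look roughly like this (a sketch with illustrative field names; only parent_list and the segment struct are new, the prev/next links already exist):

struct playlist_segment;

struct playlist_entry {
    struct playlist_entry *prev, *next;   /* existing sibling links */
    struct playlist_segment *parent_list; /* NEW: the owning sub-list */
    char *filename;
};

struct playlist_segment {
    struct playlist_entry *head, *tail;
    struct playlist_segment *prev, *next; /* sibling segments */
    int size;                             /* kept up to date on insert/delete */
};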

Manipulating mpv's global playlist while enforcing sublist sizes would be simple.

Here's a visualisation of it: (image attached in the original issue)

This sounds like a fun, fairly easy, and interesting algorithmic challenge to code. There's just one problem: I am severely, chronically sick and bedridden, and in a very bad period without the mental energy to code this (basically only awake for one hour per day).

Is anyone interested in trying this?

haasn commented 6 years ago

Could work. If we assume an operation on a linked list of size 1000 is fast, a two-layer approach like this would allow us to work with lists of size 1,000,000. At any rate, this is basically just a base-1000 tree. Instead of hard-coding an approach like this, it might also be worth using a generic tree structure with a tunable base and arbitrary depth; that way we can support playlists of any size (and it might be that e.g. base 100 is faster than base 1000 in practice).

Also, regarding your comment on arrays: pre-allocating space is something the mpv array helpers already do; the algorithm used simply doubles the amount of space per resize. So in practice you only need O(log n) allocations for n elements, which is negligible.
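For example, a generic sketch of that doubling strategy (not necessarily mpv's actual helper code): since the capacity doubles on each reallocation, appending n elements triggers only O(log n) reallocations, i.e. amortized O(1) per append.

#include <stdlib.h>

/* Grow an array geometrically; illustrative only. */
static void *grow_array(void *arr, int *capacity, int needed, size_t elem_size)
{
    if (needed <= *capacity)
        return arr;
    int new_cap = *capacity ? *capacity : 16;
    while (new_cap < needed)
        new_cap *= 2;
    *capacity = new_cap;
    return realloc(arr, new_cap * elem_size);
}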

Arcitec commented 6 years ago

@haasn Hi. It has taken me a few days to get back due to being bedridden and knocked out...

First, regarding the arrays: allocating memory to grow the array isn't the problem. The problem is every time the user inserts or deletes items in the middle of the playlist. Every time the playlist is modified anywhere in the middle, every subsequent item has to be shifted to a different array slot to maintain the "array index = that item's playlist position" correlation. That's going to be a huge bottleneck if someone has a large playlist and uses a live playlist editor such as my Blackbox tool (it's pretty popular; at https://github.com/VideoPlayerCode/mpv-tools).

Anyway, regarding linked lists: my laptop has a Core i7 (i7-620M, 2.66 GHz) from 2010. It handles mpv's current linked lists of size ~1000 very easily (i.e. the slowdown is imperceptible). Around 4000 items the slowdown gradually becomes more and more noticeable, but it's still pretty fine; it's around 8-10k where it gets annoying. This means that 1000 linked-list node traversals (perfectly fast on my terrible laptop) is safely at a level that will be imperceptible on pretty much all machines that matter. And as hardware speeds grow, people's machines will only get better at handling that fixed number.

That being said, you're right that something like 500 may be better in practice. It would ensure that any single segment contains at most 500 nodes, which means traversal to a specific node, once we've found the correct segment, always takes at most 500 steps!

Assuming that following 1000 linked-list links in a row is always fast, base-500 chunking would allow 250,000 playlist entries (500 chunks of 500 entries each) while always guaranteeing that no lookup traverses more than 1000 linked-list elements: first up to 500 outer hops to find the correct segment (of the 250k playlist), then up to 500 nodes within that segment.

If you're following this logic (I'm having trouble expressing it clearly due to my fatigue), then yes, 500 looks like a perfect number to guarantee that linked-list node lookups practically never exceed 1000 traversals.

Anyway, regarding a multi-level tree: I first began doing that in my test implementation, but it heavily complicates the algorithm (each update to a segment's node count would need to update its parents' count summaries all the way up the chain), so I threw it out in favor of the design I sketched in the image instead.

Basically, here's the design: an "outer container" holds the parameters (pointers to the first and last Segments, and the allowed max size of each Segment). Each Segment is an individual linked list, with pointers to its head and tail Nodes, a counter for its length (how many Nodes are in the Segment), and links to its previous/next sibling Segments. Lastly, each Node is just like a regular node, with links to its prev/next siblings (even across Segment boundaries, a trick which allows very easy traversal of the whole linked list from the first Node to the last; see the implementation of printAllNodes()!).

Without further ado, here's the JavaScript implementation. It is a prototype with plenty of comments. I implemented all segmentation, segment splitting and node insertion. Run it in a good browser (such as Chrome or Safari) and look at the Console to get a viewable debug output of the whole object tree. It all seems to work perfectly.

test.html:

<!DOCTYPE html>
<html>
    <head>
    </head>
    <body>
    <script src="test.js"></script>
    </body>
</html>

test.js:

function SegmentedLinkedList(maxLength) {
    // First and last Segments, for quick append/prepend.
    this.firstSegment = null;
    this.lastSegment = null;

    // Max allowed node count for each Segment.
    // A max Segment size of 0 or lower means "no limit".
    this.maxLength = Number.isInteger(maxLength) ? maxLength : -1;
}

function Segment(parent) {
    // What SegmentedLinkedList this Segment belongs to.
    this.parent = parent;

    // Head and tail Nodes of this particular Segment.
    this.head = null;
    this.tail = null;

    // Number of nodes in this Segment.
    this.length = 0;

    // Adjacent Segments (if any).
    this.nextSegment = null;
    this.prevSegment = null;
}

function Node(value, next, prev, parent) {
    // What Segment this Node belongs to.
    this.parent = parent;

    // Adjacent Nodes. Will also refer to Nodes in adjacent Segments.
    this.next = next;
    this.prev = prev;

    // Value of the Node.
    this.value = value;
}

SegmentedLinkedList.prototype.getLastSegment = function() {
    if (!this.lastSegment) {
        this.firstSegment = new Segment(this);
        this.lastSegment = this.firstSegment;
    }

    return this.lastSegment;
}

SegmentedLinkedList.prototype.appendValue = function(value) {
    var segment = this.getLastSegment();
    const newNode = new Node(value, null, segment.tail, segment);
    if (segment.tail) segment.tail.next = newNode;
    else segment.head = newNode;
    segment.tail = newNode;
    segment._updateLength(1);
}

Segment.prototype._updateLength = function(change) {
    this.length += change;

    if (change < 0) {
        // Handle Deletion:
        if (this.length <= 0) {
            // TODO: Delete this empty Segment from its adjacent Segments.
        } else if (this.length <= 100) {
            // TODO: Do maintenance on deletion? Merge with previous or next Segment if space is available?
        }
    } else {
        // Handle Insertion:
        var maxLength = this.parent.maxLength;
        if (maxLength > 0 && this.length >= maxLength) {
            // Split in half by creating a new Segment and moving Nodes into it one by one.
            // NOTE: This algorithm hasn't been checked to see if it's 100% correct.
            var halfPoint = Math.floor(this.length / 2),
                newSegment = new Segment(this.parent),
                currentNode = this.head;
            newSegment.head = currentNode;
            for (var i = 0; i < halfPoint; ++i) {
                currentNode.parent = newSegment; // Change ownership of the Node.
                currentNode = currentNode.next;
            }
            newSegment.tail = currentNode.prev; // Previous node is the last one we moved into the new Segment.
            newSegment.length = halfPoint;
            this.head = currentNode; // Our current Segment now starts at currentNode instead.
            this.length = this.length - halfPoint;

            // Update the adjacency links between Segments.
            if (this.prevSegment) {
                this.prevSegment.nextSegment = newSegment;
            }
            newSegment.prevSegment = this.prevSegment;
            newSegment.nextSegment = this;
            this.prevSegment = newSegment;

            // Update the SegmentedLinkedList's "first Segment" reference if necessary, since we've split ourselves.
            if (this.parent.firstSegment === this) {
                this.parent.firstSegment = newSegment;
            }

            // Debug output.
            console.log("splitting, first half:", newSegment.head, newSegment.tail, newSegment.length, "second half:", this.head, this.tail, this.length);
        }
    }
}

SegmentedLinkedList.prototype.getNodeCount = function() {
    var count = 0,
        currentSegment = this.firstSegment;
    while (currentSegment) {
        count += currentSegment.length;
        currentSegment = currentSegment.nextSegment;
    }

    return count;
}

SegmentedLinkedList.prototype.getNodeAtPos = function(targetPos) {
    var currentCeiling = 0,
        node = null,
        currentSegment = this.firstSegment;
    searchLoop: { // Label for break-statement.
        while (currentSegment) {
            currentCeiling += currentSegment.length;
            if (currentCeiling >= targetPos) {
                // The Node we want exists within the current Segment,
                // so determine its final offset within this Segment.
                var nodeOffset = targetPos - (currentCeiling - currentSegment.length);

                // Loop through all Nodes of this Segment until we find the offset.
                var currentNode = currentSegment.head,
                    currentOffset = 0;
                while (currentNode) {
                    ++currentOffset;
                    if (currentOffset == nodeOffset) {
                        node = currentNode;
                        break searchLoop; // Exit all loops.
                    }

                    currentNode = currentNode.next;
                }

                break; // Only happens if node offset wasn't reachable.
            }
            currentSegment = currentSegment.nextSegment;
        }
    }

    return node;
}

SegmentedLinkedList.prototype.printAllNodes = function() {
    if (!this.firstSegment) return;
    var node = this.firstSegment.head;
    while (node) {
        console.log("print all nodes: "+node.value);
        // Since all Nodes are linked to their prev/next siblings even
        // across Segment boundaries, we don't have to do anything special
        // to loop through all of them!
        node = node.next;
    }
}

/* DEMO CODE: */

var maxSegmentSize = 500, // Tweak this to test different maximum Segment sizes. TIP: Try weird values such as 273 and ensure that getNodeAtPos() and getNodeCount() still work below!
    demoNodeCount = 1500; // How many Nodes to create for this demonstration.

var playlist = new SegmentedLinkedList(maxSegmentSize);
for (var i = 1; i <= demoNodeCount; ++i) {
    playlist.appendValue(i); // Will be auto-split into Segments whenever one reaches the max length.
}

console.log("final structure:", playlist); // Shows the whole linked structure.
console.log("node count:", playlist.getNodeCount());
console.log("node 573:", playlist.getNodeAtPos(573).value, playlist.getNodeAtPos(573));
//playlist.printAllNodes(); // Uncomment to show the entire chain of Node contents.

I did this over the past 4 days, a little bit at a time, just because I was intrigued by the challenge. But my health prevents me from writing the final C code and implementing the remaining functions (node deletion, the _updateLength(-1) path to remove nodes from a Segment's length, etc.). It's all pretty trivial, but I am just so incredibly sick and can't do any more than this.

Either way, since I made the code public, I hope this prototype/algorithm inspires someone here if anyone ever has the time and inclination to fix mpv's playlist structure. I know that other, very serious bugs get much higher priority; chances are this ticket will sit for years... unless someone suddenly decides they'd like to take on this particular algorithmic puzzle for fun. I'd help fix the C code myself if I weren't so sick, but everything is a huge challenge with my crushed energy levels. I haven't coded since June, and my health is now so bad that it looks like I'll never be able to get back to it... :-\

haasn commented 6 years ago

Anyway, regarding a multi-level tree, I first began doing that in my test implementation, but it heavily complicates the algorithm (each update to a segment's node count would need to update its parents' count-summaries all the way up the chain) so I threw it out in favor of the design I sketched in the image instead.

Yes, that's the general approach, but it doesn't have to be any more complicated than your two-layer suggestion: if you have the algorithm for one step, you can apply it recursively (or iteratively).

What you're trying to implement is essentially a known data structure in computer science. Specifically, we'd most likely be interested in a B-tree. Implementations thereof in C are plentiful and sometimes deceptively simple.
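For a taste of why a B-tree fits here, consider the "counted" variant, where each node caches the number of entries in its subtree so that indexed lookup descends in O(log n). A hypothetical sketch (all names illustrative, not from any particular library):

#include <stdbool.h>

#define MAX_KEYS 64

struct playlist_entry;

struct btree_node {
    bool is_leaf;
    int num_children;
    int subtree_count;                         /* entries below this node */
    struct btree_node *children[MAX_KEYS + 1]; /* internal nodes only */
    struct playlist_entry *entries[MAX_KEYS];  /* leaves only */
};

/* Descend by index: skip whole children via their cached counts, so a
 * lookup never touches more than O(log n) nodes. */
static struct playlist_entry *bt_index(struct btree_node *n, int index)
{
    while (!n->is_leaf) {
        int i = 0;
        while (index >= n->children[i]->subtree_count) {
            index -= n->children[i]->subtree_count;
            i++;
        }
        n = n->children[i];
    }
    return n->entries[index];
}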

Arcitec commented 6 years ago

@haasn Indeed, recursion or iteration makes it easy to update a multi-level tree (that's how I did it before I scrapped that design), but I figured it was overkill to have to traverse the segment chain recursively every time a node is inserted/deleted/moved.

I'm surprised that this is a known data structure, though. It would be great if there exists a finished, bug-tested library to just use!

Edit: I found Skip Lists (ref 1 and ref 2), which seem very similar but probably have worse performance than my idea. A skip list divides a linked list into layers which allow you to jump rapidly in chunks of dozens of nodes at a time. That would definitely be a speedup over mpv's current system, but since it can't jump in terms of thousands of nodes at a time, it might not be as fast as my idea. I also have no idea how it handles deletion/insertion in the middle of the list (i.e. if you insert a node at position 28 in a list of 40 nodes, would you then still be able to say "get node 30" and actually get the correct node that the insertion moved from 29 to 30?). That capability is vital for playlist usage.

Realistically, since I'm very sick, I can't contribute any C code, which feels terrible. I'm used to coding things and contributing to open source without asking anyone else to help out. :-\ And I know you're all very busy with much more important things. But at least this ticket exists now, with a good summary of the problem...

zc62 commented 5 years ago

In C, one can just use tsearch in the standard library to create a binary search tree.

https://github.com/mpv-player/mpv/commit/89a17bcda6c166e98861723b8adc9989f2724c34 shows that the playlist data structure used to be a tree. It was then deprecated and turned into a linked list. IMHO that was reasonable, because in most cases people care about "play the next entry in the playlist", not the random-access pattern you have shown.

It takes some extra effort to "go to the next entry" in a binary search tree, especially when using the provided library functions rather than implementing the tree on one's own, since one has to use the twalk function to traverse the tree, or tfind to search for an entry.
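For reference, a minimal sketch of the tsearch/twalk flavour described above (POSIX <search.h>; the string keys here are just illustrative filenames). Note how sorted traversal indeed requires going through the twalk callback:

#include <search.h>
#include <stdio.h>
#include <string.h>

static int cmp(const void *a, const void *b)
{
    return strcmp(a, b); /* the keys are the filename strings themselves */
}

/* twalk visits internal nodes three times; printing on the in-order
 * (postorder) visit plus on leaves yields sorted output. */
static void print_node(const void *nodep, VISIT which, int depth)
{
    if (which == postorder || which == leaf)
        printf("%s\n", *(char *const *)nodep);
}

int main(void)
{
    void *root = NULL;
    const char *files[] = {"b.jpg", "a.jpg", "c.jpg"};
    for (int i = 0; i < 3; i++)
        tsearch(files[i], &root, cmp);
    twalk(root, print_node);
    return 0;
}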

If there are real-life use cases that require random access into a playlist, and if the new implementation would not harm the performance of the simple "play next" action, then it's worth trying.

Arcitec commented 5 years ago

Hi @zc62!

Actually, the current implementation, with the linked list, is what suffers at "play next / go to the next entry".

Every "play next/previous" command tells mpv "go to playlist entry (currentEntry) minus/plus 1", which triggers a totally new Linked List traversal from the root node each time.

So if you're at entry 75000 and you go to 75001, mpv has to search from 1 to 75001 to execute that command. This puts a massive delay between the user's "next/previous" command and actually switching the file.

You can verify this by running my test commands above, which generate tens of thousands of test files, then going to the middle of that list and issuing a next/previous command.

If mpv were only slow at random jumps within a linked list, that would have been somewhat acceptable. But it's equally slow at previous/next jumps, which makes mpv a sluggish mess on large playlists. :-\

Also: I'm not sure a binary tree would help. The problem with linked lists is that the nodes are not in the CPU cache, so they are very slow to retrieve. A binary tree uses randomly malloc()'d nodes too... but perhaps it would still be faster if the tree can somehow be designed to let us quickly traverse it to reach a specific node number.

zc62 commented 5 years ago

Probably either you or I misunderstood the code.

It looks like the "play next" action here is as simple as return p->next in a linked list: https://github.com/mpv-player/mpv/blob/3dd59dbed06a55eed00ad68d0a953f39188e3647/common/playlist.c#L183-L192

It is true that the search code is looping over the entire list: https://github.com/mpv-player/mpv/blob/3dd59dbed06a55eed00ad68d0a953f39188e3647/common/playlist.c#L268-L276

I tried your example and increased the test size to 400k files. Yes, I do notice that the "play next/prev" action is also slowed down. But to me it looks like the bottleneck is somewhere else: the playlist code determined nearly instantly which file should be played (a 400k-iteration for loop is still quite tiny, isn't it?), while some latency could be felt when loading the file. (The terminal shows (+) Video --vid=1 (mjpeg 1.000fps) a bit later than Playing: ./394314.jpg, for example.)

I would guess this is an operating-system filesystem I/O issue that mpv cannot really do anything about. I have this feeling because even 400k files give me a latency shorter than 1 second. (I increased 40k to 400k just because I didn't feel anything wrong when there were 40k files.) I am using an SSD; probably you are not? Give it a try.

The I/O theory can be further verified without using any playlist-related code. Make sure you don't have (or have disabled) autoload.lua, then just type

mpv ./1000.jpg

in a folder having 1000 files, and

mpv ./40000.jpg

in the 40,000 files folder. Carefully notice the difference in the speed when the line

 (+) Video --vid=1 (mjpeg 1.000fps)

pops up. I do see a difference. (You could even record the screen and count the number of frames.)

haasn commented 5 years ago

@zc62 if you have enough RAM, you can mount a tmpfs and move the files there for testing. That eliminates I/O completely from the signal path.

zc62 commented 5 years ago

By default, Arch mounts /tmp as a tmpfs that uses half of the memory. I have only 8 GB RAM, so I can only try up to 100k files. It seems that using a tmpfs cannot really eliminate 100% of the I/O overhead: the 100k-file test is still slightly slower than a 1k-file test, but as far as I can tell the speed is high enough, and I can feel the improvement from using a faster "disk". My CPU is just a laptop i5-6200U.

It would be better if @VideoPlayerCode could do a tmpfs test to see whether the bottleneck is in the I/O, or whether the problem persists even when a tmpfs is used.

ghost commented 4 years ago

I am now considering turning the playlist back into a playtree. Anyway, I think this issue was fixed when I changed the linked list to an array. It might still not be "ideal", but it's probably better.