Better algorithm for "Clean up Blocks"

rachel-fenichel commented 6 years ago

Problem statement

The right-click menu on the workspace has a "Clean up Blocks" option. This currently arranges all of the blocks stacks vertically, with a bit of space between each stack, as shown below:

This ignores the fact that the workspace is a rectangle.

It would be nice to implement a cleanup algorithm that arranges blocks neatly in multiple columns if there's sufficient space.

j2kun commented 6 years ago

Sounds fun. Could you point me to where the existing implementation is?

NeilFraser commented 6 years ago

https://github.com/google/blockly/blob/master/core/workspace_svg.js#L1161

j2kun commented 6 years ago

I propose the following:

Given n block trees B_1, ..., B_n of widths w_1, ..., w_n and heights h_1, ..., h_n:

Let w_max = max(w_i), h_max = max(h_i)
Arrange the blocks in a grid of w_max*h_max size cells, starting from the top right of the workspace
The number of columns is the largest integer C such that C*w_max <= workspace.width, with the number of rows is ceiling(n/C)

One consequence is that if the workspace has one long block, then the result is a single column, and if the workspace is very wide, then there can be just one row and the user may need to do a bit of side-scrolling.

I am also open to the idea of a row-wise greedy packing (the result has no identifiable columns, but very compactly organized). My guess is simpler is better for modification and maintenance in the future and the above is IMO the simplest that achieves the stated goal.

Either way, LMK and I'll code it up.

j2kun commented 6 years ago

In second thought, it would also make sense to sort them in increasing order of width, leaving the longest blocks for the end since they can use all the extra space. Then I could see a simple column-order layout working nice. Basically, "first fit increasing" (or decreasing)

rachel-fenichel commented 6 years ago

Interesting. There's a constraint here that I didn't notice in the original issue description: the order of the blocks on the workspace may be important for code generation and execution. So the result of getTopBlocks(true) should be the same before and after cleanup, including in RTL*. More on that below.

For now, let's assume that we'll come up with a good way for a developer to decide whether to preserve ordering in cleanup, and carry on with algorithm design for something that does reorder. We'll use the old system when we need to preserve ordering.

Sorting purely by size seems like it would confuse users, especially kids. The new order of the block stacks would not be at all related to the old order of the block stacks, which ignores the fact that the block stacks have meanings and related stacks will often be near each other.

Can you think of a way to do this that shuffles blocks into columns and rows in a way that (mostly) minimizes the distance that they have to move?

Here's how getTopBlocks orders them:

/**
 * Finds the top-level blocks and returns them.  Blocks are optionally sorted
 * by position; top to bottom (with slight LTR or RTL bias).
 * @param {boolean} ordered Sort the list if true.
 * @return {!Array.<!Blockly.Block>} The top-level block objects.
 */
Blockly.Workspace.prototype.getTopBlocks = function(ordered) {
  // Copy the topBlocks_ list.
  var blocks = [].concat(this.topBlocks_);
  if (ordered && blocks.length > 1) {
    var offset = Math.sin(goog.math.toRadians(Blockly.Workspace.SCAN_ANGLE));
    if (this.RTL) {
      offset *= -1;
    }
    blocks.sort(function(a, b) {
      var aXY = a.getRelativeToSurfaceXY();
      var bXY = b.getRelativeToSurfaceXY();
      return (aXY.y + offset * aXY.x) - (bXY.y + offset * bXY.x);
    });
  }
  return blocks;
};

*I can help you evaluate whether a given algorithm will work properly in RTL.

j2kun commented 6 years ago

@rachel-fenichel I think there are many possibilities for this, and I could think a bit more about whether we could contort it to be a dynamic program. However, I think that if you want a good guarantee about "minimality" and flexibility in terms of constraints, a linear program (LP) is worth considering. There is a related problem called the "transportation problem" which I think could be adapted to a cleanup algorithm, which I wrote a blog post about here. In blockly, the "destination" would be grid positions, the "source" would be the set of current positions, and the LP solution would describe which blocks go to which grid positions in a way that minimizes distance traveled. Modulo a few issues relating to the block dimensions, this sounds like what you want.

An LP raises a few issues above my position to decide:

An LP solver is a nontrivial dependency. As an example to consider, here is one that I've not tried but seems to have reasonable benchmarks.
LPs have the potential to be slow, though I think a 0.5s - 1s budget would be a reasonable upper bound on a cleanup algorithm for order of 100 blocks (this is under the expectation no Javascript library comes close to the speed of an industry-strength compiled LP solver; in C an LP with a few hundred variables and constraints should be solvable in order of 100ms).
using an LP solver (a black box) is nontrivial technical debt, in the sense that if/when things go wrong, fixing it requires some expertise.

I think I could code up a simple prototype in python (more quickly than I could in JS) if you think that's a reasonable route, and I could throw some examples at it to see how it does.

Otherwise I can keep thinking about a heuristic solution that maintains nearness of blocks at the start and end.

RoboErikG commented 6 years ago

Regarding technical debt, we've been discussing adding a callback API for the workspace cleanup so that the default implementation could be replaced with a custom one. This would allow a method that doesn't follow all the constraints around ordering to be added as a demo for how to replace the default.

Thoughts?

j2kun commented 6 years ago

No objections, though this would not allow one to exclude the additional dependency, to the best of my knowledge.

ewpatton commented 6 years ago

@RoboErikG +1. Given that some App Inventor projects have on the order or 1k or on rare occasions 10k blocks, we'd prefer to have more control over the computational complexity of the algorithm.

rachel-fenichel commented 6 years ago

@j2kun We want to move forward with the LP solution in a demo, and with a new callback API as Erik mentioned. As I see it, the steps are:

Make a prototype and confirm that its solutions "feel" right from a user perspective.
Add a callback API
Add a demo, if the prototype works the way we want it to.

Since we're doing this as a demo, we can solve the additional dependency problem by only requiring that dependency in the index page for your demo. Adding a developer dependency is fine for us--we get worried about it when it's the core library that's rapidly increasing in size.

@ewpatton makes a good point, that the sizes of projects can easily be larger than 100 blocks. 1k is still a size that we consider reasonable. (10k doesn't sound reasonable to me, but people do it anyway.) If the operation takes too long we may want to add a spinner of some sort, but we can sort that out later.

BTW, sorry for the delay in responding--I've been out of office for a while.

j2kun commented 6 years ago

After some thought, it would be a waste of time to make a prototype in python. I'll do it with d3 and depend on the LP library in question. Would make for a good blog post if it works out :)

Will post here when I finish the prototype.

ewpatton commented 6 years ago

I should also mention that App Inventor has alternatives that were developed in parallel for arranging blocks horizontally and vertically in the workspace. You may want to consider adopting/modifying one of those. One enhancement I would suggest is that a priority be introduced for blocks. Many App Inventor users prefer variables to appear before other types of blocks, so being able to assign a block a priority in the sort would be useful for language developers. At the moment, the ordering is random based on how the browser iterates over the top-level blocks.

j2kun commented 5 years ago

I've let this fall off the wagon a bit, but it's in the back of my mind. I have been tinkering around with different formulations of the problem, and it seems that the "typical" optimization criteria for rectangle packing problems don't really align with blockly's goal. Specifically:

There is no real constraint on the size of the resulting layout, and it's not a very good objective target. Rather, the ideal layout relates to the conceptual layout of the blocks, which is in part related to the initial layout of the blocks before clicking "clean up."
There should be some sort of natural tie breaking that cannot be deferred to a linear solver, such as ordering variables first (as per ewpatton), or by the existing top-block ordering (as per rachel-fenichel).
It also doesn't quite make sense to minimize distance from the initial position to the final position (if you're making a linear model to do that) because you won't be able to enforce a column-like arrangement.

I think a better approach would be the following (much simpler):

Cluster the blocks (choosing # of clusters automatically)
Each cluster is its own column, ordered from left to right based on the x coordinates of the centers of the clusters
Sort each column according to a useful ordering (e.g., variables first, then sort by y-coordinate of the block within the cluster)

For example, in principle this should produce the following:

j2kun commented 5 years ago

Oh, um.. right. Is the thing I said in the last comment reasonable to try?

rachel-fenichel commented 5 years ago

@j2kun Yeah that looks interesting. Go for it, and let us know when you have a demo to play with.

j2kun commented 5 years ago

Try it out

You can drag and drop the rectangles and then click the button to run the cleanup algorithm. Code: https://github.com/j2kun/blockly-prototype-cleanup

I had to wrangle with some polyfill nonsense I don't understand, so there are a few unseemly hacks. The core of the work is in dendrogram.js and clustering.js, with cleanup_cluster.js showing how the columns are arranged and sorted (using width as a proxy)

cleanup_cluster_example

RoboErikG commented 5 years ago

Oooh, nice. Couple comments:

You might want to use Y-pos for sorting since that would be closer to the behavior we'll want in Blockly. I also found it confusing that they were being reordered after I lined them up vertically like blocks.
What's the runtime?

j2kun commented 5 years ago

I'll fix the sorting.

The runtime is Ɵ(n^2), where n is the number of blocks. This is also a lower bound for clustering algorithms. Considering the 10k blocks comment, we may want to do a simpler cleanup when the number of blocks exceeds a threshold (I can measure, but I expect this algorithm to be quite fast still for 1k blocks).

RoboErikG commented 5 years ago

Yeah, that sounds reasonable. I was mostly curious as I haven't used a clustering algorithm in about a decade. ;)

Since you'll only be sorting stacks of blocks we can probably get up to pretty big programs as well. The limit just needs to apply to the number of top blocks on the workspace.

j2kun commented 5 years ago

I updated the demo to sort each column by the starting position of the block (the block's center y).

http://j2kun.github.io/blockly-cleanup/

Sorry for the delay. I published a book last week so I had a lot of stuff to do around that.

RoboErikG commented 5 years ago

Congrats on your book launch. =)

The demo looks pretty good, though one additional thing I noticed is if you click the button multiple times it keeps resorting the blocks into different groups. You might want to make sure there's enough distance between sorted groups that they don't rearrange after they've been sorted.

j2kun commented 5 years ago

Yeah, that's a bit of a tricky edge case, and kind of gets at the heart of the failures of clustering methods that automatically choose the number of clusters. I.e. the clustering happens first, then it sorts by column. I could try to fine tune the algorithm, but it might just be easier to have a flag that says: if you clean up and then don't move anything and try to clean up again, it's a no-op. The minute you move a block, cleanup will operate again.

mattbishop commented 4 years ago

Any movement on this issue? I would like to supply my own cleanup algorithm without having to monkey-patch Blockly.WorkspaceSvg.

google / blockly

Better algorithm for "Clean up Blocks" #1734

Problem statement