jasondavies / d3-cloud

Create word clouds in JavaScript.
https://www.jasondavies.com/wordcloud/

Missing words in final layout #159

Open nextensible opened 5 years ago

nextensible commented 5 years ago

This is a known issue:

Note: if a word cannot be placed in any of the positions attempted along the spiral, it is not included in the final word layout. This may be addressed in a future release.

How much work would it be (for you / for a dev new to the project) to implement this? Which approach would you suggest? Can you give some hints where to start? A simple option could be, as soon as a word cannot be placed, "zoom out" (e.g. by decreasing the font size) and restart the whole process – until all words can be placed. But I assume that you as the author would be able to come up with a better approach?

naholyr commented 4 years ago

It's not even as simple as decreasing the font size, because positioning involves randomness: a layout can show every word, and one refresh later be missing a few… So decreasing font:

However, you can try to implement your own heuristic, the simplest one being "try again until everything is visible":

// Before update, store expected number of words
const expected = words.length;

// Trigger redraw
layout.words(words);

// The drawing function
const draw = words => {
  // words = computed layout, it contains the *actually displayed* words
  if (words.length < expected) {
    // try again: placement is randomized, so another run may fit more words
    layout.stop();
    layout.start();
    return;
  }

  …
}

Nothing stops you from changing the configuration (like decreasing the font size) before calling layout.start() again, but I must admit I'm not sure which approach is best. I was thinking more about increasing the layout size and resizing the SVG afterwards.

localpcguy commented 4 years ago

I ran into this bug and effectively did what @naholyr suggests: I run a loop that reduces the font size (one wrinkle: we had a font-size range, so I needed to reduce sizes across the whole range), then redraw and check the expected number of words against the number actually drawn. I set a maximum of 10 reductions, after which it just displays what it can at that point, so it doesn't shrink words to illegible sizes.
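For illustration, the bounded shrink-and-retry loop described above might look like the sketch below. `runLayout` is a hypothetical callback (not part of d3-cloud) that lays out the words with the given font-size range and returns how many were placed. The 0.9 step factor is an assumption; the cap of 10 attempts comes from the comment.

```javascript
// Shrink a font-size range step by step until every word is placed,
// giving up after maxAttempts so text never becomes illegibly small.
// runLayout(minSize, maxSize) is a hypothetical callback that runs the
// layout and returns the number of words actually placed.
function shrinkUntilAllFit(expected, range, runLayout, maxAttempts = 10, factor = 0.9) {
  let [minSize, maxSize] = range;
  let placed = runLayout(minSize, maxSize);
  let attempts = 0;
  while (placed < expected && attempts < maxAttempts) {
    // Scale both ends so the relative weighting of words is preserved.
    minSize *= factor;
    maxSize *= factor;
    placed = runLayout(minSize, maxSize);
    attempts += 1;
  }
  return { minSize, maxSize, placed, attempts, complete: placed >= expected };
}
```

When `complete` is false, the caller simply renders whatever was placed on the last attempt.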

adrianhelvik commented 4 years ago

I solved this with binary search. There are still edge cases, though. Another thing: always use a seeded RNG. Predictable randomness is a must-have for debuggability.
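A rough sketch of both ideas, not the commenter's actual code: a small seeded PRNG (mulberry32) and a binary search over a font scale. `fitsAt` is a hypothetical callback that runs the layout at a given scale and reports whether every word was placed. The search assumes that fitting is roughly monotonic in the scale, which is not guaranteed and is presumably one source of the edge cases mentioned. The d3-cloud README documents a `random([random])` setter, so a generator like this could be passed there.

```javascript
// mulberry32: a small, well-known seeded PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Binary search for the largest font scale in [lo, hi] at which all words fit.
// fitsAt(scale) is a hypothetical callback: run the layout at that scale
// (ideally with a freshly seeded RNG) and return true if no word was dropped.
function largestFittingScale(fitsAt, lo, hi, steps = 12) {
  if (fitsAt(hi)) return hi;    // nothing needs to shrink
  if (!fitsAt(lo)) return null; // even the smallest scale drops words
  for (let i = 0; i < steps; i++) {
    const mid = (lo + hi) / 2;
    if (fitsAt(mid)) lo = mid;  // mid fits: try a larger scale
    else hi = mid;              // mid fails: try a smaller scale
  }
  return lo; // largest scale known to fit
}
```

Re-seeding the generator inside `fitsAt` keeps each probe reproducible, so a missing word can be replayed while debugging.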

hiniestic commented 3 years ago

I ran into the same error, words missing from the chart, and I found that the issue happens when a padding value is set.

adrianhelvik commented 3 years ago

I ended up developing a custom wordcloud algorithm instead.

consoleLogIt commented 2 years ago

Hi @adrianhelvik, I am having a similar issue and feel a custom word-cloud algorithm is the solution. Can you please share resources or anything of that sort to get started?

adrianhelvik commented 2 years ago

Hmm. I'm considering open sourcing it. But it could be a competitive advantage as it's quite frankly a lot better than what our competitors are offering, so I must talk to my supervisors before sharing it.

The algorithm is like this:

  1. Render sprites. Scale the words according to their weights first.
    1. Optimize width: Use ctx.measureText to get the width of the word.
    2. Optimize height: Use a larger height than you expect the word to have and remove empty rows from the pixel grid.
    3. Render word with some stroke (this will make some space between the words)
    4. Store non-white pixels in a Set
  2. Generate word cloud
    1. Pick a moderately small size to start building the cloud.
      1. Create a new Uint8ClampedArray(width * height) to store pixel values. You could use a Set for this too as you only need to store 0 for empty and 1 for occupied. I believe a Set would be better tbh.
    2. Position the sprites
      1. For each word:
        1. Start out with a small width/height (e.g. 100/100) for your coordinate possibility space
        2. Try to place the word at a random coordinate among the allowed coordinates
        3. If you fail more than ‹configurable› times within the coordinate space, increase its size by a ‹configurable› amount
        4. If you fall outside the full cloud bounds, copy all pixels to the center of a new Uint8ClampedArray sized for (width * 1.2) × (height * 1.2) pixels and resume positioning words. I believe negative indices in a Set would be more performant though.
    3. Once the cloud is done (or even when a word is done), you have the x,y coordinates for each word and can render them into a canvas or an SVG.
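The positioning part of the outline (step 2) could be sketched roughly as follows. This is an illustration under assumptions, not the commenter's implementation: sprites are given as arrays of `[dx, dy]` pixel offsets instead of being rendered on a canvas, occupancy is kept in a `Set` keyed by `"x,y"` strings (so negative coordinates work and no array needs regrowing), and the names `placeSprite`, `buildCloud`, `rectSprite` and the default option values are all hypothetical.

```javascript
const key = (x, y) => x + "," + y;

// Try to place one sprite at random positions inside a growing window
// centred on the origin. Returns {x, y}, or null if maxSize is exceeded.
function placeSprite(occupied, sprite, rng = Math.random, opts = {}) {
  const { start = 100, maxFails = 50, grow = 20, maxSize = 4000 } = opts;
  for (let size = start; size <= maxSize; size += grow) {
    for (let fails = 0; fails < maxFails; fails++) {
      const x = Math.floor((rng() - 0.5) * size);
      const y = Math.floor((rng() - 0.5) * size);
      const collides = sprite.pixels.some(([dx, dy]) =>
        occupied.has(key(x + dx, y + dy)));
      if (!collides) {
        // Mark the sprite's pixels as taken and report its position.
        for (const [dx, dy] of sprite.pixels) occupied.add(key(x + dx, y + dy));
        return { x, y };
      }
    }
    // Too many failures at this window size: the loop widens the window.
  }
  return null;
}

// Place sprites in order (largest weights first, if sorted by the caller).
function buildCloud(sprites, rng = Math.random, opts = {}) {
  const occupied = new Set();
  const placed = [];
  for (const sprite of sprites) {
    const pos = placeSprite(occupied, sprite, rng, opts);
    if (pos) placed.push({ text: sprite.text, x: pos.x, y: pos.y });
  }
  return { placed, occupied };
}

// Stand-in for a rendered word: a solid w x h rectangle of pixels.
function rectSprite(text, w, h) {
  const pixels = [];
  for (let dy = 0; dy < h; dy++) {
    for (let dx = 0; dx < w; dx++) pixels.push([dx, dy]);
  }
  return { text, pixels };
}
```

In a real renderer the pixel offsets would come from the stroked canvas sprites of step 1, and the returned `x`/`y` values would be used to draw each word into a canvas or SVG.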

adrianhelvik commented 2 years ago

And do not do it synchronously. Split things up using requestAnimationFrame, or even better, use a web worker. I use events to emit placed words to the renderer, so I can display the cloud as it's being built.
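One way to split the work up, shown only as a sketch: consume an iterable of placement steps in small batches and defer each following batch. `runInBatches` and its options are hypothetical names. The default scheduler uses `requestAnimationFrame` where it exists and falls back to `setTimeout`. The web worker variant is not shown.

```javascript
// Process placement work in small batches so the UI thread stays responsive.
// steps:   any iterable (e.g. a generator) yielding one placed word per step
// onWord:  called for each placed word, e.g. to draw it immediately
// options: batchSize, schedule (how to defer the next batch), onDone
function runInBatches(steps, onWord, { batchSize = 5, schedule, onDone } = {}) {
  const defer = schedule ||
    (typeof requestAnimationFrame === "function"
      ? (fn) => requestAnimationFrame(fn)
      : (fn) => setTimeout(fn, 0));
  const it = steps[Symbol.iterator]();
  function batch() {
    for (let i = 0; i < batchSize; i++) {
      const { value, done } = it.next();
      if (done) {
        if (onDone) onDone();
        return;
      }
      onWord(value); // emit the word so the renderer can show it right away
    }
    defer(batch); // yield to the browser before continuing
  }
  batch();
}
```

Passing a generator that places one word per `next()` call lets the renderer display the cloud while it is still being built.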

Also, I'd suggest trying to exclude the previously tested rectangle when placing a word. I don't it's the biggest performance bottleneck of the algorithm.