mapbox / carto

fast CSS-like map stylesheets
https://cartocss.readthedocs.io/
Apache License 2.0
652 stars 129 forks source link

Nested regex can cause extremely slow generation times and longer xml #213

Open smsm1 opened 11 years ago

smsm1 commented 11 years ago

As a particularly noticeable example is the TileMill example open-streets-dc where converting two sections with the widths to regex, or even just changing a few lines from = to =~ gives significantly different results. https://github.com/tilemill-project/tilemill/blob/master/examples/open-streets-dc/highway.mss#L119 is an example line to change to get the difference.

I'm using Carto 0.9.4.

tmcw commented 11 years ago

This is probably unavoidable.

We can make judgments about zoom level filters and other relative filters, like =, < and so on, so we can say that if you have rules that overrule each other we can eliminate lots of possible combinations.

We can't do that with regex filters unless we have some kind of zany meta-regex-evaluator which compares their DFAs and derives new ones that are combinations and exclusions of each other.

springmeyer commented 11 years ago

@smsm1 - would you mind pasting a few before/after of the Mapnik XML based on a given change?

smsm1 commented 11 years ago

Before: http://shaunmcdonald.me.uk/tmp/carto/open-streets-dc-orginal.xml After: http://shaunmcdonald.me.uk/tmp/carto/open-streets-dc-orginal-1.xml

Changed about line 119 from: [type='motorway'], [type='trunk'] {

To: [type=~'motorway|trunk'] {

springmeyer commented 11 years ago

thank you @smsm1

springmeyer commented 11 years ago

ugh, the diff is larger than either xml. And many of the differences seem not directly related to the change: http://cl.ly/code/0l1V2S0O0W2n. @tmcw - are you positive this bug is "unavoidable" ?

tmcw commented 11 years ago

@tmcw - are you positive this bug is "unavoidable" ?

I'm positive that the regex-makes-more-rules bug is unavoidable unless carto or mapnik changes its approach, but it sounds like you're seeing a different bug than that.

smsm1 commented 11 years ago

http://shaunmcdonald.me.uk/tmp/carto/open-streets-dc-orginal-2.xml is an even simpler change to (note the extra tilde)

[type=~'motorway'], [type='trunk'] { I wonder if having another syntax for the most common reasons for needing regexes currently may be an alternate solution?

springmeyer commented 11 years ago

It seems like the next step is to try to create a very small testcase (small stylesheet, only one layer...) and to use it to try to tease out if there are multiple bugs.

smsm1 commented 11 years ago

I've got a case where 4 levels of indentation (levels of filters) with no regex can cause generation of the mapnik xml to jump, even so it makes the document simpler to read with less duplication.

tmcw commented 11 years ago

@smsm1 can you create & post a testcase for this? the effect of the filter nesting is entirely dependent on the filters themselves.

stevage commented 10 years ago

Hmm, I've just started using regexes because they're so much more concise:

  [shop=~"bicycle|supermarket|convenience|general|bakery"],
  [amenity=~"cafe|toilets|restaurant|fast_food|shelter"],
  [tourism=~"viewpoint|attraction"]

This bug is a bit alarming. I'd agree with @smsm1's comment that maybe just syntax for the most important use case would be enough? For me, it would definitely be this one - I could very easily live without even wildcard matches, but the ability to concisely express a set of possible things to match against is very valuable.

Maybe even: [shop in 'bakery,convenience,general'] { } ? (Commas are more readable than pipes, but maybe there's some other reason to avoid them...)

nebulon42 commented 7 years ago

So far I'm not seeing problems with a minimal example like

#world {
    [type=~'motorway|trunk'] {
        [zoom=12] { line-width: 1.2 + 2; }
        [zoom=13] { line-width: 2 + 2; }
        [zoom=14] { line-width: 4 + 2; }
        [zoom=15] { line-width: 6 + 2; }
        [zoom=16] { line-width: 9 + 3; }
        [zoom=17] { line-width: 13 + 3; }
        [zoom>17] { line-width: 15 + 3; }
    }
}

Also

#motorways::case[zoom>=6][zoom<=12],
#mainroads::case[zoom>=10][zoom<=12],
#roads::case[zoom>=13][tunnel!=1][bridge!=1],
#tunnels::case[zoom>=13][tunnel=1],
#bridges::case[zoom>=13][bridge=1] {
    line-cap:round;
    line-join:round;
    line-width:0;

    [type=~'motorway|trunk'] {
        [zoom=12] { line-width: 1.2 + 2; }
        [zoom=13] { line-width: 2 + 2; }
        [zoom=14] { line-width: 4 + 2; }
        [zoom=15] { line-width: 6 + 2; }
        [zoom=16] { line-width: 9 + 3; }
        [zoom=17] { line-width: 13 + 3; }
        [zoom>17] { line-width: 15 + 3; }
    }
}

is ok.

But with the open-streets-dc style I see an increase in 3200 lines. Challenge seems to be to find a "minimal" example that is not too minimal.

nebulon42 commented 7 years ago

Right now I could not confirm that there is an interaction of bugs. Starting with https://github.com/tilemill-project/tilemill/blob/master/examples/open-streets-dc/highway.mss#L82-L160 and condensing it further seems to bring the number of lines closer together for the regex and non-regex case. Until it reaches an equal number of lines when there is only one sub selector.

Writing #world or https://github.com/tilemill-project/tilemill/blob/master/examples/open-streets-dc/highway.mss#L82-L86 made no difference. Seems to me like a combinatorial explosion problem.

@tmcw is probably right with what he wrote in https://github.com/mapbox/carto/issues/213#issuecomment-11342862.