redruin1 / factorio-draftsman

A complete, well-tested, and up-to-date module to manipulate Factorio blueprint strings. Compatible with mods.
MIT License
94 stars 17 forks source link

Constant combinator set_signal function is very slow (50% of the execution time on big blueprints with lots of combinators) #28

Open louga31 opened 2 years ago

louga31 commented 2 years ago

I'm doing some profiling on the project to find why it takes so much time to create my big blueprints, and I found that the set_signal function is very slow because of the check for signal validity. I did not found what makes this check so slow, so I'm posting this as an issue instead of fixing it directly.

The check for signal validity takes an enormous 24-25s out of the 28s the function takes to run (87%) image

When trying to find the slow part of it, I checked the normalize_signal_id function, but it does not seem to be responsible for the slowdown as it took only 117 ms to run. image

redruin1 commented 2 years ago

This is somewhat unsurprising, as I had a hunch that validating the signal every time would get slow quickly, though I'm surprised that it says the signal_dict function is responsible for the majority of the time taken, and not something like Schema itself. Is there a minimum reproducible example to illustrate the volume/type of signals you're setting?

redruin1 commented 2 years ago

After coming back to this after a while, this seems to be mostly at fault of Schema's validate function. Given a minimal testbed:

from draftsman.entity import ConstantCombinator

entity = ConstantCombinator()
n_iter = 1000000
for i in range(n_iter):
    entity.set_signal(i % entity.item_slot_count, "signal-A", 100)

The current culprit appears to be this line:

signal = signatures.SIGNAL_ID_OR_NONE.validate(signal)

Running the above script with that line in set_signal results in a runtime of ~127 seconds for the 1,000,000 iterations. If this line is swapped with this (almost) equivalent statement here:

signal = signals.signal_dict(signal) if signal is not None else None

This finishes the same script in ~14 seconds, or about a 90% decrease in runtime.

Given that this is primarily a Schema issue, I'm not sure what the best precedent to take moving forward should be. Draftsman was always designed from an "API first, performance secondary" philosophy, as due to the number of safety checks that Draftsman performs it will never be as fast as an optimized, prototyped routine. This case is particularly ridiculous, so I might investigate more performant data validation libraries, or consider writing one myself; another particular slow point in the code seems to be the evaluation of the CONTROL_BEHAVIOR schema.

Note that if performance is what you're really after, then there's nothing stopping you from manually manipulating each underlying attribute individually; though this gets rid of all safety/sanity checks that Draftsman enforces:

from draftsman.entity import ConstantCombinator

entity = ConstantCombinator()

n_iter = 1000000
entity._control_behavior["filters"] = [None] * entity.item_slot_count
for i in range(n_iter):
    # entity.set_signal(i % entity.item_slot_count, "signal-A", 100)
    index = i % entity.item_slot_count
    entity._control_behavior["filters"][index] = {
        "index": index + 1, 
        "signal": {"name": "signal-A", "type": "virtual"},
        "count": 100
    }

This finishes the same 1 million iterations in just under a second, at the cost of readability.

louga31 commented 2 years ago

Wonderful findings, I'm not that much after performance (I prefer a safe library and that's why I choose Draftsman). The problem is that I have massive blueprints that take an insane amount of time to process (my two main projects are an automated base and a computer, and they take around 30 minutes to process). I'm starting to see any way I can globally optimize them to a reasonable amount of time. I might start manipulating the underlying attribute like you showed in your example, but only after I have testes with all the safety (I may start working on some sort of conversion program, as manually doing it everywhere is very error prone).