ultimate-pa / ultimate

The Ultimate program analysis framework.
https://ultimate-pa.org/
192 stars 40 forks source link

Slow Petri Net Unfolding (the Bumblebee Observations) #448

Open Heizmann opened 4 years ago

Heizmann commented 4 years ago

The Petri net unfolding is at the moment (early Oct 2019) the major bottleneck of the Petri Automizer verifier. The Petri net unfolding seems to be unnecessarily slow.

The following list of observations should help us to discuss improvements.

The Dinosaur Observations:

The Bumblebee Observations:

Heizmann commented 4 years ago
Heizmann commented 4 years ago

For benchmarks on the set above, CONDITION_PER_PLACE_MAX is between 90 and 25000, and CO_RELATION_MAX_DEGREE is between 3000 and 70000. CsvAutomataOperationStatistics107285100_2019-10-13_14-33-48-842.csv.txt java.hprof.txt The total runtime of the subTwoHourMarathon subset is 763s.

Heizmann commented 4 years ago

For the benchmarks in the set above: https://github.com/ultimate-pa/ultimate/commit/ae0a586ed02fa02de0cdf0cc5c54bfa94b436380 sped up the unfolding for larger benchmarks by factor 3 and reduced co-relation queries by up to 40%. (@naouarmehdi unfortunately, a side-effect is that benchmarks done before and after this commit are not comparable wrt. number of co-relation queries.) CsvAutomataOperationStatistics107285100_2019-10-13_14-57-02-088.csv.txt java.hprof.txt The total runtime of the subTwoHourMarathon subset is 310s.

Heizmann commented 4 years ago
Heizmann commented 4 years ago
Heizmann commented 4 years ago

Optimization mentioned in B07 reduced runtime on the subTwoHourMarathon to 250s but increased rumtime on some of the other four benchmarks. CsvAutomataOperationStatistics589166341_2019-10-20_03-48-32-850.csv.txt java.hprof.txt

Heizmann commented 4 years ago
Heizmann commented 4 years ago
Heizmann commented 4 years ago

The optimization mentioned in B17 reduced the runtime on the subTwoHourMarathon set to 175s. (!) CsvAutomataOperationStatistics589166341_2019-10-26_03-48-44-805.csv.txt java.hprof.txt

Maximal size of possible extensions reaches from 147 to 2969 on the benchmark set.

naouarmehdi commented 4 years ago
Heizmann commented 4 years ago
Heizmann commented 4 years ago

The Optimization mentioned in B18 reduced the runtime on the subTwoHourMarathon set to 143s. CsvAutomataOperationStatistics1957175487_2019-11-16_02-32-34-977.csv.txt java.hprof.txt

Heizmann commented 4 years ago

An optimization not discussed here, related to commit https://github.com/ultimate-pa/ultimate/commit/192202ccb26caed483646b4455aa7b50d1a2efd7 reduced the runtime on the subTwoHourMarathon set to 135s. CsvAutomataOperationStatistics1957175487_2019-11-18_04-38-23-835.csv.txt java.hprof.txt BeforeAfterInterleaved.csv.txt

Heizmann commented 4 years ago
Heizmann commented 4 years ago
Heizmann commented 4 years ago

I accidentally run the last benchmark without the firstborn cutoff check. If this check is also enabled the runtime is only 96s. CsvAutomataOperationStatistics1513124396_2019-12-18_01-31-41-453.csv.txt According to the hprof output, removeMin is now probably the most time-consuming method.

Heizmann commented 4 years ago

An optimization that I call the candidate preselection optimization: Let c be a condition and p its corresponding place. Instead of asking p for successor transitions, we first compute the set P of all places that for each place p' in P there is a condition c' that is in co-relation with c and whose place is p'. Now we do not ask for all successor transitions of p but for successor transitions of p wose preset is a subset of P.

naouarmehdi commented 4 years ago
Heizmann commented 4 years ago

This definitely makes sense. Maybe we can define a new total order as an alternative to the EsparzaRömerVogler order ≺_E from Section 5 of the paper. We could define χ (or ψ) as an alternative to φ. We define χ as a map from natural numbers to the sets of events whose depth is that number. Maybe let us call this order the depth-based order. Using the map χ, there are probably several possibilities to define the depth-based order such that it is total and adequate (see Definition 4.5). We can probably find a definition where we do not have to compute minima of suffixes of local configurations explicitly, but where we can just iterate over χ. We can implement χ in Ultimate as a TreeRelation.

Heizmann commented 4 years ago

The optimization 4b7a06995808403b91d8be427d953a6b1a496e48 reduced the runtime on the subTwoHourMarathon set to 52s. (Great! I had not expected that a reduction of the runtime by factor 2 is still possible.) CsvAutomataOperationStatistics1513124396_2019-12-19_23-08-43-238.csv.txt java.hprof.txt

Heizmann commented 4 years ago

Quick evaluation of bfbf37b after the bugfix def68b0. Runtime was reduced to 45s. CsvAutomataOperationStatistics1513124396_2019-12-22_00-03-22-204.csv.txt java.hprof.txt I compared the last two CSVs more carefully. The candidate preselection reduced the number of EXTENSION_CANDIDATES_TOTAL by around factor 10 and the number of EXTENSION_CANDIDATES_USELESS is close to 0. Most other numbers are unchanged but the NUMBER_EVENT_COMPARISONS is slightly different (difference is less than 1%). WithoutAndWithCandidatePreselectionInterleaved.csv.txt I do not have an explanation for this. Maybe events are added in a slightly different order.

naouarmehdi commented 4 years ago
Heizmann commented 4 years ago

Some obersevations of Mehdi.

Heizmann commented 4 years ago
Heizmann commented 4 years ago
Heizmann commented 4 years ago

Evaluation of the new (final) christmas benchmark set for the code of ab37154. Runtime 397s. CsvAutomataOperationStatistics781527109_2019-12-26_22-29-42-686.csv.txt java.hprof.txt

Heizmann commented 4 years ago

4336eb1 reduce the runtime on the Christmas benchmark set to 256s. CsvAutomataOperationStatistics781527109_2019-12-26_22-47-48-939.csv.txt java.hprof.txt Streaming co-related conditions seems to be the most time-consuming operation.

Heizmann commented 4 years ago

The ERV2 order would reduce the runtime on the Christmas benchmark set to 227s. CsvAutomataOperationStatistics781527109_2019-12-26_22-58-08-197.csv.txt java.hprof.txt

Heizmann commented 4 years ago

The DBO order would reduce the runtime on the Christmas benchmark set to 242s. CsvAutomataOperationStatistics781527109_2019-12-26_23-42-22-868.csv.txt java.hprof.txt

Heizmann commented 4 years ago

If we would not add cut-off events to the BranchingProcess this would reduce the runtime on the Christmas benchmark set to 131s. (!) CsvAutomataOperationStatistics781527109_2019-12-27_00-11-24-371.csv.txt java.hprof.txt However, before we enable this by default we have to discuss with @maul-esel if the large block encoding requires cut-off events (if this is the case we have to add a parameter to the PetriNetUnfolder, otherwise we will just change the value of the boolean flag).

naouarmehdi commented 4 years ago
Heizmann commented 4 years ago

The optimization mentioned in B32 reduced the runtime on the Christmas benchmark set from 256s to 65s. (I double checked that the runtime without the optimization is really 256s and the speedup is not caused by something else.) CsvAutomataOperationStatistics781527109_2020-01-04_02-32-39-141.csv.txt java.hprof.txt

This is (absolutely great but) a surprise to me. Maybe there is a problem with our implementation of the HashRelation3.

Heizmann commented 4 years ago

I added a testsuite 1a33ac41417d8a76c3308360c02d5568c1c3d653 for evaluating the impact of the here discussed improvements on our program verification. I am unsure how we should organize these informations (separate issue, here, something else). By now, I will just call related observations *Lyx Observations" and post them here.

Heizmann commented 4 years ago
Heizmann commented 4 years ago

I added new benchmarks for the evaluation of the unfolding. These benchmarks mimic what is needed during program verification. The unfolding is not run by the operation FinitePrefix but by the operation DifferencePairwiseOnDemand, hence we have to use the DifferencePetriNwaBenchmark testsuite. Statistics about the unfolding are added to the statistics of the DifferencePairwiseOnDemand operation.

Heizmann commented 4 years ago
Heizmann commented 4 years ago
* B32: A side effect of B07 is that the update method of the correlation class has very high costs. (because of the multiple streamCoRelatedEvents calls). The main motivation for B07 was to compute the sets of correlated conditions which corresponds a specific place in an efficient way to use it in the "evolve candidate" method. Since [4336eb1](https://github.com/ultimate-pa/ultimate/commit/4336eb154623c8d73602d699b6c2e7f319a1a0ba) we check correlated conditions in the "evolve candidate" method only pairwise and don't need B07 anymore.

I agree. But I want to add that we do not need B07 because we do an on-demand construction of successor transitions. If we would know all predecessor places of an outgoing transition in advance, it might be useful to compute only co-related conditions that have a certain place.

Heizmann commented 4 years ago
Heizmann commented 4 years ago

If we only use the optimizations that are yet enabled by default, the runtime of DifferencePetriNwaBenchmark on the materialistic benchmark is 5595s. CsvAutomataOperationStatistics1461347900_2020-02-18_01-58-39-527.csv.txt java.hprof.txt

Heizmann commented 4 years ago

The optimization mentioned in B32 reduced the runtime on the materialistic benchmark set from 5595s to 512s. CsvAutomataOperationStatistics1461347900_2020-02-18_01-58-39-527.csv.txt java.hprof.txt

Heizmann commented 4 years ago

If we additionally use the optimization of 81c6a12 the runtime is reduced to 451s. CsvAutomataOperationStatistics1461347900_2020-02-18_04-01-33-606.csv.txt java.hprof.txt

Heizmann commented 4 years ago

In 3b27a0c I added a new testsuite for evaluating our optimizations directly within the program verification. Ideas:

Rules:

Warnings:

Heizmann commented 4 years ago

At moment ( 3b27a0c ) I the runtime is 496s. java.hprof.txt IncrementalLogWithBenchmarkResults_2020-02-25_02-41-33-839-incremental.log

Heizmann commented 4 years ago
Heizmann commented 4 years ago

Same experiment as two comments before but this time with the new definition of size. Runtime was 497s. java.hprof.txt IncrementalLogWithBenchmarkResults_2020-02-25_03-50-47-316-incremental.log

Heizmann commented 4 years ago

Same experiment after 90f2a0cc5930582937fffd6e33bfb32ad7a64a42 and with mRemoveRedundantFlow set to true. Runtime was 558s. java.hprof.txt IncrementalLogWithBenchmarkResults_2020-02-26_02-31-25-428-incremental.log

Heizmann commented 4 years ago

Same experiment with backfolding (and a bunch of immature local changes that I needed for the integration) Runtime was 535. java.hprof.txt IncrementalLogWithBenchmarkResults_2020-02-26_04-53-50-235-incremental.log Interleaved Log: 1. default, 2. with RemoveRedundantFlow, 3. with Backfolding and RemoveRedundantFlow We see that the combination of Backfolding and RemoveRedundantFlow reduces the time that is needed for the difference operation. Next, I will improve the integration (e.g., reuse existing unfoldings) in order to improve the runtime of the additional operations.

Heizmann commented 4 years ago

b8661f94447577f9571590a76d767fcbe40cdfae reduced the runtime on the materialistic benchmark set to 369s (from 512s) java.hprof.txt CsvAutomataOperationStatistics1461347900_2020-02-29_16-23-21-647.csv.txt

.

Heizmann commented 4 years ago

b8661f9 reduced the runtime of the Svcomp20AutomizerConcurrentSpeedBenchmarks from 497 to 470s (given the fact that 3*120s=360s are spend on timeouts, this is a significant improvement.) java.hprof.txt IncrementalLogWithBenchmarkResults_2020-03-01_00-36-53-290-incremental.log

Heizmann commented 4 years ago

Some commit(s) of the last two weeks (probably the refactoring of conditions reduced the runtime on the materialistic benchmark set to 284s. java.hprof.txt CsvAutomataOperationStatistics1772471998_2020-03-14_04-40-04-036.csv.txt