jwaldmann / ceta-postproc

GNU Lesser General Public License v3.0
0 stars 0 forks source link

ceta needs a lot of time for checking *bounds #25

Open jwaldmann opened 2 years ago

jwaldmann commented 2 years ago

ICFP/3633 : ttt2 produces compatible automaton with 2000 states, ceta needs 25 min certification time on starexec.

jwaldmann commented 2 years ago

This is the top of the profile

        Thu Aug  4 15:18 2022 Time and Allocation Profiling Report  (Final)

           ceta-postproc +RTS -P -H -RTS /dev/stdin data/bound/3633.xml

        total time  =     2474.80 secs   (2474795 ticks @ 1000 us, 1 processor)
        total alloc = 1,770,234,125,896 bytes  (excludes profiling overheads)

COST CENTRE     MODULE    SRC                                                       %time %alloc  ticks     bytes

integer_of_ci   Ceta      CeTA-2.42/generated/Haskell/Ceta.hs:691:1-36               14.8    0.0  366746       432
integer_of_char Ceta      CeTA-2.42/generated/Haskell/Ceta.hs:694:1-48               12.2    0.0  302392         0
comparator_list Ceta      CeTA-2.42/generated/Haskell/Ceta.hs:(4238,1)-(4246,34)     10.5    0.0  260644        48
balance         Ceta      CeTA-2.42/generated/Haskell/Ceta.hs:(2241,1)-(2319,26)      9.3   64.4  230796 1139164301088
comparator_of   Ceta      CeTA-2.42/generated/Haskell/Ceta.hs:964:1-75                7.4    0.0  182817        32
less_char       Ceta      CeTA-2.42/generated/Haskell/Ceta.hs:5881:1-53               7.3    0.0  180981         0
rbt_ins         Ceta      CeTA-2.42/generated/Haskell/Ceta.hs:(10125,1)-(10135,46)    7.1   23.6  175869 418499570800
equal_char      Ceta      CeTA-2.42/generated/Haskell/Ceta.hs:5871:1-55               6.7    0.0  164627       176
compare         Ceta      CeTA-2.42/generated/Haskell/Ceta.hs:5907:3-24               5.4    0.0  134755        16
less            Ceta      CeTA-2.42/generated/Haskell/Ceta.hs:5885:3-18               4.6    0.0  114569         0
rm_iterateoi    Ceta      CeTA-2.42/generated/Haskell/Ceta.hs:(12705,1)-(12712,15)    3.8    8.6  93627 151723221568
compare_char    Ceta      CeTA-2.42/generated/Haskell/Ceta.hs:5904:1-28               3.4    0.0  83480         0
compare_list    Ceta      CeTA-2.42/generated/Haskell/Ceta.hs:4249:1-38               2.8    0.0  70102        64
==              Ceta      CeTA-2.42/generated/Haskell/Ceta.hs:5874:3-25               2.2    0.0  54978         0
paint           Ceta      CeTA-2.42/generated/Haskell/Ceta.hs:(2339,1)-(2340,46)      0.5    2.9  12080 50592874656
jwaldmann commented 2 years ago

A function balance does a lot of allocation. That's for building red-black trees, used to represent sets.

Then there's comparator_list that takes a lot of time.

Could it be that the elements of the set (the keys of the tree) are strings? Well, then it certainly gets expensive.

Two quick thoughts:

jwaldmann commented 2 years ago

most data is very short-lived. the collector says

1,831,682,981,848 bytes allocated in the heap
     394,478,400 bytes copied during GC
       7,305,240 bytes maximum residency (56 sample(s))

note: 2 TB allocated, max residency < 10 MB.

with some RTS options, I can get runtime down (from 11 s to 9 s) and GC time is 0.4 percent only

+RTS -M60g -A30g -G1

  Productivity  99.6% of total user, 99.6% of total elapsed
jwaldmann commented 2 years ago

I made a profile with hpc, the top is

hpc show ceta-postproc.tix |grep Top|sort -nr -k2

23195 46792471618 ceta-postproc-2.42.0-8CNp1Qy7y5f8ZoGXdmhNI2/Ceta         694:1-694:48 TopLevelBox ["integer_of_char"]
23192 46792471618 ceta-postproc-2.42.0-8CNp1Qy7y5f8ZoGXdmhNI2/Ceta         691:1-691:36 TopLevelBox ["integer_of_ci"]
23189 46792471618 ceta-postproc-2.42.0-8CNp1Qy7y5f8ZoGXdmhNI2/Ceta         651:1-651:41 TopLevelBox ["rep_char_integer"]
 9846 23962369769 ceta-postproc-2.42.0-8CNp1Qy7y5f8ZoGXdmhNI2/Ceta       4238:1-4246:34 TopLevelBox ["comparator_list"]
18667 23660888459 ceta-postproc-2.42.0-8CNp1Qy7y5f8ZoGXdmhNI2/Ceta         964:1-964:75 TopLevelBox ["comparator_of"]
23598 23659132122 ceta-postproc-2.42.0-8CNp1Qy7y5f8ZoGXdmhNI2/Ceta       5881:1-5881:53 TopLevelBox ["less_char"]
23464 23133102693 ceta-postproc-2.42.0-8CNp1Qy7y5f8ZoGXdmhNI2/Ceta       5871:1-5871:55 TopLevelBox ["equal_char"]
101730 23132574996 ceta-postproc-2.42.0-8CNp1Qy7y5f8ZoGXdmhNI2/Ceta       5874:3-5874:25 TopLevelBox ["=="]
12876 12901381636 ceta-postproc-2.42.0-8CNp1Qy7y5f8ZoGXdmhNI2/Ceta     10125:1-10135:46 TopLevelBox ["rbt_ins"]
 9849 12016035216 ceta-postproc-2.42.0-8CNp1Qy7y5f8ZoGXdmhNI2/Ceta       4249:1-4249:38 TopLevelBox ["compare_list"]
101578 12016035216 ceta-postproc-2.42.0-8CNp1Qy7y5f8ZoGXdmhNI2/Ceta       4252:3-4252:24 TopLevelBox ["compare"]
12172 8016688211 ceta-postproc-2.42.0-8CNp1Qy7y5f8ZoGXdmhNI2/Ceta       2241:1-2319:26 TopLevelBox ["balance"]
 3801 2115950552 ceta-postproc-2.42.0-8CNp1Qy7y5f8ZoGXdmhNI2/Ceta     12705:1-12712:15 TopLevelBox ["rm_iterateoi"]
 6066 1124900877 ceta-postproc-2.42.0-8CNp1Qy7y5f8ZoGXdmhNI2/Ceta     10117:1-10117:19 TopLevelBox ["impl_of"]
11573 1054018222 ceta-postproc-2.42.0-8CNp1Qy7y5f8ZoGXdmhNI2/Ceta       2339:1-2340:46 TopLevelBox ["paint"]
12895 1054018210 ceta-postproc-2.42.0-8CNp1Qy7y5f8ZoGXdmhNI2/Ceta     10146:1-10146:53 TopLevelBox ["insert"]
12888 1054018210 ceta-postproc-2.42.0-8CNp1Qy7y5f8ZoGXdmhNI2/Ceta     10143:1-10143:49 TopLevelBox ["rbt_insert"]
12884 1054018210 ceta-postproc-2.42.0-8CNp1Qy7y5f8ZoGXdmhNI2/Ceta     10140:1-10140:55 TopLevelBox ["rbt_insert_with_key"]
13080 1053972026 ceta-postproc-2.42.0-8CNp1Qy7y5f8ZoGXdmhNI2/Ceta     20127:1-20127:36 TopLevelBox ["ins_rm_basic_ops"]
 6064  226547896 ceta-postproc-2.42.0-8CNp1Qy7y5f8ZoGXdmhNI2/Ceta     10149:1-10154:28 TopLevelBox ["rbt_lookup"]

I find the high number of integer_of_char suspicious. Certificate size is 242 kB, and each char should be processed only once?

But perhaps it's harmless, since

  690 integer_of_ci :: Char_integer -> Integer;
  691 integer_of_ci x = rep_char_integer x;
  692 
  693 integer_of_char :: Char -> Integer;
  694 integer_of_char (Char_of_ci x) = integer_of_ci x;

  648 newtype Char_integer = Abs_char_integer Integer;
  649 
  650 rep_char_integer :: Char_integer -> Integer;
  651 rep_char_integer (Abs_char_integer x) = x;

Hm. This should be inlined away? Perhaps it is, and it's made visible only because of instrumentation for profiling. (Could be improved by -fprof-late https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7797 coming with ghc-9.4)

Still, the above question remains (red-black-tree with lists as keys)

jwaldmann commented 2 years ago

program compiled with

cabal build --with-ghc=/opt/ghc/ghc-9.4.0.20220721/bin/ghc --enable-profiling -O2 --ghc-options=-fprof-late -v

gives this profile:

           ceta-postproc +RTS -p -M60g -A30g -G1 -RTS /dev/stdin data/bound/3633.xml

        total time  =      800.28 secs   (800284 ticks @ 1000 us, 1 processor)
        total alloc = 1,735,038,859,928 bytes  (excludes profiling overheads)

COST CENTRE                     MODULE SRC                                             %time %alloc

$fCcompare(,)_$scomparator_list Ceta   CeTA-2.42/generated/Haskell/Ceta.hs:4238:1-15    29.3    0.0
balance                         Ceta   CeTA-2.42/generated/Haskell/Ceta.hs:2241:1-7     29.3   65.7
$s$srbt_ins12                   Ceta   CeTA-2.42/generated/Haskell/Ceta.hs:10125:1-7    28.7   22.3
rm_iterateoi                    Ceta   CeTA-2.42/generated/Haskell/Ceta.hs:12705:1-12    6.6    7.3
lvl3943                         Ceta   <no location info>                                5.1    4.3

conclusion: integer_of_char is inlined away, red-black balancing and list-compare is prominent.

jwaldmann commented 2 years ago

without changing the code: have GHC aggresively transform the code with

cabal install --with-ghc=/opt/ghc/ghc-9.4.0.20220721/bin/ghc  -O2 --ghc-options="-fspecialize-aggressively -fexpose-all-unfoldings -O2"

then I am getting

lzip -dc data/bound/3633.cpf.ttt2.lz| time /home/waldmann/.cabal/bin/ceta-postproc  /dev/stdin data/bound/3633.xml +RTS -M60g -A30g -G1
starexec-result="YES"
certification-result="CERTIFIED"
certification-time="442.7"
output-size="242320"

With plain O2 for the compiler, and the same RTS options, the executable built with ghc-9.2.4 gets

certification-time="523.6"

With same (aggressive) compiler and RTS options, executable built with ghc-8.10.7 gets

certification-time="431.1"
jwaldmann commented 2 years ago

debugging with ghci, set breakpoint at function check_ta_bounded, load a certificate, then :force ta, will print

[CeTA-2.42/generated/Haskell/Ceta.hs:(36670,3)-(36686,13)] ghci> :force ta
ta = Ceta.TA_Impl
       (Ceta.RBT (Ceta.Branch
                    Ceta.B
                    (Ceta.Branch
                       Ceta.R
                       (Ceta.Branch
                          Ceta.B (Ceta.Branch Ceta.R Ceta.Empty [49] () Ceta.Empty) [50] ()
                          Ceta.Empty)
                       [51] () (Ceta.Branch Ceta.B Ceta.Empty [52] () Ceta.Empty))
                    [53] () (Ceta.Branch Ceta.B Ceta.Empty [54] () Ceta.Empty)))
       (Ceta.RBT (Ceta.Branch
                    Ceta.B
                    (Ceta.Branch
                       Ceta.B
                       (Ceta.Branch
                          Ceta.B
                          (Ceta.Branch
                             Ceta.B Ceta.Empty ((Ceta.UnLab [48],Ceta.Nat 0),Ceta.Nat 1)
                             [Ceta.TA_rule_impl
                                (Ceta.UnLab [48],Ceta.Nat 0) [[49]] [49]
                                (Ceta.RBT (Ceta.Branch
                                             Ceta.B
                                             (Ceta.Branch
                                                Ceta.B
                                                (Ceta.Branch
                                                   Ceta.B Ceta.Empty [49] ()
                                                   (Ceta.Branch
                                                      Ceta.R Ceta.Empty [49,50,52] () Ceta.Empty))
...
        ([49,56,49,57],[56,55,54]),([50,48,48,49],[51,51,48]),
        ([49,48,50,52],[50,51,50,57]),([49,48,50,52],[50,52,50,55]),
        ([49,55,51,49],[56,51,53]),([56,56,51],[56,55,52]),
        ([49,57,54],[49,56,53]),([49,57,54],[54,50,55]),
        ([50,50,50,55],[55,52,55]),([49,49,49,53],[50,53,55]),
        ([52,51,53],[52,50,53]),([50,52,54,57],[55,54,49])]
       _ _

and this looks like the states of the automaton are represented as list-of-char, e.g.. [49,50,52] (means 124)

Oh yes indeed, state=string, http://cl2-informatik.uibk.ac.at/rewriting/mercurial.cgi/IsaFoR/file/c41d7d152c37/xml/cpf.xsd#l1420

   1420   <xs:element name="state" type="xs:string">
   1421     <xs:annotation>
   1422       <xs:documentation>the state of a tree automation</xs:documentation>
   1423     </xs:annotation>
   1424   </xs:element>
   1425   <xs:element name="treeAutomaton">
   1426     <xs:complexType>
   1427       <xs:sequence>
   1428         <xs:element name="finalStates">
   1429           <xs:complexType>
   1430             <xs:sequence>
   1431               <xs:element maxOccurs="unbounded" minOccurs="0" ref="state"/>
   1432             </xs:sequence>
   1433           </xs:complexType>
   1434         </xs:element>
...

Well this makes it expensive.

jwaldmann commented 2 years ago

I implemented CPF output of full matchbounds for matchbox (I mean, it's already in the name, so ..). Encoding of state to string is configurable. The following

helps to reduce run-time of ceta by 25 percent.

    ; mb = Apply (Worker (Matchbound { method = Full
       , cpf_state_encoding = Encoding {
       -- test cases (timing given after config line)
       --   Zantema_04/z097 (full mb 4 with 885 states)
       -- / Secret_05_SRS/jambox3 (full mb 8 with 1843 states)
       -- base = 10 , order = MSB_First -- 7.7 seconds / 310 s
       --   base = 10 , order = LSB_First -- 6.4 seconds
       -- base = 64 , order = MSB_First --  7.0 seconds
       base = 64 , order = LSB_First -- 5.6 seconds / 230 s
      -- for fun:
      --   base = 2 , order = MSB_First -- 11 s
      --  base = 2 , order = LSB_First -- 10 s
       }
      } )) done
jwaldmann commented 2 years ago

With ceta-states-integer https://lists.rwth-aachen.de/hyperkitty/list/termtools@lists.rwth-aachen.de/message/Y65SMRMZLX42LIUYRZRX3TGYT73X64YY/

I am getting

starexec-result="YES"
certification-result="CERTIFIED"
certification-time="374.3"
output-size="242320"

it's an improvement (15 percent). The profile is

           ceta-postproc-integer-new +RTS -p -M60g -A30g -G1 -Sstderr -RTS data/bound/3633.cpf.ttt2~ data/bound/3633.xml

        total time  =      565.60 secs   (565595 ticks @ 1000 us, 1 processor)
        total alloc = 1,859,816,356,672 bytes  (excludes profiling overheads)

COST CENTRE                 MODULE SRC                                         %time %alloc

balance                     Ceta   ceta-states-integer/new/Ceta.hs:2209:1-7     41.7   66.0
certify_proof_$s$srbt_ins12 Ceta   ceta-states-integer/new/Ceta.hs:10180:1-7    40.8   22.2
rm_iterateoi                Ceta   ceta-states-integer/new/Ceta.hs:12743:1-12    9.4    7.1
certify_proof1654           Ceta   <no location info>                            6.5    4.3

good - comparator_list is gone. Now RB operations dominate everything.

jwaldmann commented 2 years ago

To see the call chains, I made a profile without optimisation,(else, functions might get inlined away). It seems that this is expensive:

                 check_bounds_generic.\.\.\.\.\.\.\.\                                   Ceta                    ceta-states-integer/new/Ceta.hs:(37199,12)-(37237,27) 4964           0    0.0    0.0    94.0   94.1
                  state_compatible_eff_list                                             Ceta                    ceta-states-integer/new/Ceta.hs:(36654,1)-(36659,29)  4965           0    0.0    0.0    94.0   94.1
                   state_compatible_eff_list.\                                          Ceta                    ceta-states-integer/new/Ceta.hs:36658:28-71           5228        1027    0.0    0.0     0.0    0.0
                    catch_errora                                                        Ceta                    ceta-states-integer/new/Ceta.hs:(13800,1)-(13803,21)  5229        1027    0.0    0.0     0.0    0.0
                   state_compatible_eff_list.check                                      Ceta                    ceta-states-integer/new/Ceta.hs:36656:5-49            5230           0    0.0    0.0    94.0   94.1
                    rule_state_compatible_eff_list                                      Ceta                    ceta-states-integer/new/Ceta.hs:(36623,1)-(36645,32)  5231           0    0.0    0.0    94.0   94.1
                     rule_state_compatible_eff_list.\                                   Ceta                    ceta-states-integer/new/Ceta.hs:(36632,10)-(36645,31) 5232        1027    0.0    0.0    92.4   93.1

...

                        ta_match_impl.ep                                                Ceta                    ceta-states-integer/new/Ceta.hs:36571:5-29            5306      118530    0.0    0.0    91.5   92.5
                         rs_Union                                                       Ceta                    ceta-states-integer/new/Ceta.hs:36469:1-77            5307      118530    0.0    0.0    91.5   92.5
                          foldl                                                         Ceta                    ceta-states-integer/new/Ceta.hs:(3150,1)-(3151,39)    5308     1516702    0.0    0.0    91.5   92.5
                           g_union_dflt_basic_oops_rm_basic_ops                         Ceta                    ceta-states-integer/new/Ceta.hs:(36464,1)-(36466,23)  5309     1398172    0.0    0.0    91.5   92.5
                            iteratei_bset_op_list_it_dflt_basic_oops_rm_basic_ops       Ceta                    ceta-states-integer/new/Ceta.hs:(28195,1)-(28196,49)  5310     1398172    0.0    0.0    91.5   92.5
                             iteratei_bset_op_list_it_dflt_basic_oops_rm_basic_ops.\    Ceta                    ceta-states-integer/new/Ceta.hs:28196:13-48           5311     1398172    0.0    0.0    91.5   92.5
                              rm_iterateoi                                              Ceta                    ceta-states-integer/new/Ceta.hs:(12743,1)-(12750,15)  5312  2032465262    8.1    5.6    91.5   92.5
                               g_union_dflt_basic_oops_rm_basic_ops.\                   Ceta                    ceta-states-integer/new/Ceta.hs:36465:68-71           5322  2031067090    0.0    0.0     0.0    0.0
                               ins_rm_basic_ops                                         Ceta                    ceta-states-integer/new/Ceta.hs:20289:1-36            5323  1015533545    0.3    0.0    82.7   86.9
                                insert                                                  Ceta                    ceta-states-integer/new/Ceta.hs:10201:1-53            5324  1015533545    1.1    0.7    82.5   86.9
                                 impl_of                                                Ceta                    ceta-states-integer/new/Ceta.hs:10172:1-19            5329  1015533545    0.0    0.0     0.0    0.0
                                 rbt_insert                                             Ceta                    ceta-states-integer/new/Ceta.hs:10198:1-49            5325  1015533545    0.6    1.2    81.3   86.2
                                  rbt_insert_with_key                                   Ceta                    ceta-states-integer/new/Ceta.hs:10195:1-55            5326  1015533545    1.4    2.4    80.7   84.9
                                   rbt_ins                                              Ceta                    ceta-states-integer/new/Ceta.hs:(10180,1)-(10190,46)  5328 13031651269   20.7   36.0    78.5   81.0
                                    balance                                             Ceta                    ceta-states-integer/new/Ceta.hs:(2209,1)-(2287,26)    5337  8146796632   15.7   45.0    15.7   45.0
                                    compare                                             Ceta                    ceta-states-integer/new/Ceta.hs:7266:3-27             5332           0    6.9    0.0    42.2    0.0
                                     compare_integer                                    Ceta                    ceta-states-integer/new/Ceta.hs:7263:1-31             5333           0    2.6    0.0    35.2    0.0
                                      comparator_of                                     Ceta                    ceta-states-integer/new/Ceta.hs:932:1-75              5334 12018216912   19.1    0.0    32.6    0.0
                                       less                                             Ceta                    ceta-states-integer/new/Ceta.hs:634:3-25              5335           0    2.5    0.0    13.5    0.0
                                        less.\                                          Ceta                    ceta-states-integer/new/Ceta.hs:634:20-24             5336 12018216912   11.0    0.0    11.0    0.0
                                   paint                                                Ceta                    ceta-states-integer/new/Ceta.hs:(2307,1)-(2308,46)    5327  1015533545    0.4    1.5     0.4    1.5
                                   compare                                              Ceta                    ceta-states-integer/new/Ceta.hs:7266:3-27             5331           0    0.4    0.0     0.4    0.0
                               rm_iterateoi.sigmaa                                      Ceta                    ceta-states-integer/new/Ceta.hs:12747:12-44           5330  1015533545    0.7    0.0     0.7    0.0

The code seems to be http://cl2-informatik.uibk.ac.at/rewriting/mercurial.cgi/IsaFoR/file/f77808a4258d/thys/Tree_Automata/Tree_Automata_Impl.thy#l367

fun ta_match_impl :: "('f \<times> nat,('q :: linorder,'f :: linorder)ta_rule_impl list)rm \<Rightarrow> 'q rs \<Rightarrow> ('q \<Rightarrow> 'q rs) \<Rightarrow> ('f,'v)term \<Rightarrow> 'q list \<Rightarrow> ('v :: linorder \<times> 'q)list rs"
  where "ta_match_impl TA Qsig eps (Var x) Q = rs.from_list (map (\<lambda>q'. [(x,q')]) (rs.to_list (rs.inter (rs_Union (map eps Q)) Qsig)))" 
   | "ta_match_impl TA Qsig eps (Fun f ts) Q = (let 
          n = length ts;
          rules = rm_set_lookup TA (f, n);
          ep = rs_Union (map eps Q);
          f = (\<lambda> rule. rs.from_list (case rule of TA_rule_impl _ qs q' qs' \<Rightarrow>
                 (if rs.memb q' ep then 
                   (let rec = map (\<lambda> (tsi,qsi). rs.to_list (ta_match_impl TA Qsig eps tsi [qsi])) (zip ts qs)
                     in map concat (concat_lists rec))
                 else [])))
        in rs_Union (map f rules))" 

in Haskell:

ta_match_impl ::
  forall a b c.
    (Compare_order a, Compare_order b, Eq b, Compare_order c,
      Eq c) => Rbt (a, Nat) [Ta_rule_impl b a] ->
                 Rbt b () ->
                   (b -> Rbt b ()) -> Term a c -> [b] -> Rbt [(c, b)] ();
ta_match_impl ta qsig eps (Var x) q =
  g_from_list_dflt_basic_oops_rm_basic_ops
    (map (\ qa -> [(x, qa)])
      (g_to_list_dflt_basic_oops_rm_basic_ops
        (g_inter_dflt_basic_oops_rm_basic_ops (rs_Union (map eps q)) qsig)));
ta_match_impl ta qsig eps (Fun f ts) q =
  let {
    n = size_list ts;
    rules = rm_set_lookup ta (f, n);
    ep = rs_Union (map eps q);
    fa = (\ rule ->
           g_from_list_dflt_basic_oops_rm_basic_ops
             (case rule of {
               TA_rule_impl _ qs qa _ ->
                 (if memb_rm_basic_ops qa ep
                   then let {
                          rec = map (\ (tsi, qsi) ->
                                      g_to_list_dflt_basic_oops_rm_basic_ops
(ta_match_impl ta qsig eps tsi [qsi]))
                                  (zip ts qs);
                        } in map concat (concat_lists rec)
                   else []);
             }));
  } in rs_Union (map fa rules);