richelbilderbeek / pbdmms

Some models
GNU General Public License v3.0
2 stars 0 forks source link

sado: first round of profiling #251

Closed richelbilderbeek closed 7 years ago

richelbilderbeek commented 7 years ago

@Lumphie : I see that sado only runs the simulation. Could you also let it create all results? This to see where all time is spent.

Lumphie commented 7 years ago

It should do that now?

richelbilderbeek commented 7 years ago

Would be great!

Lumphie commented 7 years ago

Otherwise I don't know what you're talking about 😅

richelbilderbeek commented 7 years ago

OK, I'll fix the tests :+1:

richelbilderbeek commented 7 years ago

Good progress! Do not forget the maximum likelihood estimation :-)

richelbilderbeek commented 7 years ago

First Travis profile result:

real    4m5.077s
user    4m4.592s
sys 0m0.168s
$ if [[ $TRAVIS_BRANCH == "sado" ]]; then head gprof.log -n 1000; fi
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 25.29     11.10    11.10                             sado::create_graph_from_species_vector(std::vector<sado::species, std::allocator<sado::species> > const&)
 25.06     22.10    11.00 9926367312     0.00     0.00  sado::species::operator[](int) const
 16.78     29.47     7.37 1204583894     0.00     0.00  sado::gausser_impl_lut::operator()(double) const
  9.13     33.47     4.01 4963285956     0.00     0.00  sado::operator==(sado::id, sado::id)
  8.13     37.04     3.57 2483753994     0.00     0.00  sado::species::size() const
  3.42     38.54     1.50   112005     0.01     0.05  sado::get_attractivenesses(sado::population const&, double, double, sado::parameters const&)
  2.32     39.56     1.02    43014     0.02     0.02  std::vector<sado::indiv, std::allocator<sado::indiv> >::operator=(std::vector<sado::indiv, std::allocator<sado::indiv> > const&)
  2.18     40.52     0.96                             sado::operator<<(std::ostream&, sado::id)
  2.12     41.45     0.93   240827     0.00     0.02  sado::calc_comp(sado::population const&, double, sado::parameters const&)

create_graph_from_species_vector takes up 25% of the total time now. I wonder how this scales.

richelbilderbeek commented 7 years ago

By refactoring create_graph_from_species_vector I've already added multiple improvements, like early returning and iteration instead of indexing.

richelbilderbeek commented 7 years ago

Nice improvement, build 2409:

real    1m12.539s
user    1m12.012s
sys 0m0.188s
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 48.84     12.67    12.67      462    27.42    33.47  sado::has_ancestor_and_kid(sado::species const&, sado::species const&)
 21.65     18.29     5.62 1204583894     0.00     0.00  sado::gausser_impl_lut::operator()(double) const
 10.78     21.08     2.80 4434158628     0.00     0.00  sado::operator==(sado::id, sado::id)
  4.70     22.30     1.22   112005     0.01     0.04  sado::get_attractivenesses(sado::population const&, double, double, sado::parameters const&)
  4.01     23.34     1.04   240827     0.00     0.02  sado::calc_comp(sado::population const&, double, sado::parameters const&)
  2.39     23.96     0.62    63964     0.01     0.01  sado::get_summed(std::vector<double, std::allocator<double> > const&)
  1.95     24.47     0.51        2   252.50   252.50  sado::operator<<(std::ostream&, sado::id)

Actually, it is so nice, I find it hard to believe.

richelbilderbeek commented 7 years ago

Build 2410 show it is consistent:

real    1m8.342s
user    1m7.860s
sys 0m0.164s
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 54.21     12.68    12.68      462    27.45    32.97  sado::has_ancestor_and_kid(sado::species const&, sado::species const&)
 17.10     16.68     4.00 1204583894     0.00     0.00  sado::gausser_impl_lut::operator()(double) const
 10.90     19.23     2.55 4434158628     0.00     0.00  sado::operator==(sado::id, sado::id)
  5.17     20.44     1.21   112005     0.01     0.03  sado::get_attractivenesses(sado::population const&, double, double, sado::parameters const&)
richelbilderbeek commented 7 years ago

Build 2412 too:

real    1m5.964s
user    1m5.444s
sys 0m0.156s
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 53.15     12.81    12.81      462    27.73    33.54  sado::has_ancestor_and_kid(sado::species const&, sado::species const&)
 17.76     17.09     4.28 1204583894     0.00     0.00  sado::gausser_impl_lut::operator()(double) const
 11.14     19.78     2.69 4434158628     0.00     0.00  sado::operator==(sado::id, sado::id)
  4.81     20.94     1.16   112005     0.01     0.03  sado::get_attractivenesses(sado::population const&, double, double, sado::parameters const&)
  4.44     22.01     1.07   240827     0.00     0.01  sado::calc_comp(sado::population const&, double, sado::parameters const&)
richelbilderbeek commented 7 years ago

An improvement in 2414 did pay off:

real    0m41.624s
user    0m40.940s
sys 0m0.336s
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 37.49      6.35     6.35      232    27.37    33.66  sado::has_ancestor_and_kid(sado::species const&, sado::species const&)
 25.09     10.60     4.25 1204583894     0.00     0.00  sado::gausser_impl_lut::operator()(double) const
  9.21     12.16     1.56   112005     0.01     0.03  sado::get_attractivenesses(sado::population const&, double, double, sado::parameters const&)
  8.62     13.62     1.46 2084923752     0.00     0.00  sado::operator==(sado::id, sado::id)
  7.91     14.96     1.34   240827     0.01     0.01  sado::calc_comp(sado::population const&, double, sado::parameters const&)
richelbilderbeek commented 7 years ago

First big-O run:

0_0:
real    0m47.466s
user    0m46.296s
sys 0m0.160s

1_0:
real    0m37.262s
user    0m35.912s
sys 0m0.128s

2_0:
real    0m27.362s
user    0m26.436s
sys 0m0.108s

3_0:
real    0m22.742s
user    0m19.992s
sys 0m0.088s

4_0:
real    0m17.932s
user    0m13.796s
sys 0m0.084s

5_0:
real    0m11.019s
user    0m8.872s
sys 0m0.072s

6_0:
real    0m5.976s
user    0m5.124s
sys 0m0.072s

7_0:
real    0m4.350s
user    0m2.920s
sys 0m0.044s

8_0:
real    0m2.350s
user    0m1.748s
sys 0m0.048s

9_0:
real    0m1.405s
user    0m0.836s
sys 0m0.040s

0_0:
real    0m46.901s
user    0m45.664s
sys 0m0.184s

0_1:
real    0m37.300s
user    0m36.136s
sys 0m0.176s

0_2:
real    0m24.917s
user    0m23.876s
sys 0m0.160s

0_3:
real    0m20.033s
user    0m19.040s
sys 0m0.104s

0_4:
real    0m13.825s
user    0m12.788s
sys 0m0.132s

0_5:
real    0m10.869s
user    0m9.880s
sys 0m0.104s

0_6:
real    0m10.135s
user    0m8.964s
sys 0m0.100s

0_7:
real    0m11.648s
user    0m9.180s
sys 0m0.092s

0_8:
real    0m8.641s
user    0m7.636s
sys 0m0.096s

0_9:
real    0m6.838s
user    0m5.880s
sys 0m0.080s
richelbilderbeek commented 7 years ago
filename t_end c real (sec)
0_0.txt 100 0.0005 48
1_0.txt 90 0.0005 37
2_0.txt 80 0.0005 27
3_0.txt 70 0.0005 23
4_0.txt 60 0.0005 18
5_0.txt 50 0.0005 11
6_0.txt 40 0.0005 6
7_0.txt 30 0.0005 4
8_0.txt 20 0.0005 2
9_0.txt 10 0.0005 1
--- --- --- ---
0_0.txt 100 0.0005 47
0_1.txt 100 0.0006 37
0_2.txt 100 0.0007 25
0_3.txt 100 0.0008 20
0_4.txt 100 0.0009 13
0_5.txt 100 0.001 11
0_6.txt 100 0.0011 10
0_7.txt 100 0.0012 12
0_8.txt 100 0.0013 8
0_9.txt 100 0.0014 7
richelbilderbeek commented 7 years ago
real    0m52.775s
user    0m46.304s
sys 0m0.228s

real    0m38.376s
user    0m35.484s
sys 0m0.156s

real    0m27.544s
user    0m26.536s
sys 0m0.068s

real    0m20.188s
user    0m19.244s
sys 0m0.136s

real    0m14.099s
user    0m13.284s
sys 0m0.104s

real    0m9.834s
user    0m8.968s
sys 0m0.108s

real    0m5.935s
user    0m5.124s
sys 0m0.056s

real    0m3.637s
user    0m2.916s
sys 0m0.068s

real    0m2.327s
user    0m1.708s
sys 0m0.056s

real    0m1.441s
user    0m0.816s
sys 0m0.068s

real    0m46.194s
user    0m45.072s
sys 0m0.152s

real    0m41.304s
user    0m35.696s
sys 0m0.156s

real    0m24.949s
user    0m23.804s
sys 0m0.172s

real    0m20.066s
user    0m19.044s
sys 0m0.116s

real    0m13.943s
user    0m12.836s
sys 0m0.116s

real    0m11.862s
user    0m10.776s
sys 0m0.100s

real    0m9.844s
user    0m8.832s
sys 0m0.072s

real    0m10.283s
user    0m9.168s
sys 0m0.084s

real    0m8.514s
user    0m7.480s
sys 0m0.088s

real    0m6.847s
user    0m5.828s
sys 0m0.088s
richelbilderbeek commented 7 years ago

OK, culprit is has_ancestor_and_kid

c t_end (generations) t(sec) has_ancestor_and_kid (%)
0.005 100 50 40
0.005 200 280 50
0.005 400 1378 59
p230198@fwn-biol-132-102:~/GitHubs/pbdmms/sado_profiling$ head gprof.log 
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 59.22    177.83   177.83     3322     0.05     0.07  sado::has_ancestor_and_kid(sado::species const&, sado::species const&)
 16.11    226.22    48.38 8515907390     0.00     0.00  sado::operator==(sado::id, sado::id)
  6.57    245.93    19.72 6894421106     0.00     0.00  sado::gausser_impl_lut::operator()(double) const
  3.79    257.30    11.37       84     0.14     0.14  sado::operator<<(std::ostream&, sado::id)
  3.26    267.09     9.79   115552     0.00     0.00  std::vector<sado::indiv, std::allocator<sado::indiv> >::operator=(std::vector<sado::indiv, std::allocator<sado::indiv> > const&)
richelbilderbeek commented 7 years ago

Solution: put those species in a hash table.

richelbilderbeek commented 7 years ago

I have put these in a std::set now, maybe do a hash table later.

richelbilderbeek commented 7 years ago

OK, new version is increasingly faster for bigger inputs:

c t_end (generations) t(sec) old t(sec) new speedup
0.005 100 50 39 1.3x
0.005 200 280 83 3.4x
0.005 400 1378 305 4.5x
0.005 800 6108 641 9.6x

speedup = old time / new time

New culprit: std::vector<sado::indiv>::operator=(std::vector<sado::indiv> const&):

p230198@fwn-biol-132-102:~/GitHubs/pbdmms/sado_profiling$ head gprof.log 
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 22.93     42.48    42.48   448592     0.00     0.00  std::vector<sado::indiv, std::allocator<sado::indiv> >::operator=(std::vector<sado::indiv, std::allocator<sado::indiv> > const&)
 21.53     82.37    39.89 4128078855     0.00     0.00  sado::gausser_impl_lut::operator()(double) const
  8.52     98.15    15.78     3832     0.00     0.00  void std::vector<sado::indiv, std::allocator<sado::indiv> >::_M_emplace_back_aux<sado::indiv const&>(sado::indiv const&)
  8.37    113.66    15.51      801     0.02     0.10  sado::create_next_generation(sado::population const&, sado::parameters const&)
  8.30    129.03    15.37  1461780     0.00     0.00  sado::get_attractivenesses(sado::population const&, double, double, sado::parameters const&)
richelbilderbeek commented 7 years ago

Put new profiling on sado_profiling branch

richelbilderbeek commented 7 years ago

Replaced all push_backs by emplace_backs. Expect no difference.

richelbilderbeek commented 7 years ago

These are the times, there is no clear difference:

c t_end (generations) t(sec) old t(sec) new speedup
0.005 100 39 23 1.7x
0.005 200 83 83 1.0x
0.005 400 305 248 1.2x
0.005 800 641 685 0.9

Used:

time ./sado --profile end 100

And wrote down the wall clock time.

I am most interested in the profile of the last run.

richelbilderbeek commented 7 years ago

As expected, no difference:

p230198@fwn-biol-132-102:~/GitHubs/pbdmms/sado_big_o$ time ./sado --profile end 800
[...]
p230198@fwn-biol-132-102:~/GitHubs/pbdmms/sado_big_o$ gprof sado > gprof.log
p230198@fwn-biol-132-102:~/GitHubs/pbdmms/sado_big_o$ head gprof.log 
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 22.99     43.29    43.29   448592     0.00     0.00  std::vector<sado::indiv, std::allocator<sado::indiv> >::operator=(std::vector<sado::indiv, std::allocator<sado::indiv> > const&)
 20.93     82.72    39.43 4128078855     0.00     0.00  sado::gausser_impl_lut::operator()(double) const
  8.43     98.60    15.88      801     0.02     0.10  sado::create_next_generation(sado::population const&, sado::parameters const&)
  8.06    113.78    15.18     3832     0.00     0.00  void std::vector<sado::indiv, std::allocator<sado::indiv> >::_M_emplace_back_aux<sado::indiv const&>(sado::indiv const&)
  7.69    128.27    14.49  1461780     0.00     0.00  sado::get_attractivenesses(sado::population const&, double, double, sado::parameters const&)
richelbilderbeek commented 7 years ago

sado::simulation::do_timestep, replaced:

  const auto next_generation = create_next_generation(m_population, m_parameters);
  m_population = next_generation;

by

  auto next_generation = create_next_generation(m_population, m_parameters);
  std::swap(m_population, next_generation);

to save copies potentially. It should be visible in the profile. There were 448592 copies before this change.

richelbilderbeek commented 7 years ago

Branch failed, need to reed Meyers 2016. Will do so after TRES staff meeting when at home.

richelbilderbeek commented 7 years ago

Enough profiling for now. I'll notice when something needs to be improved.