vermiculus / magithub

**DEPRECATED - please use Forge instead!** -- Magit-based interfaces to GitHub
GNU General Public License v3.0
579 stars 62 forks source link

Speed up github datetime parsing #393

Closed ibizaman closed 5 years ago

ibizaman commented 5 years ago

Magit-status on a repo containing 300 opened PRs took a lot of time. Profiling showed the parse-iso8601-time-string was taking roughly 20%.

This solution uses a crude parsing and falls back to using parse-iso8601-time-string in case of error.

Solves #349 partially.

vermiculus commented 5 years ago

Is it possible to have a timezone attachment here? Would it be better to use a regular expression to ensure the match or does that negate the performance savings? (Surely it's slower than substring, but it's probably safer if we can't rely on an exact format.)

ibizaman commented 5 years ago

@vermiculus I didn't try the regexp version, let me try that.

ibizaman commented 5 years ago

About the timezone attachement:

From the docs, github indeed returns only string formatted as:

YYYY-MM-DDTHH:MM:SSZ

There is a possibility for timezones to be returned but if I read the link correctly, it only happens if we request for timezones, which we don't right now.

About the regexp version:

I tested various versions. TL; DR: the regexp version you were suggesting is the best.

I benchmarked them using:

(benchmark-run 100000 (function "2018-04-16T23:21:22Z"))
(benchmark-run-compiled 100000 (function "2018-04-16T23:21:22Z"))

I used 100000 here because it's big enough to exhibit the garbage collector behavior.

Version 1, original version in magithub that uses parse-iso8601-time-string.

not compiled:

total #gc gc time
18.47164 4 1.3725979999999822
18.414163 4 1.39270300000004
----------- ----- --------------------
mean mean
18.442902 1.3826505

compiled:

total #gc gc time
19.496181 4 1.4118449999999712
24.156316 5 2.8743560000000343
24.966477 4 2.4553220000000238
21.414707 4 1.742256999999995
22.453887 4 1.8485410000000684
22.051609 4 1.8671919999999886
21.572962 4 1.816054999999949
----------- ----- --------------------
mean mean
22.301734 2.002224

Version 2, first version of this PR:

(defun magithub--parse-time-string-orig (iso8601)
  (if-let ((year (magithub--parse-number (substring iso8601 0 4)))
           (month (magithub--parse-number (substring iso8601 5 7)))
           (day (magithub--parse-number (substring iso8601 8 10)))
           (hour (magithub--parse-number (substring iso8601 11 13)))
           (minute (magithub--parse-number (substring iso8601 14 16)))
           (second (magithub--parse-number (substring iso8601 17 19))))
      (encode-time second minute hour day month year t)
    (parse-iso8601-time-string (concat iso8601 "+00:00"))))

not compiled:

total #gc gc time
3.4864580000000003 5 2.6737169999999537
2.745374 5 2.0453229999999962
2.866167 5 2.18744700000002
2.912332 5 2.1949189999999135
2.894857 5 2.1941739999999754
2.9461239999999997 5 2.2192469999999958
2.883014 5 2.2037540000000604
-------------------- ----- --------------------
mean mean
2.9620466 2.2455116

Version 3, regexp version:

total #gc gc time
2.061523 5 2.23209700000001
1.9775240000000003 5 2.154948999999988
2.107822 5 2.281375999999966
2.310956 5 2.4957390000000146
2.1891730000000003 5 2.4971419999999966
2.069382 5 2.2126919999999473
2.087435 5 2.214131000000009
-------------------- ----- --------------------
mean mean
2.1148307 2.2983037
(defun magithub--parse-time-regexp (iso8601)
  (if (string-match "\\([0-9]+\\)-\\([0-9]+\\)-\\([0-9]+\\)T\\([0-9]+\\):\\([0-9]+\\):\\([0-9]+\\)" iso8601)
    (encode-time
     (string-to-number (match-string 6 iso8601))
     (string-to-number (match-string 5 iso8601))
     (string-to-number (match-string 4 iso8601))
     (string-to-number (match-string 3 iso8601))
     (string-to-number (match-string 2 iso8601))
     (string-to-number (match-string 1 iso8601))
     t)
    (parse-iso8601-time-string (concat iso8601 "+00:00"))))

not compiled:

total #gc gc time
0.666644 1 0.377949000000001
1.0365689999999999 2 0.7586220000000026
0.6933590000000001 1 0.4103150000000255
0.327639 2 0.9071849999999699
1.185268 2 0.8892600000000357
0.7514689999999999 1 0.45953600000001416
1.256488 2 0.9278919999999857
-------------------- ----- ---------------------
mean mean
0.845348 0.67582271

compiled:

total #gc gc time
1.965279 2 1.4817140000000109
1.1782430000000002 1 0.6804280000000063
1.210629 2 0.9303669999999897
1.217581 2 0.9142180000000053
0.7890510000000001 1 0.48532000000000153
1.195846 2 0.904327999999964
0.762548 1 0.4659159999999929
-------------------- ----- ---------------------
mean mean
1.1884539 0.83747014

Version 4, the original version with trying to remove gc run time, but sacrifying error detection:

(defun magithub--parse-time-crude (iso8601)
  (encode-time
   (string-to-number (substring iso8601 17 19))
   (string-to-number (substring iso8601 14 16))
   (string-to-number (substring iso8601 11 13))
   (string-to-number (substring iso8601 8 10))
   (string-to-number (substring iso8601 5 7))
   (string-to-number (substring iso8601 0 4))
   t))

not compiled:

total #gc gc time
0.22129400000000005 2 0.8386120000000119
1.052038 2 0.8739590000000135
0.635235 1 0.453913
0.15975899999999998 2 0.8982039999999643
1.0675439999999998 2 0.8969069999999988
0.621846 1 0.45320700000002034
1.104236 2 0.9370819999999753
--------------------- ----- ---------------------
mean mean
0.69456457 0.76455486

compiled:

total #gc gc time
1.0809209999999998 2 0.9065830000000119
0.6107370000000001 1 0.4454809999999725
1.073501 2 0.8935240000000135
0.920847 2 0.7637710000000197
0.581662 1 0.4249409999999898
1.153146 2 0.9552679999999896
0.641712 1 0.4729300000000194
-------------------- ----- --------------------
mean mean
0.86607514 0.69464257

This one beats the regexp version but there's no error detection so is not useful. I included it just to get an idea.

In conclusion, the regexp version is the winner.

vermiculus commented 5 years ago

I certainly did not expect that result! Thanks for the thorough investigation. It looks like tests are failing, though: magithub-test-parse-number. That leads me to a question, though – why is magithub--parse-number necessary? It looks like it just double-checks that the number we get from parsing is the number we'd get from formatting it again.

I might be missing some context, though, because it appears the function is no longer used as of the latest patch 😄

Ah, I think I answered my own question. You wanted to return nil specifically for if-let. Looks like it can be removed now, though :)

ibizaman commented 5 years ago

@vermiculus Indeed that was the reason :) Just remove the function, there's no need for it anymore.

vermiculus commented 5 years ago

Merged, thanks!!