sharplispers / ironclad

A cryptographic toolkit written in Common Lisp
BSD 3-Clause "New" or "Revised" License
166 stars 28 forks source link

Is there a reason (defmethod prng-random-data (num-bytes (prng os-prng)..) does not close the /dev/urandom file? #26

Closed sabracrolleton closed 4 years ago

sabracrolleton commented 4 years ago

Stress testing a postmodern enhancement and ran out of file handles. I traced it back to (defmethod prng-random-data (num-bytes (prng os-prng) ..) which opens /dev/urandom and never closes it. Is there a reason for this?

luismbo commented 4 years ago

Possibly it's faster to keep a handle rather than opening a new one each time prng-random-data is invoked. If that's true, then a close-prng and/or a finalizer are necessary. If that's not true, then we should close the file after reading in the data.

Can you benchmark and see if it makes any difference to keep a handle?

sabracrolleton commented 4 years ago

Seems to be a ccl problem because I am not seeing the file connections left open in sbcl or ecl. (All of this is on a linux box). Maybe the following will help the analysis.

I benchmarked the original code with trivial-benchmark:with-timing against a version of the original code wrapping the relevant code in a with-open-file macro. I.E:

  (let ((seq (make-array num-bytes :element-type '(unsigned-byte 8))))
    (unless (slot-boundp prng 'source)
      (with-open-file (stream #P"/dev/urandom" :element-type '(unsigned-byte 8))
        (setf (slot-value prng 'source)
              stream)
        (assert (>= (read-sequence seq (slot-value prng 'source)) num-bytes))))
    seq)

The test simply to run the following 1,000 times: authenticate against a postgresql database using scram-sha256, printing T if there was a valid connection to the database and re-attempting if the authentication was not valid. (I currently have a 0.3% failure rate on authentication, so the second attempts do show up occasionally in a 1,000 attempt run and those do not always get closed even in ecl or sbcl using the original code).

The results were as follows:

CCL original code

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  1000     272.13715  0.268867  0.541924  0.271101  0.272137  0.014937
RUN-TIME   1000     271.09796  0.267694  0.540449  0.270145  0.271098  0.014806

Resulting in 1010 open files of /dev/urandom. 20 calls later I ran out of file descriptors and could no longer access /dev/urandom without restarting ccl.

with-open-file version

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  1000     270.21777  0.265639  0.538448  0.26821   0.270218  0.022414
RUN-TIME   1000     269.21286  0.26474   0.537334  0.267214  0.269213  0.02238

Resulting in 1 open file of /dev/urandom

ECL original code

-             SAMPLES  TOTAL        MINIMUM   MAXIMUM   MEDIAN    AVERAGE     DEVIATION  
REAL-TIME     1000     366.37       0.349     0.387     0.37      0.36637     0.009237   
RUN-TIME      1000     385.273      0.343     0.415     0.401     0.385273    0.025091   
BYTES-CONSED  1000     19207220512  18765728  20047968  18785344  19207220.0  570324.5 

No connections left open to /dev/urandom

with-open-file version

-             SAMPLES  TOTAL        MINIMUM   MAXIMUM   MEDIAN    AVERAGE     DEVIATION  
REAL-TIME     1000     367.398      0.349     0.74      0.37      0.367398    0.018773   
RUN-TIME      1000     386.502      0.344     0.801     0.401     0.386502    0.030666   
BYTES-CONSED  1000     19249101520  18769952  38761840  18788048  19249102.0  1042669.8  

No connections left open to /dev/urandom

SBCL original code

-                SAMPLES  TOTAL       MINIMUM   MAXIMUM   MEDIAN    AVERAGE    DEVIATION
REAL-TIME        1000     15.731      0.014     0.031     0.015     0.015731   0.001356
RUN-TIME         1000     13.13       0.012     0.027     0.013     0.01313    0.00132
USER-RUN-TIME    1000     12.702619   0.009133  0.026748  0.012649  0.012703   0.001579
SYSTEM-RUN-TIME  1000     0.430588    0         0.006567  0         0.000431   0.001092
PAGE-FAULTS      1000     0           0         0         0         0          0.0
GC-RUN-TIME      1000     0.155       0         0.005     0         0.000155   0.000764
BYTES-CONSED     1000     7076875104  7048880   14118336  7059168   7076875.0  315388.22
EVAL-CALLS       1000     0           0         0         0         0          0.0

With four open files to /dev/urandom

with-open-file version

-                SAMPLES  TOTAL       MINIMUM   MAXIMUM   MEDIAN    AVERAGE    DEVIATION
REAL-TIME        1000     15.789      0.014     0.035     0.015     0.015789   0.001696
RUN-TIME         1000     13.166      0.012     0.031     0.013     0.013166   0.001623
USER-RUN-TIME    1000     12.754451   0.007267  0.026978  0.012648  0.012754   0.001631
SYSTEM-RUN-TIME  1000     0.417296    0         0.009881  0         0.000417   0.001118
PAGE-FAULTS      1000     0           0         0         0         0          0.0
GC-RUN-TIME      1000     0.155       0         0.006     0         0.000155   0.00077
BYTES-CONSED     1000     7097963392  7052688   14118656  7059184   7097963.5  497695.63
EVAL-CALLS       1000     0           0         0         0         0          0.0

with no open files to /dev/urandom

glv2 commented 4 years ago

When running the following test:

(let ((crypto:*prng* (crypto:make-prng :os)))
  (time (dotimes (i 1000)
          (crypto:random-data 16))))

I get the following results:

      current version   with-open-file version
SBCL     0.000175          0.023220
CCL      0.000321          0.033609

Opening and closing the file every time is at least 100 times slower.

luismbo commented 4 years ago

@sabracrolleton prng-random-data will open a new file handle once for each object you create with make-prng. I assumed you were opening multiple prng objects in your test. @glv2 seems like a close-prng function would make sense then.

sabracrolleton commented 4 years ago

By the way, my with-open-file sample code above does not return random data. So useless for real purpose intended.

@luismbo . Yes, I was opening multiple prng objects in my actual code because I was triggering private thread errors otherwise. Will look back at my benchmark code to make sure that was the case. But I admit that my benchmark code was had too much other stuff going on in the authentication, so the actual time for the random-data method was probably a rounding error. As a result, not a useful answer to your request.

sabracrolleton commented 4 years ago

Running corrected code that does generate random data and just the random data function alone, I duplicate glv2's results.

I still need to be able to close the prng if someone is using ccl when they run low on file handles.

glv2 commented 4 years ago

I added a finalizer for the os-prng class to close /dev/urandom (commit 36f3685aca747bd61a16b2e051c9fc50c042580d).

@sabracrolleton could you check if it solves your problem?

sabracrolleton commented 4 years ago

@glv2 Yes. This seems to solve ccl's problem. Thank You!

luismbo commented 4 years ago

@glv2 Thanks for fixing the indentation in os-prng.lisp BTW. 😁