snoyberg / tar-conduit

Conduit based tar extraction mechanism
MIT License

Tar PR review #11

Closed by chrisdone 6 years ago

chrisdone commented 6 years ago

As a continuation to be merged AFTER #10, this PR adds time and space tests for tar/untar.

For the most part space usage is constant. There is one interesting result in the case of untar:

Case                         Allocated         Max        Live  GCs  Check
untar 1 files                   59,936       9,464       9,464    0  OK
untar 10 files                 475,480      20,384      20,384    0  OK
untar 100 files              4,645,408     128,384     128,384    6  OK
untar 1000 files            46,355,544   1,208,384   1,208,384   66  OK
untar 10000 files          463,453,072  12,008,384  12,008,384  664  OK

This indicates that max residency is not constant when untarring n files: it grows roughly linearly with n, at about 1,200 bytes per additional file.
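
The growth rate can be read off the Max column directly. Below is a minimal Haskell sketch, using only `base`, that computes the extra residency per extra file between consecutive rows of the table. The names `samples` and `slopes` are illustrative and are not part of tar-conduit or its test suite.

```haskell
module Main where

-- (number of files, max residency in bytes), copied from the untar table.
samples :: [(Int, Int)]
samples =
  [ (1, 9464)
  , (10, 20384)
  , (100, 128384)
  , (1000, 1208384)
  , (10000, 12008384)
  ]

-- Extra bytes of max residency per extra file, between consecutive rows.
slopes :: [(Int, Int)] -> [Double]
slopes xs = zipWith slope xs (drop 1 xs)
  where
    slope (n0, b0) (n1, b1) =
      fromIntegral (b1 - b0) / fromIntegral (n1 - n0)

main :: IO ()
main = mapM_ print (slopes samples)
```

Every step from 10 files upward works out to 1,200 bytes per additional file, which would be consistent with a fixed amount of data being retained per entry rather than a one-off overhead.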

I'll take a cursory look through the untar function to see if there's an obvious change to improve this. I imagine it's not a particularly important use-case for IOHK, so it's probably fine like this.

Time numbers:

benchmarking untar 1 files
time                 8.671 μs   (8.559 μs .. 8.800 μs)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 8.733 μs   (8.639 μs .. 8.830 μs)
std dev              314.3 ns   (266.9 ns .. 379.3 ns)
variance introduced by outliers: 45% (moderately inflated)

benchmarking untar 10 files
time                 192.2 μs   (188.4 μs .. 196.4 μs)
                     0.997 R²   (0.996 R² .. 0.998 R²)
mean                 197.7 μs   (195.1 μs .. 200.6 μs)
std dev              9.237 μs   (7.820 μs .. 11.03 μs)
variance introduced by outliers: 45% (moderately inflated)

benchmarking untar 100 files
time                 2.022 ms   (1.970 ms .. 2.073 ms)
                     0.996 R²   (0.994 R² .. 0.998 R²)
mean                 1.956 ms   (1.928 ms .. 1.989 ms)
std dev              102.2 μs   (84.67 μs .. 129.7 μs)
variance introduced by outliers: 37% (moderately inflated)

benchmarking untar 1000 files
time                 19.33 ms   (18.84 ms .. 19.88 ms)
                     0.997 R²   (0.994 R² .. 0.999 R²)
mean                 20.19 ms   (19.89 ms .. 20.51 ms)
std dev              699.0 μs   (488.8 μs .. 1.216 ms)

benchmarking untar 10000 files
time                 202.3 ms   (198.6 ms .. 206.5 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 199.7 ms   (195.4 ms .. 201.3 ms)
std dev              3.501 ms   (46.06 μs .. 4.429 ms)
variance introduced by outliers: 14% (moderately inflated)

Allocation report

$ profile-query -f time.prof --sort alloc
Name                  %alloc   %time
MAIN                  100.00  100.00
main                   99.90   99.80
fuse                   79.70   72.90
>>=                    53.90   54.50
>>=.\                  46.40   47.70
tar                    38.40   35.40
tarFileInfo            38.30   35.40
untar                  38.10   36.70
untarChunks.loop       32.90   30.60
untarChunks            32.90   31.10
packHeader             32.50   28.70
packHeaderNoChecksum   32.10   24.20
>>=.\.\                30.40   20.30
parseHeader            26.10   19.70
pure                   22.70   10.20
parseHeader.bsum       17.20    9.70
encodeShort            16.60   13.20
encodeOctal            14.70   11.60
runAndAnalyse          11.80   15.70
defaultMainWith        11.80   15.80
runMode                11.80   15.70
defaultMain            11.80   15.80
withConfig             11.80   15.70
runAndAnalyseOne       11.80   15.70
runAndAnalyse.\        11.80   15.70
for.go                 11.80   15.70
analyseSample          11.80   15.70
for                    11.80   15.70
analyseOne             11.80   15.70
regress                11.20   10.60

chrisdone commented 6 years ago

I didn't find an obvious reason for the linear allocation. I suspect that it's actually the allocation of conduit constructors. I'm content to leave this alone.

snoyberg commented 6 years ago

Looks like this PR failed Travis:

Test suite space: RUNNING...
Case                         Allocated        Max       Live  GCs  Check  
tar 1 files                     33,760      2,104      2,104    0  OK     
tar 10 files                   260,760      3,544      3,544    0  OK     
tar 100 files                2,519,488          0          0    3  OK     
tar 1000 files              25,127,488          0          0   32  OK     
tar 10000 files            251,207,496          0          0  329  OK     
tar file of 1 bytes             33,760      2,160      2,160    0  OK     
tar file of 10 bytes            33,760      2,160      2,160    0  OK     
tar file of 100 bytes           33,760      2,160      2,160    0  OK     
tar file of 1000 bytes          33,760      2,160      2,160    0  OK     
tar file of 10000 bytes         33,760          0          0    0  OK     
untar 1 files                   85,808     10,992     10,992    0  OK     
untar 10 files                 562,048     17,832     17,832    0  OK     
untar 100 files              5,338,936     84,072     84,072    7  OK     
untar 1000 files            53,118,672    746,472    746,472   79  OK     
untar 10000 files          530,912,200  7,370,472  7,370,472  793  OK     
untar file of 1 bytes           85,864     11,104     11,104    0  INVALID
untar file of 10 bytes          85,856     11,096     11,096    0  INVALID
untar file of 100 bytes         85,768     11,008     11,008    0  INVALID
untar file of 1000 bytes        85,376     10,616     10,616    0  INVALID
untar file of 10000 bytes       85,592     10,832     10,832    0  INVALID
Check problems:
  untar file of 1 bytes
    Exceeded maximum bytes or allocated bytes!
  untar file of 10 bytes
    Exceeded maximum bytes or allocated bytes!
  untar file of 100 bytes
    Exceeded maximum bytes or allocated bytes!
  untar file of 1000 bytes
    Exceeded maximum bytes or allocated bytes!
  untar file of 10000 bytes
    Exceeded maximum bytes or allocated bytes!
Test suite space: FAIL
Test suite logged to: dist/test/tar-conduit-0.2.0-space.log
1 of 2 test suites (1 of 2 test cases) passed.
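
For readers unfamiliar with this kind of check, here is a minimal sketch of how a fixed-threshold space check can produce the `INVALID` rows above. It uses only `base`. The limits (70,000 bytes allocated, 10,000 bytes max) are hypothetical values chosen for illustration, since the suite's real thresholds do not appear in the log, and `Limits`, `checkCase` and `travisCases` are made-up names, not the suite's actual code. It also assumes, for simplicity, that the tar and untar cases share one set of limits.

```haskell
module Main where

-- Hypothetical limits, chosen only for illustration; the suite's real
-- thresholds are not shown in the log.
data Limits = Limits
  { limitAllocated :: Int -- upper bound on total bytes allocated
  , limitMax       :: Int -- upper bound on max residency in bytes
  }

-- Returns an error message when either bound is exceeded, Nothing otherwise.
checkCase :: Limits -> (Int, Int) -> Maybe String
checkCase limits (allocated, maxBytes)
  | allocated > limitAllocated limits || maxBytes > limitMax limits =
      Just "Exceeded maximum bytes or allocated bytes!"
  | otherwise = Nothing

-- (case name, (Allocated, Max)) copied from the Travis table above.
travisCases :: [(String, (Int, Int))]
travisCases =
  [ ("tar file of 1 bytes",   (33760, 2160))
  , ("untar file of 1 bytes", (85864, 11104))
  ]

main :: IO ()
main = mapM_ report travisCases
  where
    limits = Limits { limitAllocated = 70000, limitMax = 10000 } -- assumed values
    report (name, weight) =
      putStrLn (name ++ ": " ++ maybe "OK" id (checkCase limits weight))
```

Under these assumed limits the tar case passes and the untar case is rejected. Because absolute byte counts differ between machines (the local run above reports 59,936 bytes allocated for `untar 1 files`, Travis reports 85,808), a threshold tuned on one environment may be too tight on another.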