ruby / zlib

Ruby interface for the zlib compression/decompression library
Other
49 stars 35 forks source link

Avoid allocating intermediary strings when readpartial is passed an outbuf #61

Closed segiddins closed 4 months ago

segiddins commented 12 months ago

This accounts for a significant number of string allocations when reading rubygems, but we can avoid that in many places by only copying into the outbuf when present

This is my first time in a while writing C code for Ruby, so please excuse any style errors.

cc @martinemde

Verified improvement via this script:

script

```ruby #!/usr/bin/env ruby require 'rubygems' require 'bundler/inline' gemfile do source "https://rubygems.org/" gem 'memory_profiler' gem 'zlib', path: "/Users/segiddins/Development/github.com/ruby/zlib" end f = File.open("/Users/segiddins/.bundle/cache/compact_index/rubygems.org.443.29b0360b937aa4d161703e6160654e47/versions") gz = Zlib.gzip(f.read) File.write("/tmp/versions.gz", gz) f = Zlib::GzipReader.open("/tmp/versions.gz") buf = String.new(capacity: 16_384, encoding: Encoding::BINARY) MemoryProfiler.report do f.readpartial 16_384, buf until f.eof? end.pretty_print(scale_bytes: true) ```

Before

``` Total allocated: 24.71 MB (3378 objects) Total retained: 0 B (0 objects) allocated memory by gem ----------------------------------- 24.71 MB other allocated memory by file ----------------------------------- 24.71 MB Untitled 2.rb allocated memory by location ----------------------------------- 24.71 MB Untitled 2.rb:25 allocated memory by class ----------------------------------- 24.71 MB String allocated objects by gem ----------------------------------- 3378 other allocated objects by file ----------------------------------- 3378 Untitled 2.rb allocated objects by location ----------------------------------- 3378 Untitled 2.rb:25 allocated objects by class ----------------------------------- 3378 String retained memory by gem ----------------------------------- NO DATA retained memory by file ----------------------------------- NO DATA retained memory by location ----------------------------------- NO DATA retained memory by class ----------------------------------- NO DATA retained objects by gem ----------------------------------- NO DATA retained objects by file ----------------------------------- NO DATA retained objects by location ----------------------------------- NO DATA retained objects by class ----------------------------------- NO DATA Allocated String Report ----------------------------------- 1 "\nBibliografia.alu0100816167 0.2.0 c86a6472b1ee133100ee922bcffd3a39\nBibliografia_alu0100502107 0.1.0 bfaec8bf21658d9664741c8d923e4eca\nBidManagerLoggerGem 0.0.0 baa3c08c034fb911c0e6fd90d8219b1e\nBigCat 0" 1 Untitled 2.rb:25 1 ",1.0.1,1.0.2 a926a2955fc394053c68a3d50b6ef48e\nRailsRemoteControl 1.0.0 850c11e07777754b00ad66f506af8ae3\nRailsRunSignUp 0.0.1,0.0.3,0.0.4,0.0.5,0.0.7 f2323ed86e442b2b33ab73aa5cd95587\nRailsTop 0.0.1,0.0" 1 Untitled 2.rb:25 1 ",2022.2.19,2022.2.20,2022.2.21,2022.2.22,2022.2.24,2022.3.9,2023.1.29,2023.1.31,2023.5.25,2023.7.14,2023.7.20 0b781346d3306df5e023bbf0a5d1f988\nEZAccounts 0.6.1 3f289d4a6b3d1b11d7d2c2a6dd80fa75\nEZTOP10" 1 Untitled 2.rb:25 1 "-graticule 0.2.7.6,0.2.7.5,0.2.7.4,0.2.7.3,0.2.7.2 8f2de7b75ea069d8e310eddef4388588\nGUI-mini_magick 1.2.3 61cc135041fcec97105c69df07095fc4\nGUnit 0.1.2,0.2.0,0.2.1,0.2.2,0.3.0,0.3.1,0.3.2,0.3.3,0.3.4,0" 1 Untitled 2.rb:25 1 ".3.0,1.4.0,1.4.1,1.4.2 53814ff52d4ec9f2588cad7a79e7c272\nExit_Zero 0.1.0,1.0.0,1.0.1,1.1.0,1.1.1,1.1.2,1.1.3,1.2.0,1.3.0 5ed64f091d3410b02d803e4c141bd503\nExpressQuery 0.0.1,1.0.0,1.1.1 343c7b16462239c8" 1 Untitled 2.rb:25 1 "03ec4a0\nModelMaker 0.1.0,0.1.1,0.1.2 fb981323c3e30f29e5c055414b0e8357\nModule2Class 0.0.1 3fdb19bffdf40d75add588a571f9c44d\nMoneyConversions 0.1.0 272811aa64258719b3adb885315ab4c9\nMoney_DaWanda 0.0.0 76" 1 Untitled 2.rb:25 1 "04434d414302db7a904190e363cb6cdf\nP4Ruby-mingwx86 2014.1 4b85a4fb36bc3bc3be322135637b48f9\nP5_T_34 0.1.1,0.1.2 9284c3d71f9b135a1098cfe62b5101db\nP6-alu0101111254 0.1.0 5163068706fe08a4113b0334ab392b5d\nPA" 1 Untitled 2.rb:25 1 "05cb89f789bfa32a1975d6f1b44\nBBB 0.0.1,0.0.2,0.0.3,0.0.4,0.0.5,0.0.6,0.0.7,0.0.8,0.0.9,0.0.10,0.1.0,0.1.1,0.1.2,0.1.3,0.2.0,0.3.0 c191f999c3105549170c4e7bd7377f67\nBBRedCloth 0.8.0,0.8.1,0.8.2,0.8.3,0.8" 1 Untitled 2.rb:25 1 "0aca8d1\nLocksmith 0.0.0,0.1.1 7c310256129f799338aa71f208713348\nLogSimple 0.1 3b00fd807f15ed6447b6c6871be22e82\nLogic_test 0.0.1 7c46270d6975e6b7711b95a448f7f216\nLoomioScraper 0.1.0,0.1.1 489a4b884bf093" 1 Untitled 2.rb:25 1 "0bd67e707921fc\ntencentcloud-sdk-iottid 3.0.659 ced9f766d69fd167092e454ff688a6ea\ntencentcloud-sdk-mongodb 3.0.659 a080375825a02372c86c82cf8a60367f\ntencentcloud-sdk-bizlive 3.0.659 8f597c6e4c9d470a3e7e5" 1 Untitled 2.rb:25 1 "0feca9b939dcb97ef06aa91cb\nSpreadsheet-HTML 0.0.1,0.0.2,0.0.3,0.0.4,0.0.5,0.0.6,0.0.7,0.0.8,0.0.9,0.0.10,0.0.11,0.0.12,1.0.0,1.0.1,1.0.2,1.0.3 25bcde480ed325cdb679dcf357532488\nSpreadsheetML 0.1 c997b7b" 1 Untitled 2.rb:25 1 "1.0 cd725301f04f5e09a1bf4a80b9ccdb55\nJosephPecoraro-rr 1.0.4,1.0.3 fd13aa36aeeb1b9f077b21015c33a5fe\nJosephPecoraro-whenever 0.3.2,0.3.1 5b089cca8a96705ee79bf547053fbd98\nJuice10-bogus_sass_checker 0.1." 1 Untitled 2.rb:25 1 "10,2.0.11 0654aae51dabfd9bc66933ef8d1afcdf\nabiquo-etk 0.4.13,0.4.14,0.4.15,0.4.16,0.4.17,0.4.18,0.4.19,0.4.20,0.4.22,0.4.23,0.4.24,0.4.25,0.4.29,0.4.32,0.4.33,0.4.42,0.5.3,0.5.8,0.5.9,0.6.0,0.6.1,0.6." 1 Untitled 2.rb:25 1 "15,0.1.16,0.1.17,0.1.18,0.1.19,0.1.20,0.1.21,0.1.22,0.1.23,0.1.24,0.1.25,0.1.26,0.1.27,0.1.28,0.1.29,0.1.30,0.1.31,0.1.32,0.1.33,0.1.34,0.1.35,0.1.36,0.1.37,0.1.38,0.1.39,0.1.40,0.1.41,0.1.42,0.1.43 b" 1 Untitled 2.rb:25 1 "198da1acadb4ca2c5364\nSFAscrapi 1.2.2 4d2def6a28e3be807146c138305f9cc2\nSFAtidy 1.1.3 79d8a59b992cfbaa960c442a71a9368b\nSFDO-API 0.1.0,0.1.1,0.1.2,0.1.3,0.1.4,0.1.5,0.1.6,0.1.7,0.1.8,0.1.9,1.0.0,1.0.1,1." 1 Untitled 2.rb:25 1 "1e225cb3e\nThermostat_Joppe_Declercq 0.3.3 c23c9fcd243fc450ff0709c97c12bfa0\nThiagoLelis-backgroundjob 1.0.6,1.0.5,1.0.4,1.0.3,1.0.2 68105c2e0318b2c9f5eabeec6ee89e17\nThiagoLelis-bj-utc 1.0.2,1.0.1 9eb3b" 1 Untitled 2.rb:25 1 "1ea1f71b64f0bc0c7d69e48e0\nSbirsp 0.0.1 c33fc055771420a79e2c73a04050bd74\nScanSSL 0.0.1,0.0.2 66fe25a024331a0a44e7b615e0c20c20\nSchedulerWindows 0.0.0 7a1ac3e86d5c07ebfedbae4b65ef664d\nSchmock 0.9 ea463b2" 1 Untitled 2.rb:25 1 "2.0,0.3.0,0.4.0 ee02b0a530ec1d6cea1cfff08a110298\nTxChaos 0.0.2 87d29cbac99db2e0b09e6a85e9212a26\nTypoglycaemic 0.1.1 440a6a2861bdd3760ef6a467d8c8f76f\nUCSAPI 0.0.6,0.0.7,0.0.8,0.0.9 3edfa21b8d9f62e93361" 1 Untitled 2.rb:25 1 "281839b65c58db947bb5b70c9\nCarRegistrationIndia 0.0.2 bf8e77bd63089f92666cbc3618aaa3ae\nCarRegistrationIreland 0.0.2 fb75737a4dea91e4484e262fd11eb2ef\nCarRegistrationItaly 0.0.2 8842075a3975ae9e21b862e8c" 1 Untitled 2.rb:25 1 "36f8\nIPGlider-ImmutableAttributes 1.0.3.1 93cab95102ec337edd23c0fac12fb25f\nIPGlider-annotate 2.2.7,2.2.6 3ff6d312c8646836529684e0d5595f61\nIPGlider-metric_fu 1.1.5.1,1.1.5.1.1 1b18a62f2cdf418198d219eca" 1 Untitled 2.rb:25 1 "389f6ed2f6660b5f967a4c0a7fbb57ae\nJOT 1.0.2,1.0.3,1.0.4,1.0.5,1.0.8 080d1e648c4bf9c1a4528e959f2f8c88\nJRank 0.1.0 9a3a7e55a67abc57dc72ecc16d7f37d9\nJRuby-OpenSSL 0.1 4c413ca5761334b4008ba22410e5dbe0\nJSON" 1 Untitled 2.rb:25 1 "3a\nAndrey-hello_world 0.1.0,0.1.1,0.1.2,0.1.3 a6b7b86ac146e0be5549933faa2c587b\nAndyFirstGem 0.1.0 2ca49468af6351fe5b0ed7a2c7eb3176\nAnilistrb 0.1.1,0.1.2 34762f1030d18ee68bedf6942e3f06a4\nAnimeDL 0.1.1," 1 Untitled 2.rb:25 1 "53aba615e0ed5b07678\nabnf-parsing 0.2.0,0.2.1,0.2.2 01b99d03fb2fcd0c5d11f17fed9af514\nabnftt 0.1.1,0.2.1,0.2.2,0.2.3,0.2.4 769ceeedab9574e3556535947c27c6fc\nabnormal 0.0.0 3565d01266f5d7d1a11a022a2aa73a4" 1 Untitled 2.rb:25 1 "5be08a87430\naaronh-chronic 0.3.9 5e92631e9e460e49a48ed21e8923b684\naaronp-frex 1.0.1,1.0.0 4c51934a49cd360790d34e8fb1fdf3ad\naaronp-meow 1.1.0,1.0.0 696b645f09e8b577a0ba12803e776e2b\naaronp-zomg 1.0.2.20" 1 Untitled 2.rb:25 1 "621561a661\nPr0d1r2-paperclip-time-stamped 1.0.2,1.0.1,1.0.0 609c1dd80afe644bfb48d94f273367fd\nPr0d1r2-resource_controller 0.6.6 03a09281a4d9fe554b186e84715026b5\nPractica9 0.0.1 aa353d3cf70d946e1c8bf2a5" 1 Untitled 2.rb:25 1 "670ae1295038623dff512873250695\nDankest_View_Tool 0.1.0 3bb5d765faffdca9f7bf929a15c45e2a\nDanskeHelligdage 1.0.2 49963828c9bb510cd7bf43fe5483637e\nDarkFolio 0.1.0 770b9f4dfd501aeb6428851dc6df9673\nDashaMa" 1 Untitled 2.rb:25 1 "7,0.5.98,0.5.99,0.5.95,0.6.1,0.6.2,0.6.7,0.6.8,0.6.10,0.6.11,0.6.12,0.6.16,0.6.21,0.6.22,0.6.24,0.6.25,0.6.26,0.6.27,0.6.28,0.6.29,0.6.31,0.6.32,0.6.33,0.6.34,0.6.35,0.6.36,0.6.38,0.6.39,0.6.40,0.6.41" 1 Untitled 2.rb:25 1 "754349c6d\nClickClack 0.0.1,0.0.2,0.0.3 b7861c5c4d277c78365a80a426a151a4\nClickSpotter 0.1.1 6f59a77ea207488b629a54785faa0ae1\nClockOFF 0.0.1 4536a463ee6cdd81ccdd37db582195bb\nClosestSum 1.0.0-universal-l" 1 Untitled 2.rb:25 1 "7d\nVladTheEnterprising 0.1.6,0.1.7,0.1.8,0.2,0.1.4,0.1.5 b1e9969dec1927f7472ed610d894e4d5\nVoiceIt2 1.0.0,1.0.1,1.0.2,1.0.3,1.0.4,1.0.5,1.0.6,2.0.0,3.0.0,3.0.1,3.0.2,3.1.0,3.2.0,3.3.0,3.4.0,3.5.0,3.6.0" 1 Untitled 2.rb:25 1 "8d1bd01cb62e3e633732\nWinkerAI 0.0.01,0.0.02,0.0.03,0.0.04,0.0.05 51a44b0d85415c91c048d85ff44008ec\nWireAPI 0.3,0.4,0.5 ae643fea15a4ef81e8a1e30b2ec8de5f\nWishky 2.0.1 9d48d2ed9c14c714df2944d808427a29\nWon" 1 Untitled 2.rb:25 1 "9363b592abb629bc706d56c05cc65\nRiFF-RAFF 0.0.1 9715c4dd1fd3a3d4d006c233f764b6e9\nRidiculous 0.3,0.5 fe1500d870f555a6931d44e2d345378a\nRightScaleAPIHelper 0.4.0,0.4.1,0.4.2,0.4.3,0.4.4,0.4.5,0.4.6,0.5.7,0" 1 Untitled 2.rb:25 1 "9632c67031472\nSwfUtil 0.01 8dc23bb47a31c63e69a3825c64995f56\nSyd-sinatra 0.9.0.4,0.9.0.2,0.3.2 452d71d3bc8b9e9b264f409f7d87bc39\nSykGenData 0.1.2,0.1.3,0.1.4,0.1.5,0.1.6,0.1.7,0.1.8,0.1.9 1c668d3ba14e52" 1 Untitled 2.rb:25 1 "9f96c69acf7606a072d2a5df\nActionMailer-Base-to-use-an-absolute-path-template 1.0.0 63f40c52ceaf75627747dd4e5a32d375\nActionPool 0.0.1,0.1.0,0.2.0,0.2.1,0.2.2,0.2.3 9b9a08e67027c1e35cae5a25434ac681\nActio" 1 Untitled 2.rb:25 1 "FA 0.2.0 e8b428c46a768a65844551a3ab122024\nLIS-OUT 0.0.0 b957020ea9a2fb5548fb83caeecf978d\nLJ_RPS 0.1.0,0.1.1 69de688a86bdd96c832fc6c980daa434\nLMG_modbus 1.0.2,1.0.3,1.0.4 2ca98d4939106231269d2f9e845396" 1 Untitled 2.rb:25 1 "Lab_Plaid 0.0.0,0.0.1 b145c6cf314454f260031a7a9a523ca9\nDirTagger 2.0.0,2.0.1,2.0.3,2.0.4,2.0.5,2.0.6 f8d1ce386d30bb8c37c6ec2a95774443\nDirectorio_de_trabajo_del_equipo 0.1.0 8a64ffd5183e066763632670ea1" 1 Untitled 2.rb:25 1 "MediumToMarkdown 1.0.0,1.0.1,1.0.2,1.0.3,1.1.0,1.2.0,1.2.1,1.2.2,1.3.0,1.3.1,1.3.2,1.3.3,1.3.4,1.3.5,1.3.6,1.3.7,1.3.8,1.3.9,1.4.0,1.4.1,1.4.2,1.4.7,1.4.8,1.5.0,1.6.0,1.6.1,1.6.2,1.6.3,1.7.0,1.7.1,1.7" 1 Untitled 2.rb:25 1 "Ruby 0.0.1 6fceeb05f6d4d9a23fe96e03fc7a6dc8\nHelloRubyGem 0.0.0 429f816abe827687fb51b7554764966d\nHelloWorldAA 0.0.1 5b82d9aaf57d7c2d94246ec97c2116e6\nHelloWorldAlephCampos 0.1.0 420a0b341620a28bea7b3066" 1 Untitled 2.rb:25 1 "aDA 1.0.0,1.0.1,1.1.0 34a580b2a4b44913cb7348775de2fed4\nGradleSearch 1.0.0 6d43f990fee5dbaa04f5143627683d73\nGradleSearchResTools 1.0.3,1.0.4 769acf384902684100395999479c754a\nGraphUtils 0.0.0,0.0.1 1825" 1 Untitled 2.rb:25 1 "c3fe7fe590f78dae951c\nQuintero_view_tool 0.1.0 234f75130040fab82a8a0f7fef9afaf7\nQuipu 0.1.0 89a05f2eb212e7ea88b5a507d3f61eb6\nR3T 0.0.1 e6fcbebbb855226536dc400837ce88e1\nR3c 0.0.1,0.0.2,0.0.3,0.0.4,0.0.5" 1 Untitled 2.rb:25 1 "c830044d0ce1\nCASProjectServer 0.0.2 2db2c12bd35838013bad0bf0234fa3a6\nCA_breweries 0.1.0,0.1.1,0.1.2,0.1.3 4d32740df4a936df41c4d3bd45689b06\nCCHIT-xds-facade 0.1.1 dc01fed8498d33484576a2753eb3ce7c\nCFPro" 1 Untitled 2.rb:25 1 "cac3d5f25c3b48a31\nCpSoButtons 0.0.1,0.0.2,0.0.3,0.0.4,0.0.8 b2e14092aec2ecb6e80fd45dc9109322\nCreamCheese 1.0.2,1.0.3 8a0c8536d02e0c855ef276c272f8ef34\nCreizer-Meli 3.0.0,3.0.1 9279a45029583a02c4572f66b" 1 Untitled 2.rb:25 1 "d1cc61cc8d8fb001fd6d06b2e9b\nNCIRLFileUploader 0.0.1,0.0.12,0.0.13 a21b2bff3a253b98c2501161d3b8786a\nNDKMeans 0.0.1,0.0.2 e570c4bcc4c75c703576f5cbaa86dac9\nND_Studio_Game_Coursework 1.0.0 643e814e3d3118e" 1 Untitled 2.rb:25 1 "ef89bbf3202dfab9e48fb4efa0c0ede\nabbyy 0.2.0,0.2.1 18a30aca9c56f950480e7abf9afa8952\nabbyy-cloud 0.0.1,0.0.2,0.0.3,0.0.4,0.0.5,0.0.6,0.0.7,0.0.8,0.0.9,0.0.10 39dda5f39733b8e478e0639057b964cd\nabbyy-ruby " 1 Untitled 2.rb:25 1 "f5e13d5172e8baa5956f4364ef\nManfred-Turippu 0.1.0 b6da62eb14876b6502735f319ac3275f\nMangUpdate 0.0.1,0.0.2 2be47f50f357ac759260e26a6007ffe9\nMange-field_helpers 1.0.1 5e1007621f38e4cd52c1a0765ca4ec27\nMan" 1 Untitled 2.rb:25 1 "fa39e5e0d\nPerson 1.0.0 8e150bf5fa62743ada26e6a6db730156\nPetSearch 0.0.1,0.0.2,0.0.3,0.0.5,0.0.6,0.0.7.1,0.0.7.7,0.0.7.8,0.0.7.9,0.0.8.2.3 b4036b50d75ca17cd302f17ed360e232\nPeterCoulton-mygemify 0.0.3 1" 1 Untitled 2.rb:25 1 "fefb3b8a939c0abc0bdda44620bb4f2c\nNoDevent 0.0.2 1e08f8e0a1ae74cae3fb2a0bcc0e6552\nNoNo 0.0.1,0.0.2 834f380817d072110f76f5bc92130380\nNordes-logstash-input-azureblob 0.9.5.pre.1 f6ea8c9b99802b56875158b5b" 1 Untitled 2.rb:25 1 "lightXML2RESTDriver 0.1.0,0.1.1,0.1.2,0.1.3,0.1.4,0.1.5,0.1.6 a111aa5db4c4f72d3dd13849b5d76ade\nFlipkartSeller 0.0.1,0.0.2,0.0.3,0.0.4,0.0.5,0.0.6,0.0.7,0.0.8 3538a0f17a8f350cc3344d25dcbe8a2f\nFlipper 1" 1 Untitled 2.rb:25 1 "s 1.0.0 3fd17aec49e8efd8ab1e3f9bb04adea7\na_little_less 0.2.1,0.2.2,0.2.3,0.2.4 9e4eb8da7cf6f46b0f31475f9f586d87\na_marmita 0.0.1,0.5.8,0.5.9,0.5.10,0.5.11,0.5.12 d5a060341305461347e7f873142b2b0c\na_maze" 1 Untitled 2.rb:25 1 "t_int 0.0.3 6002cccdc2ddb5e28f944912e40e9e45\nabstats 0.0.1 4ef3552178957a15f35b736ed2d7084f\nabstract 1.0.0 bfa6d0ef9e861f83938206e083721eb6\nabstract-data-types 0.0.1 cfafc4f9be16f2fdbbb4bbf634de9796\na" 1 Untitled 2.rb:25 1 "test 0.0.1,0.0.2,0.1.0 6ba7b7c35b2610530cc01506d2727daf\na1447ll_test 0.0.1 9ff322f022cb696f92d5906463282287\na14z6ch_elapsed_days 0.0.1,0.0.2,0.0.3,0.0.4,0.0.5 060a3df67e6ec087fec63e1d5986217a\na14z8fy_" 1 Untitled 2.rb:25 ```

After

``` Total allocated: 0 B (0 objects) Total retained: 0 B (0 objects) allocated memory by gem ----------------------------------- NO DATA allocated memory by file ----------------------------------- NO DATA allocated memory by location ----------------------------------- NO DATA allocated memory by class ----------------------------------- NO DATA allocated objects by gem ----------------------------------- NO DATA allocated objects by file ----------------------------------- NO DATA allocated objects by location ----------------------------------- NO DATA allocated objects by class ----------------------------------- NO DATA retained memory by gem ----------------------------------- NO DATA retained memory by file ----------------------------------- NO DATA retained memory by location ----------------------------------- NO DATA retained memory by class ----------------------------------- NO DATA retained objects by gem ----------------------------------- NO DATA retained objects by file ----------------------------------- NO DATA retained objects by location ----------------------------------- NO DATA retained objects by class ----------------------------------- NO DATA ```

martinemde commented 12 months ago

Nice work @segiddins. I tested it out and it seems to work fine still for extracting gems. I agreed that read needs to accept an outbuf.

segiddins commented 11 months ago

New benchmark:

Script ```rb #!/usr/bin/env ruby # frozen_string_literal: true require 'rubygems' require 'bundler/inline' gemfile do source "https://rubygems.org/" gem 'memory_profiler' gem 'zlib'#, path: "/Users/segiddins/Development/github.com/ruby/zlib" gem 'benchmark-ipsa' end len = 16_384 Benchmark.ipsa do |x| x.report("read #{len} until eof") do f = Zlib::GzipReader.open("/tmp/versions.gz") f.read len until f.eof ensure f&.close end x.report("readpartial #{len} until eof") do f = Zlib::GzipReader.open("/tmp/versions.gz") f.readpartial len until f.eof ensure f&.close end x.report("readpartial #{len}, buf until eof") do buf = String.new(capacity: len, encoding: Encoding::BINARY) f = Zlib::GzipReader.open("/tmp/versions.gz") f.readpartial len, buf until f.eof ensure f&.close end read_takes_buf = begin Zlib::GzipReader.open("/tmp/versions.gz") { |f| f.read nil, nil } rescue ArgumentError false else true end x.report("read #{len}, buf until eof") do buf = String.new(capacity: len, encoding: Encoding::BINARY) f = Zlib::GzipReader.open("/tmp/versions.gz") f.read len, buf until f.eof ensure f&.close end if read_takes_buf x.compare! end ```
Before ``` Allocations ------------------------------------- read 16384 until eof 4488/0 alloc/ret 50/0 strings/ret 26.13 MB/0 B alloc/ret readpartial 16384 until eof 6728/0 alloc/ret 50/0 strings/ret 31.78 MB/0 B alloc/ret readpartial 16384, buf until eof 3411/0 alloc/ret 50/0 strings/ret 24.85 MB/0 B alloc/ret Warming up -------------------------------------- read 16384 until eof 1.000 i/100ms readpartial 16384 until eof 1.000 i/100ms readpartial 16384, buf until eof 1.000 i/100ms Calculating ------------------------------------- read 16384 until eof 13.390 (± 7.5%) i/s - 67.000 readpartial 16384 until eof 12.511 (± 8.0%) i/s - 63.000 readpartial 16384, buf until eof 12.463 (± 8.0%) i/s - 63.000 Comparison: read 16384 until eof: 13.4 i/s readpartial 16384 until eof: 12.5 i/s - same-ish: difference falls within error readpartial 16384, buf until eof: 12.5 i/s - same-ish: difference falls within error ```
After ``` Allocations ------------------------------------- read 16384 until eof 4488/0 alloc/ret 50/0 strings/ret 26.13 MB/0 B alloc/ret readpartial 16384 until eof 4488/0 alloc/ret 50/0 strings/ret 26.13 MB/0 B alloc/ret readpartial 16384, buf until eof 9/0 alloc/ret 3/0 strings/ret 5.03 kB/0 B alloc/ret read 16384, buf until eof 9/0 alloc/ret 3/0 strings/ret 5.03 kB/0 B alloc/ret Warming up -------------------------------------- read 16384 until eof 1.000 i/100ms readpartial 16384 until eof 1.000 i/100ms readpartial 16384, buf until eof 1.000 i/100ms read 16384, buf until eof 1.000 i/100ms Calculating ------------------------------------- read 16384 until eof 12.839 (± 7.8%) i/s - 64.000 readpartial 16384 until eof 13.298 (± 7.5%) i/s - 67.000 readpartial 16384, buf until eof 13.883 (± 7.2%) i/s - 70.000 read 16384, buf until eof 13.274 (±15.1%) i/s - 64.000 Comparison: readpartial 16384, buf until eof: 13.9 i/s readpartial 16384 until eof: 13.3 i/s - same-ish: difference falls within error read 16384, buf until eof: 13.3 i/s - same-ish: difference falls within error read 16384 until eof: 12.8 i/s - same-ish: difference falls within error ```

In summary:

@hsbt this will have a significant benefit for RubyGems/Bundler/RubyGems.org, I'd greatly appreciate your help in getting this reviewed 🙇🏻‍♂️

ps.

ran two more benchmarks, one for read nil and one for read nil, buf. no change in read nil, and read nil, buf drops allocations significantly, down to 4 strings (from 50) and 26.23MB, from 33.15, so I think there is still room to optimize the case where read with no length and no buffer is passed in.

pps.

This drops allocations for IO.copy_stream out to nearly 0, and in my testing speeds it up by maybe 10%

segiddins commented 11 months ago

@hsbt would appreciate a review on this one when you get a chance, thanks!

segiddins commented 11 months ago

@nobu i think I've addressed your concerns?

nobu commented 11 months ago

test/fixtures directory will be rejected to merge to ruby/ruby.

segiddins commented 11 months ago

How would you like me to include the text fixture if not in a fixtures directory?

deivid-rodriguez commented 11 months ago

I think ruby-core backport script rejects any changes that pollute the global test folder in ruby-core with a fixtures or lib folders: https://github.com/ruby/ruby/blob/f6e2f32a962c2c19a0418dfd2020c8c9ab32c8b0/tool/sync_default_gems.rb#L427-L435.

Maybe if you move the fixtures folder to test/zlib it would be acceptable for ruby-core?

nobu commented 11 months ago

I wonder the necessity of fixtures in this case at least. Is it necessary to be a pre-existing file?

segiddins commented 11 months ago

I'm happy to replace the fixture but I don't know how to manually generate a gzip file that demonstrates the same issue with readpartial -- it doesn't surface on all gzips. cc @martinemde

martinemde commented 11 months ago

All I can offer right now is that I previously tried unzipping and rezipping this data.tar. The freshly gzipped file no longer causes the issue. Including this gem in the test is a real world proof.

We could instead hunt around for a smaller example gem, if this is a size issue, or try to figure out what it is about this one file and recreate it (I don't have that expertise). We could also just generate random gzips until we find one that fails. 😁

nobu commented 10 months ago
diff --git c/test/zlib/fixtures/ruby-progressbar-1.13.0.gem.data.tar.gz i/test/zlib/fixtures/ruby-progressbar-1.13.0.gem.data.tar.gz
deleted file mode 100644
index 6e5581c..0000000
Binary files c/test/zlib/fixtures/ruby-progressbar-1.13.0.gem.data.tar.gz and /dev/null differ
diff --git c/test/zlib/test_zlib.rb i/test/zlib/test_zlib.rb
index 9d18058..5811f8b 100644
--- c/test/zlib/test_zlib.rb
+++ i/test/zlib/test_zlib.rb
@@ -1005,20 +1005,20 @@ if defined? Zlib
           assert_equal "".b, s
           assert_predicate f, :eof?
         end
+      }

-        File.open(File.expand_path("fixtures/ruby-progressbar-1.13.0.gem.data.tar.gz", __dir__), "rb") do |f|
-          Zlib::GzipReader.wrap(f) do |gzio|
-            count = 0
-            until gzio.eof?
-              s = gzio.read(16_384)
-              count += s.size
-              assert_predicate gzio, :eof? if s.empty?
-            end
-            assert_equal 59904, count
+      compressed_data_file do |f, size|
+        Zlib::GzipReader.wrap(f) do |gzio|
+          count = 0
+          until gzio.eof?
+            s = gzio.read(16_384)
+            count += s.size
+            assert_predicate gzio, :eof? if s.empty?
           end
-          assert_predicate f, :closed?
+          assert_equal size, count
         end
-      }
+        assert_predicate f, :closed?
+      end
     end

     def test_readpartial
@@ -1043,7 +1043,7 @@ if defined? Zlib
         end
       }

-      File.open(File.expand_path("fixtures/ruby-progressbar-1.13.0.gem.data.tar.gz", __dir__), "rb") do |f|
+      compressed_data_file do |f, size|
         Zlib::GzipReader.wrap(f) do |gzio|
           count = 0
           until gzio.eof?
@@ -1051,12 +1051,12 @@ if defined? Zlib
             count += s.size
             assert_predicate gzio, :eof? if s.empty?
           end
-          assert_equal 59904, count
+          assert_equal size, count
         end
         assert_predicate f, :closed?
       end

-      File.open(File.expand_path("fixtures/ruby-progressbar-1.13.0.gem.data.tar.gz", __dir__), "rb") do |f|
+      compressed_data_file do |f, size|
         Zlib::GzipReader.wrap(f) do |gzio|
           count = 0
           s = String.new(capacity: 16_384, encoding: Encoding::BINARY)
@@ -1071,7 +1071,7 @@ if defined? Zlib
               break
             end
           end
-          assert_equal 59904, count
+          assert_equal size, count
         end
       end
     end
@@ -1271,6 +1271,17 @@ if defined? Zlib
       }
     end

+    private
+    def compressed_data_file
+      Tempfile.create(caller_locations(1, 1).first.base_label) {|t|
+        size = Zlib::GzipWriter.open(t.path) {|gz|
+          data = File.read(__FILE__)
+          gz.print(data)
+          data.size
+        }
+        yield t, size
+      }
+    end
   end

   class TestZlibGzipWriter < Test::Unit::TestCase
segiddins commented 9 months ago

@nobu I just tried reverting some of my changes, and the test in your patch still passes when I undo the change at the end of gzfile_readpartial

hsbt commented 9 months ago

I'm discussing this with @nobu

nobu commented 9 months ago

@nobu I just tried reverting some of my changes, and the test in your patch still passes when I undo the change at the end of gzfile_readpartial

Isn't this PR an improvement of readpartial and unrelated to #56? Why mixing unrelated fix?

martinemde commented 8 months ago

@nobu It sounds like you don't want this PR to include the bugfix. It seems like including an effective test for this bug makes this PR unmergable.

How would you suggest we proceed?

nobu commented 8 months ago

@nobu It sounds like you don't want this PR to include the bugfix.

Yes, exactly. It's a different story.

martinemde commented 8 months ago

I was under the impression that the whole point of this PR is to fix #56, and the performance improvements were found as part of the fix.

Should we remove the bug fix from this PR so readpartial continues to be broken but is more memory efficient, or should we remove the memory efficiency improvements and just fix the bug?

martinemde commented 8 months ago

Note: rubygems does have 3 test fixture .gem files in test/rubygems/packages.

nobu commented 8 months ago

I was under the impression that the whole point of this PR is to fix #56, and the performance improvements were found as part of the fix.

The title and description look only for the latter, but nothing about #56.

Should we remove the bug fix from this PR so readpartial continues to be broken but is more memory efficient, or should we remove the memory efficiency improvements and just fix the bug?

I'm ok both, but you'll need to edit the title in the latter case.

segiddins commented 8 months ago

@nobu I removed the fix for #56

segiddins commented 8 months ago

Came up with a new fix for #56 in https://github.com/ruby/zlib/pull/73, would appreciate another review of this PR and that one when you get a chance!

segiddins commented 7 months ago

Rebased with truffleruby CI fix, I believe this now addresses @nobu's concerns and should pass CI, cc @hsbt

martinemde commented 7 months ago

@nobu, Have you given up on this PR?

segiddins commented 6 months ago

@nobu @nurse I would greatly appreciate a look at this change 🙇🏻

martinemde commented 6 months ago

@nobu Could you give us some feedback here about what Sam or I could do to get this PR ready? We think it would enable better performance in rubygems now that we will soon be able to use readpartial again.

Any ideas?

deivid-rodriguez commented 5 months ago

Maybe we can try opening another ruby-core ticket about this performance improvement. That worked great for getting the bug fix merged.

eregon commented 5 months ago

Maybe we can try opening another ruby-core ticket about this performance improvement. That worked great for getting the bug fix merged.

Yep, that seems a good way to get attention on this. Also note you should add such tickets to the dev meeting ticket to make sure it gets reviewed in the monthly dev meeting.

segiddins commented 5 months ago

https://bugs.ruby-lang.org/issues/20424