sethaxen / MRCFile.jl

Read and write files and manipulate data in the MRC2014 format

Vectorized writing array improves its speed #5

Closed ehgus closed 3 years ago

ehgus commented 3 years ago

Hello again! I recently ran into very slow write speeds for a 1.8 GB 2D-stacked tomography MRC file. I found that the array-writing part is the bottleneck, and I resolved it by using vectorized operations.

For the benchmark, I put @time on every line of the write method and compared the results against my vectorized code.

function Base.write(io::IO, d::MRCData; compress = :none)
    @time newio = compressstream(io, compress)
    @time h = header(d)
    @time sz = write(newio, h)
    @time sz += write(newio, extendedheader(d))
    @time T = datatype(h)
    @time data = parent(d)
    @time fswap = bswapfromh(h.machst) 
    # @time write(newio,(fswap.(T.(data))))   ### this is my vectorized code that will replace iteration
    @time begin
        for i in eachindex(data)
            @inbounds sz += write(newio, fswap(T(data[i])))
        end
    end
    @time close(newio)
    return sz
end
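To make the difference between the two approaches concrete outside the package, here is a minimal sketch that writes the same array to an in-memory buffer both ways and checks that the bytes match. The helper names `write_elementwise` and `write_vectorized` are hypothetical and not part of the package. `bswap` stands in for whatever `bswapfromh` returns, and the small random `Float32` array is only an assumed stand-in for real MRC data.

```julia
# Sketch only: compares one write call per element with a single whole-array write.
# `write_elementwise` and `write_vectorized` are hypothetical helper names.

function write_elementwise(io::IO, data::AbstractArray, ::Type{T}, fswap) where {T}
    sz = 0
    for x in data
        sz += write(io, fswap(T(x)))  # one small write call per element
    end
    return sz
end

function write_vectorized(io::IO, data::AbstractArray, ::Type{T}, fswap) where {T}
    # Broadcasting builds a converted, byte-swapped copy that is written in one call.
    return write(io, fswap.(T.(data)))
end

# Assumed stand-in for MRC data: a small random Float32 volume.
data = rand(Float32, 64, 64, 4)

io_loop = IOBuffer()
n_loop = write_elementwise(io_loop, data, Float32, bswap)

io_vec = IOBuffer()
n_vec = write_vectorized(io_vec, data, Float32, bswap)

# Both paths should report the same byte count and produce identical bytes.
same_bytes = take!(io_loop) == take!(io_vec)
println("bytes written: ", n_loop, " vs ", n_vec, ", identical: ", same_bytes)
```

The trade-off in this sketch is that the broadcast version allocates a temporary copy of the whole array before writing it, while the loop issues one write call per element.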

This is the test code

using MRC

file_name="HeLa.mrc"

for _ in 1:3
    # read and write
    orig_file = read(file_name,MRCData)
    write("copy_$(file_name)",orig_file)
    println("---- done writing ----")
    # compare orig and copied 
    copied_file = read("copy_$(file_name)",MRCData)
    println("writing correctness: $(copied_file == orig_file) \n")
end

Here is the benchmark for the 1.8 GB 2D-stacked tomography MRC file. I list the last of the three repeated test runs:

original code

0.000068 seconds (7 allocations: 16.344 KiB)
0.000000 seconds
0.000396 seconds (138 allocations: 11.688 KiB)
0.000117 seconds (2 allocations: 32 bytes)
0.000004 seconds
0.000000 seconds
0.000000 seconds
327.057910 seconds (2.82 G allocations: 42.000 GiB, 4.02% gc time)
0.000108 seconds
---- done writing ----
writing correctness: true

vectorized code

0.000023 seconds (7 allocations: 16.344 KiB)
0.000000 seconds
0.000075 seconds (138 allocations: 11.688 KiB)
0.000064 seconds (2 allocations: 32 bytes)
0.000001 seconds
0.000000 seconds
0.000000 seconds
2.345554 seconds (8 allocations: 1.750 GiB, 12.91% gc time)
11.644342 seconds
---- done writing ----
writing correctness: true

I also tested with http://ftp.rcsb.org/pub/emdb/structures/EMD-5778/map/emd_5778.map.gz:

original code

0.000229 seconds (27 allocations: 34.172 KiB)
0.000000 seconds
0.000311 seconds (134 allocations: 11.688 KiB)
0.000016 seconds (1 allocation: 16 bytes)
0.000006 seconds
0.000000 seconds
0.000000 seconds
8.186541 seconds (50.33 M allocations: 768.000 MiB, 2.05% gc time)
0.013294 seconds
---- done writing ----
writing correctness: true

vectorized code

0.000043 seconds (27 allocations: 34.172 KiB)
0.000000 seconds
0.000065 seconds (134 allocations: 11.688 KiB)
0.000003 seconds (1 allocation: 16 bytes)
0.000001 seconds
0.000000 seconds
0.000000 seconds
3.910012 seconds (8 allocations: 64.000 MiB, 0.49% gc time)
0.009102 seconds
---- done writing ----
writing correctness: true