invenia / JLSO.jl

Julia Serialized Object (JLSO) file format for storing checkpoint data.
MIT License

Large JLSO files #75

Closed by rofinn 3 years ago

rofinn commented 3 years ago

Closing #21

Pros:

Cons:

codecov[bot] commented 3 years ago

Codecov Report

Merging #75 into master will increase coverage by 2.66%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master      #75      +/-   ##
==========================================
+ Coverage   94.90%   97.57%   +2.66%     
==========================================
  Files           6        6              
  Lines         157      206      +49     
==========================================
+ Hits          149      201      +52     
+ Misses          8        5       -3     
Impacted Files            Coverage Δ
src/JLSO.jl               100.00% <ø> (ø)
src/JLSOFile.jl           90.47% <ø> (-3.28%)
src/file_io.jl            94.33% <100.00%> (+5.45%)
src/upgrade.jl            100.00% <100.00%> (+1.85%)
src/metadata.jl           100.00% <0.00%> (ø)
src/serialization.jl      100.00% <0.00%> (+9.09%)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 03c30e5...be58e4c.

oxinabox commented 3 years ago

Not reviewed properly, but this seems like a sensible way to store the data.

rofinn commented 3 years ago

Cool, thanks for reviewing, @oxinabox. I'll make those changes and merge.

rofinn commented 3 years ago

> My main concern would be how does a generic BSON reader know when to stop reading the BSON header?

The BSON header has its own EOF byte that should tell any reader to stop reading the doc at the end of the header. That assumes the reader is following the BSON spec, though. I'll add some tests using the Python bson package, since that seems like the easiest option for Julia interop.
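
For illustration, here is a minimal sketch of how a spec-following reader could find the end of a BSON document. Per the BSON spec, a document starts with a little-endian int32 giving its total size in bytes and ends with a 0x00 byte, so the reader never needs to scan past that point. The function name `split_bson_document` and the layout "BSON header followed directly by raw bytes" are assumptions made for this example, not the actual JLSO file format or the snippet mentioned in this thread. Only the Python standard library is used.

```python
import struct
from typing import Tuple


def split_bson_document(buf: bytes) -> Tuple[bytes, bytes]:
    """Split buf into (first BSON document, trailing bytes).

    Hypothetical helper for illustration; not part of JLSO or any bson package.
    """
    if len(buf) < 5:
        raise ValueError("buffer too short to hold a BSON document")
    # The first 4 bytes are a little-endian int32 with the total document
    # size, counting the length prefix itself and the trailing 0x00 byte.
    (size,) = struct.unpack_from("<i", buf, 0)
    if size < 5 or size > len(buf):
        raise ValueError("invalid BSON length prefix: %d" % size)
    # A well-formed document ends with a single 0x00 terminator.
    if buf[size - 1] != 0:
        raise ValueError("BSON document is missing its 0x00 terminator")
    return buf[:size], buf[size:]


# Hand-built BSON for {"a": 1}: type 0x10 (int32), key "a\x00", value 1.
element = b"\x10" + b"a\x00" + struct.pack("<i", 1)
header = struct.pack("<i", 4 + len(element) + 1) + element + b"\x00"

# Assumed layout for this sketch: header document, then opaque object bytes.
payload = b"raw object bytes"
doc, rest = split_bson_document(header + payload)
```

Here `doc` is exactly the header bytes and `rest` is the untouched payload, which is why trailing data after the header should not confuse a reader that honours the length prefix.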

oxinabox commented 3 years ago

> I'll add some tests using the Python bson package, since that seems like the easiest option for Julia interop.

They don't need to be CI tests. Just doing a manual check once seems fine to me. Adding CI tests about the interoperability of the BSON metadata seems beyond the scope of this PR, but also reasonable to do one day. (Low priority though)

rofinn commented 3 years ago

> Just doing a manual check once seems fine to me.

Okay, yeah. I even have a little Python snippet for extracting raw object data from the file using the bson library. I've made a separate issue for that in #76.