qri-io / qri

you're invited to a data party!
https://qri.io
GNU General Public License v3.0

improve performance of save by speeding up prepareDataset task goroutines #1168

Open dustmop opened 4 years ago

dustmop commented 4 years ago

Did a very rough profile of the save command, using a 381MB CSV file. On my machine, the entire command took 18.32367 seconds. Only three steps take a significant amount of time (> 100ms). All times are in seconds:

Total time:        18.32367
base.InferValues    2.965582
base.prepareTasks  13.961739
cafs.AddFile(s)     1.392855

The prepareDataset tasks take about 76% of the time (13.96 of 18.32 seconds). Measuring each of them individually:

base.prepareTasks  13.961739
 setErrCount           13.961660
 setDepthAndEntryCount  8.004982
 setChecksumAndLength   9.080462

These run in parallel, so the times do not add up to the total; the comparison shows which task takes the longest. Even if jsonSchema is changed to stream over the data instead of validating a deserialized structure, setChecksumAndLength still takes about 9 seconds, so the change may only speed up this operation by about 5 seconds (26% of the total time).
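The arithmetic behind that estimate can be sketched as follows. The task times are the ones measured above; `phaseTime` and `phaseTimeWithout` are hypothetical helper names used only for this illustration. The assumption is the best case, where setErrCount becomes fast enough that it no longer dominates the phase.

```go
package main

import "fmt"

// phaseTime returns the wall-clock time of a phase whose tasks run
// concurrently: the duration of the slowest task.
func phaseTime(taskSeconds map[string]float64) float64 {
	slowest := 0.0
	for _, s := range taskSeconds {
		if s > slowest {
			slowest = s
		}
	}
	return slowest
}

// phaseTimeWithout returns the phase time if the named task were no
// longer the bottleneck, i.e. the slowest of the remaining tasks.
func phaseTimeWithout(taskSeconds map[string]float64, name string) float64 {
	slowest := 0.0
	for n, s := range taskSeconds {
		if n != name && s > slowest {
			slowest = s
		}
	}
	return slowest
}

func main() {
	// Per-task times in seconds, taken from the profile in this issue.
	tasks := map[string]float64{
		"setErrCount":           13.961660,
		"setDepthAndEntryCount": 8.004982,
		"setChecksumAndLength":  9.080462,
	}
	const totalSave = 18.32367 // seconds for the whole save command

	current := phaseTime(tasks)
	best := phaseTimeWithout(tasks, "setErrCount")
	saved := current - best

	fmt.Printf("phase now:       %.2fs\n", current)
	fmt.Printf("phase best case: %.2fs\n", best)
	fmt.Printf("saved:           %.2fs (%.1f%% of total)\n", saved, 100*saved/totalSave)
}
```

The saving is bounded by the gap between the slowest task and the second slowest (13.96s versus 9.08s), no matter how fast setErrCount becomes.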

Our best option for making save really fast is this: https://github.com/qri-io/qri/issues/1167

Related issues on performance:
"Qri takes ages when loading a large file as a body" https://github.com/qri-io/qri/issues/1112
"add performance benchmarks on master branch push" https://github.com/qri-io/qri/pull/1143
"saving with only a meta change should be fast" https://github.com/qri-io/qri/issues/1145

dustmop commented 4 years ago

Removing the errCount calculation entirely, by deleting the goroutine, gives the following profile:

Total time:        8.801275
base.InferValues   3.184442
base.prepareTasks  4.265854
cafs.AddFile(s)    1.347918

base.prepareTasks  4.265854
 setErrCount           -
 setDepthAndEntryCount 3.268407
 setChecksumAndLength  4.265744

This is roughly a 2x speedup in total (18.32s down to 8.80s). In this profile, structure creation (base.prepareTasks) takes up 48% of a save operation. Together with inferring the schema, 85% of a save operation is spent in qri code, with only 15% spent writing blocks to IPFS. This is also an unrealistic upper limit on the speedup: if we restore the errCount but have it stream entries instead of loading them all into RAM, it will necessarily be slower than this profile.

b5 commented 4 years ago

Near-term wins based on @Arqu's work: https://github.com/qri-io/qri/issues/1342

Arqu commented 4 years ago

Added benchmarks: https://github.com/qri-io/qri/pull/1351
CSV read buffer increase: https://github.com/qri-io/dataset/pull/225
Qri Core prep for streaming validation: https://github.com/qri-io/qri/pull/1357
Dataset prep for streaming validation: https://github.com/qri-io/dataset/pull/226

Arqu commented 4 years ago

The above is merged, but the channel work is still to do.