Open dustmop opened 4 years ago
Removing the errCount calculation entirely, by deleting the goroutine, gives the following profile:
Total time: 8.801275 s
base.InferValues: 3.184442 s
base.prepareTasks: 4.265854 s
cafs.AddFile(s): 1.347918 s

Breakdown of base.prepareTasks (4.265854 s):
setErrCount: - (removed)
setDepthAndEntryCount: 3.268407 s
setChecksumAndLength: 4.265744 s
This is roughly a 2x speedup overall. Structure creation now accounts for 48% of a save operation. Together with schema inference, about 85% of a save operation is spent in qri code, with only 15% spent writing blocks to IPFS. This is also an unrealistic upper bound on the speedup: if we restore errCount but have it stream entries instead of loading them all into RAM, it will necessarily be slower than this profile.
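For anyone who wants to re-derive the percentages, here is a small sketch of the arithmetic. The timings are copied from the numbers posted in this thread; the `share` helper and the constant names are made up for this illustration and are not part of qri.

```go
package main

import "fmt"

// share returns part/total as a fraction between 0 and 1.
// Hypothetical helper, only used for this illustration.
func share(part, total float64) float64 {
	return part / total
}

func main() {
	// Timings in seconds, copied from the profile without errCount.
	const (
		total        = 8.801275
		inferValues  = 3.184442
		prepareTasks = 4.265854
		addFile      = 1.347918
		// Full save with errCount enabled, as reported for the 381MB csv run.
		before = 18.32367
	)

	fmt.Printf("structure creation: %.1f%%\n", 100*share(prepareTasks, total))
	fmt.Printf("schema inference:   %.1f%%\n", 100*share(inferValues, total))
	fmt.Printf("qri code combined:  %.1f%%\n", 100*share(prepareTasks+inferValues, total))
	fmt.Printf("writing blocks:     %.1f%%\n", 100*share(addFile, total))
	fmt.Printf("speedup:            %.2fx\n", before/total)
}
```

The printed values should land close to the rounded 48%, 85%, 15%, and 2x figures quoted above.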
Near-term wins based on @Arqu's work: https://github.com/qri-io/qri/issues/1342
Added benchmarks: https://github.com/qri-io/qri/pull/1351
CSV read buffer increase: https://github.com/qri-io/dataset/pull/225
Qri Core prep for streaming validation: https://github.com/qri-io/qri/pull/1357
Dataset prep for streaming validation: https://github.com/qri-io/dataset/pull/226
The above is merged, but the channel work is still to do.
I did some very rough profiling of the save command, using a 381MB CSV file. On my machine, the entire command took 18.32367 seconds. There are really only three steps that take a significant amount of time (> 100ms):
The prepareDataset tasks take 75% of the time. Measuring each of them individually:
These run in parallel, so the times do not add up; the comparison instead shows which task takes the longest. It suggests that changing jsonSchema to stream over the data, instead of running over a deserialized structure, may only speed up this operation by about 5 seconds (26%).
Our best option for making save really fast is this: https://github.com/qri-io/qri/issues/1167
Related issues on performance:
"Qri takes ages when loading a large file as a body" https://github.com/qri-io/qri/issues/1112
"add performance benchmarks on master branch push" https://github.com/qri-io/qri/pull/1143
"saving with only a meta change should be fast" https://github.com/qri-io/qri/issues/1145