Closed — bkamins closed this 3 years ago
It can still be useful because each script has a timeout defined in _control/timeout.csv. So if reading the CSV takes too much time, the script can still be terminated later on for exceeding that timeout.
In the past (i.e. before the last benchmark) we always used a single thread and we did not hit these timeouts - right?
A long time ago it was hitting the limits, but I don't think it will be a problem now.
I ran it interactively using 20 cores:
```
# before change proposed in this PR
52.193861 seconds (710.69 k allocations: 3.925 GiB, 81.14% gc time, 0.88% compilation time)
44.617192 seconds (420 allocations: 3.886 GiB, 86.10% gc time)
# after
19.729487 seconds (103.54 k allocations: 3.989 GiB, 73.19% gc time, 0.07% compilation time)
35.267162 seconds (415 allocations: 5.253 GiB, 80.55% gc time)
```
Now running the full benchmark.
Thank you! This is what I expected: the GC issue is not resolved, but using a single thread for CSV reading lessens the problem. (@quinnj: what @jangorecki reports is exactly the same multi-threading issue that I have reported to you.)
Additionally: if we resolved the GC issue, the run-time of this query should be around 7 seconds, which is roughly what I get on a machine with enough RAM (and this would get us within a reasonable range in comparison to other packages).
We do not measure CSV.jl parsing performance in this benchmark, so this PR disables multi-threaded CSV reading in the Julia tests.
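For illustration, a minimal sketch of what disabling multi-threaded parsing looks like at a CSV.jl call site (the file name and the exact call site in the benchmark scripts are placeholders; the relevant keyword depends on the CSV.jl version — `ntasks` in newer releases, `threaded` in older ones):

```julia
using CSV, DataFrames

# Force single-threaded parsing so the benchmark timings do not depend
# on CSV.jl's multi-threaded reader. In newer CSV.jl versions:
df = CSV.read("data.csv", DataFrame; ntasks=1)

# In older CSV.jl versions the equivalent switch was a Bool keyword:
# df = CSV.read("data.csv", DataFrame; threaded=false)
```

With a single parsing task, the reader still counts against the per-script timeout in _control/timeout.csv, but it avoids the extra allocations and GC pressure seen in the multi-threaded timings above.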