Closed bkamins closed 3 years ago
Also, there was a typo in the CSV reader kwarg name, which I fixed.
I have also tested locally that calling GC twice is not needed, so I removed it and left only one call.
I also had to fix the name big, as it clashes with the standard function in Julia:
big(x)
Convert a number to a maximum precision representation (typically BigInt or BigFloat). See BigFloat for information about some pitfalls with floating-point numbers.
(the issue is exposed when enabling multi-threading)
For consistency I have added the _df suffix to all DataFrame names in the join benchmarks.
@jangorecki - is all I propose clear and acceptable? Thank you!
Is there anything similar in Julia?
Normally you pass the -t 20 argument, but unfortunately it does not work in your OS configuration because the -S option is not supported.
Alternatively, as discussed earlier, you can create intermediate .sh files containing respectively:
#!/bin/bash
julia -t 20 groupby-juliadf.jl
#!/bin/bash
julia -t 20 join-juliadf.jl
(and then the shebang line can be removed from the .jl files) and call these .sh files from the launcher.
Would this approach work for you?
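To make the wrapper idea concrete, here is a minimal sketch of how such .sh files could be generated. The helper name make_wrapper and the temporary output directory are assumptions for illustration only; they are not part of the benchmark repository.

```shell
#!/bin/bash
# Hypothetical helper (not from the repository): writes a small wrapper
# script that starts Julia with a fixed number of threads.
# Usage: make_wrapper <threads> <script.jl> <wrapper.sh>
make_wrapper() {
  local threads="$1" script="$2" out="$3"
  # The wrapper body mirrors the two-line scripts proposed above.
  printf '#!/bin/bash\njulia -t %s %s\n' "$threads" "$script" > "$out"
  chmod +x "$out"
}

# Example: generate both wrappers into a temporary directory and show one.
tmpdir="$(mktemp -d)"
make_wrapper 20 groupby-juliadf.jl "$tmpdir/groupby-juliadf.sh"
make_wrapper 20 join-juliadf.jl "$tmpdir/join-juliadf.sh"
cat "$tmpdir/groupby-juliadf.sh"
```

The launcher would then call the generated .sh files instead of the .jl files directly; where the wrappers should live is left open here.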
An env var looks simpler.
OK - so I understand it can be left as is now? (ah - or you move it to some other .sh file - right?) Thank you!
as is now is good
It is a pity that it is not possible to change the number of threads after Julia has already started. I will have to use an extra shell script as you suggested. Setting an env var is neater but will not work when running a single script with the _launcher/solution.R script.
Indeed it is a pity. I really wish it was possible to change the number of threads that a Julia process uses (and AFAICT it might be possible in the future, but not currently). Thank you for working on it.
@jangorecki - I understand that the current timings shown on the page (dated May 7, 2021) are still old for DataFrames.jl - right? Does the date next to the package version show when the test was run?
As a reference: the current release of DataFrames.jl is 1.1.1, so I assume that the figures are for the old run.
At the top of the benchplot there are versions and dates. Julia is already running now, so by tomorrow it should be in the report.
What I change here: instead of the -S shebang, use export JULIA_NUM_THREADS=20 (I understand you spawn the tests as the child processes of run.sh - right?)
@jangorecki The only thing I was not sure about was how you wanted to handle the Threads.nthreads() reporting from groupby-juliadf.jl and join-juliadf.jl, so I have not added it.
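The export approach relies on child processes inheriting the parent's environment. A minimal sketch of that behaviour follows, assuming the benchmark scripts are indeed started as children of run.sh; the child_threads function is a hypothetical stand-in for a Julia process, which reads JULIA_NUM_THREADS at startup.

```shell
#!/bin/bash
# Exported once in the parent script (run.sh in the discussion above).
export JULIA_NUM_THREADS=20

# Hypothetical stand-in for a benchmark script: a child shell that reports
# the value it inherited from its parent.
child_threads() {
  sh -c 'echo "${JULIA_NUM_THREADS:-unset}"'
}

# The child sees the exported value.
child_threads

# The value can also be overridden for a single command only,
# without changing the parent's environment.
JULIA_NUM_THREADS=4 sh -c 'echo "$JULIA_NUM_THREADS"'
```

This only works when the variable is set before the Julia process starts, which is why it does not help when a single script is launched some other way.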