Closed st-pasha closed 5 years ago
This is the proper way to run a single solution; unfortunately, clickhouse is not yet escaped nicely.
@st-pasha please retry on latest master
@st-pasha any update on this?
Apologies, I missed your previous comment somehow.
With latest master I no longer see any clickhouse-related problems:
Error: '\.' is an unrecognized escape in character string starting ""[^0-9\."
Execution halted
# Benchmark run 1564421368 started
starting: pydatatable groupby G1_1e7_1e2_0_0
/bin/bash: out/run_pydatatable_groupby_G1_1e7_1e2_0_0.out: No such file or directory
finished: pydatatable groupby G1_1e7_1e2_0_0
starting: pydatatable groupby G1_1e7_1e1_0_0
/bin/bash: out/run_pydatatable_groupby_G1_1e7_1e1_0_0.out: No such file or directory
finished: pydatatable groupby G1_1e7_1e1_0_0
starting: pydatatable groupby G1_1e7_2e0_0_0
/bin/bash: out/run_pydatatable_groupby_G1_1e7_2e0_0_0.out: No such file or directory
finished: pydatatable groupby G1_1e7_2e0_0_0
starting: pydatatable groupby G1_1e7_1e2_0_1
/bin/bash: out/run_pydatatable_groupby_G1_1e7_1e2_0_1.out: No such file or directory
finished: pydatatable groupby G1_1e7_1e2_0_1
# Benchmark run 1564421368 has been completed in 1s
For the first error, my guess is that bash "eats" one level of escaping, so R sees only \. which is not a proper escape. An easy way to fix this is to remove the backslashes altogether, since in regex language a dot inside square brackets is always interpreted literally. So, after doing that and running the command in R I get:
Error in `[.data.table`(data.table::fread("free -h | grep Swap", header = FALSE), :
Item 1 of j is 1 which is outside the column number range [1,ncol=0]
In addition: Warning message:
In data.table::fread("free -h | grep Swap", header = FALSE) :
File '/var/folders/d7/dw1pt7c114711zdyqf4gtg0h0000gn/T//RtmpVh7i9Q/file85061dade4dc' has size 0. Returning a NULL data.table.
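On the escaping point above, the rule that a dot inside square brackets is literal can be checked quickly. The following is only a sketch in Python rather than R (the benchmark's own check is an R one-liner), and the sample string is made up for illustration:

```python
import re

# Inside a character class the dot is already literal,
# so the backslash in front of it is optional.
escaped = re.compile(r"[^0-9\.]")
unescaped = re.compile(r"[^0-9.]")

# Hypothetical input, not taken from the benchmark output.
sample = "Swap: 2.0Gi"


def strip_non_numeric(pattern, text):
    """Remove every character that is neither a digit nor a dot."""
    return pattern.sub("", text)


print(strip_non_numeric(escaped, sample))
print(strip_non_numeric(unescaped, sample))

# A bracketed dot matches only a literal dot, not an arbitrary character.
dot_only = re.compile(r"[.]")
print(dot_only.fullmatch("a") is None)
```

Both patterns give the same result, which is why dropping the backslash sidesteps the shell/R escaping problem without changing what the regex matches.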
Running just the first fread command returns:
> data.table::fread("free -h | grep Swap", header=FALSE)
sh: free: command not found
Null data.table (0 rows and 0 cols)
Warning message:
In data.table::fread("free -h | grep Swap", header = FALSE) :
File '/var/folders/d7/dw1pt7c114711zdyqf4gtg0h0000gn/T//RtmpVh7i9Q/file850638c36bd' has size 0. Returning a NULL data.table.
So the actual issue is that my shell doesn't have the free command line utility, yet somehow data.table gobbles that error and issues a warning instead.
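One way to avoid the confusing downstream error is to check that the tool exists before parsing its output. This is a minimal sketch, not the benchmark's actual code; the function name swap_line is hypothetical, and it assumes the "Swap" line format printed by procps-style free:

```python
import shutil
import subprocess


def swap_line(tool="free"):
    """Return the 'Swap' line of `<tool> -h`, or None when it is unavailable."""
    # shutil.which returns None when the executable is not on PATH,
    # which is the situation on a stock macOS shell.
    if shutil.which(tool) is None:
        return None
    result = subprocess.run(
        [tool, "-h"], capture_output=True, text=True, check=False
    )
    for line in result.stdout.splitlines():
        if line.startswith("Swap"):
            return line
    return None


line = swap_line()
if line is None:
    print("swap check skipped: 'free' not available or no Swap line found")
else:
    print(line)
```

Returning None explicitly lets the caller decide whether a missing tool should skip the check or abort the run, instead of failing later on an empty table.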
Still, despite the errors above the benchmark runs, producing some more error messages:
starting: pydatatable groupby G1_1e7_1e2_0_0
/bin/bash: out/run_pydatatable_groupby_G1_1e7_1e2_0_0.out: No such file or directory
finished: pydatatable groupby G1_1e7_1e2_0_0
I don't know what was supposed to be printed here, but I was hoping for something similar to the benchmark chart:
Question 1 -- first run time -- second run time
Question 2 -- first run time -- second run time
...
Are you trying to run the benchmark on OS X? It was designed with a Debian-compatible OS in mind. Software used on the machine that runs the benchmark:
GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu)
free from procps-ng 3.3.10
The last issue is, I believe, about the missing out directory; I will amend the code to create it automatically if it doesn't exist.
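Creating the directory up front is a one-liner in most languages. The sketch below shows the idea in Python, inside a temporary directory so it is safe to run anywhere; the actual fix in the repository may well be done in the shell script instead, and ensure_dir is a hypothetical helper name:

```python
import os
import tempfile


def ensure_dir(path):
    """Create the directory if it is missing; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path


# Demonstrate inside a throwaway directory rather than the real repository.
with tempfile.TemporaryDirectory() as root:
    out_dir = ensure_dir(os.path.join(root, "out"))
    # Calling it a second time is harmless thanks to exist_ok=True.
    ensure_dir(out_dir)
    log_path = os.path.join(out_dir, "run_pydatatable_groupby_G1_1e7_1e2_0_0.out")
    with open(log_path, "w") as fh:
        fh.write("ok\n")
    created = os.path.isfile(log_path)

print(created)
```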
Timings land in the time.csv file; attempts at running scripts land in logs.csv.
The structure of the timings is as follows:
question 1 -- first run time
question 1 -- second run time
question 2 -- first run time
question 2 -- second run time
which is later reshaped for the reports into the structure you mentioned, in https://github.com/h2oai/db-benchmark/blob/936c3a6aaaf3045b62e4c5b0e3a705a1a867f4e2/report.R#L68
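The reshaping from one row per run to one row per question can be sketched as follows. This is an illustration only: the column names (question, run, time_sec) and the numbers are invented, and the real time.csv and report.R use their own schema and logic:

```python
import csv
import io
from collections import defaultdict

# Made-up excerpt in "long" form; not the real layout of time.csv.
LONG_CSV = """question,run,time_sec
question 1,1,0.52
question 1,2,0.31
question 2,1,1.40
question 2,2,1.10
"""


def long_to_wide(text):
    """Collapse one-row-per-run timings into (question, first, second) tuples."""
    wide = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        wide[row["question"]][int(row["run"])] = float(row["time_sec"])
    # Dicts keep insertion order, so questions come out in file order.
    return [(question, runs.get(1), runs.get(2)) for question, runs in wide.items()]


for question, first, second in long_to_wide(LONG_CSV):
    print(f"{question} -- {first} -- {second}")
```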
please retry latest master, ideally after installing free
According to SO, the equivalent of free on macOS is vm_stat, which reports things like this:
$ vm_stat
Mach Virtual Memory Statistics: (page size of 4096 bytes)
Pages free: 208197.
Pages active: 1478906.
Pages inactive: 868832.
Pages speculative: 107124.
Pages throttled: 0.
Pages wired down: 997248.
Pages purgeable: 9437.
"Translation faults": 36699619531.
Pages copy-on-write: 444929577.
Pages zero filled: 5459321091.
Pages reactivated: 487618793.
Pages purged: 19600537.
File-backed pages: 468271.
Anonymous pages: 1986591.
Pages stored in compressor: 4044251.
Pages occupied by compressor: 533481.
Decompressions: 196753666.
Compressions: 1049140452.
Pageins: 87517994.
Pageouts: 129923.
Swapins: 148769198.
Swapouts: 370675952.
Now, disabling swap can be done (https://summercode.com/wiki/how-to-disable-or-enable-swapping-in-mac-os-x), but it seems mighty dangerous... However, since the check is optional (the script keeps running even if the check fails), I guess it's not that important.
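If the swap check were ever ported to macOS, the vm_stat output shown above would need to be parsed first. The following is a rough sketch that works on a shortened copy of that output; parse_vm_stat is a hypothetical helper, and how these counters should map onto what the benchmark reads from free is left open:

```python
import re

# Shortened copy of the vm_stat output quoted in this thread.
SAMPLE = """Mach Virtual Memory Statistics: (page size of 4096 bytes)
Pages free: 208197.
Pages active: 1478906.
"Translation faults": 36699619531.
Swapins: 148769198.
Swapouts: 370675952.
"""


def parse_vm_stat(text):
    """Return (page_size_in_bytes, {counter_name: value}) from vm_stat text."""
    lines = text.splitlines()
    if not lines:
        return None, {}
    match = re.search(r"page size of (\d+) bytes", lines[0])
    page_size = int(match.group(1)) if match else None
    stats = {}
    for line in lines[1:]:
        key, sep, value = line.partition(":")
        if not sep:
            continue
        # Values end with a dot; some keys are wrapped in double quotes.
        stats[key.strip().strip('"')] = int(value.strip().rstrip("."))
    return page_size, stats


page_size, stats = parse_vm_stat(SAMPLE)
free_bytes = stats["Pages free"] * page_size
print(free_bytes, stats["Swapins"], stats["Swapouts"])
```

On a real machine the text would come from running vm_stat itself, for example via subprocess.run(["vm_stat"], capture_output=True, text=True).stdout.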
This is the output that I'm currently getting:
sh: free: command not found
Error in `[.data.table`(data.table::fread("free -h | grep Swap", header = FALSE), :
Item 1 of j is 1 which is outside the column number range [1,ncol=0]
Calls: [ -> [.data.table
In addition: Warning message:
In data.table::fread("free -h | grep Swap", header = FALSE) :
File '/var/folders/d7/dw1pt7c114711zdyqf4gtg0h0000gn/T//Rtmp7qDc35/filef3461723cb91' has size 0. Returning a NULL data.table.
Execution halted
# Benchmark run 1564431516 started
starting: pydatatable groupby G1_1e7_1e2_0_0
finished: pydatatable groupby G1_1e7_1e2_0_0: stderr 5
starting: pydatatable groupby G1_1e7_1e1_0_0
finished: pydatatable groupby G1_1e7_1e1_0_0: stderr 5
starting: pydatatable groupby G1_1e7_2e0_0_0
finished: pydatatable groupby G1_1e7_2e0_0_0: stderr 5
starting: pydatatable groupby G1_1e7_1e2_0_1
finished: pydatatable groupby G1_1e7_1e2_0_1: stderr 5
# Benchmark run 1564431516 has been completed in 2s
At first it was complaining about "# Benchmark run 1564431330 aborted. './data' directory does not exists", but that error disappeared after creating the directory "data". I even copied the files "G11e7*" there, just in case. Still, some errors are produced in the printout above, and I can't figure out what they mean.
Please include some of the out/*.err files. Note that data files are now named G1_1e7_1e2_0_0.csv; the old name did not have the two extra zeros, which stand for the NA percentage and whether the data are ordered.
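A quick way to tell old-style and new-style data names apart is to split on underscores and count the parts. This is a sketch under assumptions: only the meaning of the last two fields is stated in this thread, so the field names rows and groups are guesses, and DataName and parse_data_name are hypothetical helpers:

```python
from typing import NamedTuple


class DataName(NamedTuple):
    prefix: str      # e.g. "G1"
    rows: str        # assumption: row count, e.g. "1e7"
    groups: str      # assumption: grouping cardinality, e.g. "1e2"
    na_percent: str  # per the thread: NA percentage
    is_sorted: str   # per the thread: whether the data are ordered


def parse_data_name(name):
    """Split a new-style data name into its five underscore-separated fields."""
    stem = name[:-4] if name.endswith(".csv") else name
    parts = stem.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields in {name!r}, got {len(parts)}")
    return DataName(*parts)


print(parse_data_name("G1_1e7_1e2_0_0.csv"))
```

A name without the two trailing fields raises ValueError, which makes a stale data file easy to spot before a run starts.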
Ah, I see. The .err files complain about the missing modules "psutil" and "pandas". After installing those, the script finally runs.
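Missing modules like these can be detected before launching a run rather than discovered in the .err files afterwards. A minimal sketch, assuming only the two module names mentioned above (the full dependency list of the solution scripts is not given in this thread, and missing_modules is a hypothetical helper):

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The two modules named in this thread; the real list may be longer.
required = ["psutil", "pandas"]
missing = missing_modules(required)
if missing:
    print("missing modules:", ", ".join(missing))
else:
    print("all required modules found")
```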
If there are no other problems here and you obtained timings from the time.csv file, then we can close this issue.
sure
In run.conf I specify to run the benchmark for datatable only. Still, when running run.sh, an error related to a missing clickhouse client is returned. What is the proper way to run a single solution?