Closed — jjconti closed this issue 3 years ago
Depending on the content of the data, csvq requires at least 10 times the target file's size in memory to keep all the data in memory at runtime. Can you check whether your system's RAM and virtual memory have enough space to handle the file? If not, the problem is probably difficult to solve and you should look for an alternative to csvq.
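For a simple filter like this one, a streaming tool such as awk can serve as that alternative, since it reads one row at a time and its memory use stays constant regardless of file size. A minimal sketch, assuming the relevant column is the third one and the data contains no quoted commas (both assumptions must be checked against the real file; `demo.csv` is a stand-in):

```shell
# Create a tiny stand-in file (the real file is datos_nomivac_covid19.csv).
cat > demo.csv <<'EOF'
id,fecha,jurisdiccion_residencia
1,2021-01-01,Santa Fe
2,2021-01-02,Buenos Aires
3,2021-01-03,Santa Fe
EOF

# Keep the header row (NR == 1) plus every row whose assumed column 3
# matches. Adjust -v col= to the actual column position. Note: plain
# -F',' splitting breaks on fields that contain quoted commas.
awk -F',' -v col=3 'NR == 1 || $col == "Santa Fe"' demo.csv > santafe.csv
cat santafe.csv
```

Unlike csvq, this never loads the whole file, so a 1.6 GB input is no problem; the trade-off is losing SQL semantics and proper CSV quoting.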
The actual amount of memory required to handle the data can be estimated by cutting the data down to a size that can be processed and then running the same query with the --stats option.
$ head -n 1000000 foo.csv > bar.csv
$ csvq --stats 'SELECT COUNT(*) FROM `bar.csv`'
+----------+
| COUNT(*) |
+----------+
|   999999 |
+----------+
Query Execution Time: 0.818470 seconds
Resource Statistics
---------------------------------
TotalTime: 0.818675 seconds
TotalAlloc: 351,134,832 bytes
HeapSys: 334,561,280 bytes
Mallocs: 10,002,885 objects
Frees: 2,769,768 objects
HeapSys is the amount of memory required on the system for the execution of the query.
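Scaling that figure gives a rough estimate for the full file. A sketch of the arithmetic, assuming (hypothetically) that the full file holds 13,000,000 rows — the real row count must be substituted:

```shell
# Rough extrapolation from the --stats sample above: 1,000,000 rows
# needed ~334,561,280 bytes of HeapSys, so a file with N rows needs
# roughly (N / 1,000,000) times that. The 13,000,000 row count is an
# assumption for illustration, not taken from the actual file.
sample_rows=1000000
sample_heap=334561280                 # HeapSys reported by --stats
total_rows=13000000                   # assumed row count of the full file
estimate=$(( sample_heap * (total_rows / sample_rows) ))
echo "estimated HeapSys: ${estimate} bytes"
```

If the estimate exceeds available RAM plus swap, the query will not complete, and the streaming approach above (or a machine with more memory) is the way out.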
I was running csvq and got an out of memory error. Is there a way to avoid this? This was the command:
./csvq -f csv -o santafe.csv 'select * from `datos_nomivac_covid19.csv` where jurisdiccion_residencia = "Santa Fe"'
The file was 1.6 GB. I was using the latest build: https://github.com/mithrandie/csvq/releases/download/v1.15.1/csvq-v1.15.1-linux-386.tar.gz
Error trace: