harelba / q

q - Run SQL directly on delimited files and multi-file sqlite databases
http://harelba.github.io/q/
GNU General Public License v3.0

How does the speed depend on the environment? #323

Open: sanasar-dev opened this issue 8 months ago

sanasar-dev commented 8 months ago

First of all, I want to say that I really like this tool; it is amazing. I have one question: why is q about two times slower in production? I know this is not a well-defined question, but I will try to explain a little. It works perfectly on my laptop (i7 12th Gen / 4 CPU / 2 threads / 32 GB RAM), while the server is a DigitalOcean droplet (8 CPU / 16 GB RAM). Could you please give me some idea of how I can speed up q in production?

harelba commented 8 months ago

Hi, the speed mostly depends on the machine's memory and the size of the data at hand.

Judging from the specs you've sent, the difference might be related to the memory size (32 GB vs. 16 GB). However, q has one feature that might help with that: caching.

When caching is activated (-C readwrite), each file that is accessed is processed in the regular manner, but q also writes an additional file with a .qsql suffix. This file makes subsequent executions much faster and lets them use much less memory.
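The "parse once, reuse afterwards" idea behind the .qsql file can be illustrated outside q. Below is a conceptual sketch using Python's built-in sqlite3 module, assuming a semicolon-delimited UTF-8 CSV with a header row: the first call parses the CSV into a SQLite file named `<csv>.qsql`, and later calls open that file directly instead of re-parsing. The function name `open_with_cache`, the table name `data` and the modification-time freshness check are hypothetical choices for illustration, not how q itself structures or validates its cache.

```python
import csv
import os
import sqlite3


def open_with_cache(csv_path, delimiter=";"):
    """Return (connection, used_cache) for csv_path.

    Conceptual sketch only, NOT q's implementation: the CSV is parsed once
    into a SQLite file named <csv_path>.qsql, and that file is reused on
    later calls. The table name "data" and the mtime check are hypothetical.
    """
    cache_path = csv_path + ".qsql"

    # Reuse the cache only if it exists and is at least as new as the CSV.
    if os.path.exists(cache_path) and os.path.getmtime(cache_path) >= os.path.getmtime(csv_path):
        return sqlite3.connect(cache_path), True

    # Missing or stale cache: rebuild it from scratch.
    if os.path.exists(cache_path):
        os.remove(cache_path)
    conn = sqlite3.connect(cache_path)
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader)
        # Quote column names so headers with spaces or SQL keywords still work.
        cols = ", ".join('"' + name.replace('"', '""') + '"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f"CREATE TABLE data ({cols})")
        # Every row is parsed exactly once here; later calls skip this step.
        conn.executemany(f"INSERT INTO data VALUES ({placeholders})", reader)
    conn.commit()
    return conn, False
```

Called twice on the same file, the first call returns `used_cache=False` and the second returns `True`; on a large file the second call skips the CSV parsing entirely, which is the kind of saving the `-C readwrite` option is described as providing.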

The qsql file can be used in two ways:

I hope this helps you speed things up. It would be great if you could post an update here with the results.

Harel

sanasar-dev commented 8 months ago

Thanks for your reply. Caching is enabled both locally and in production. The query runs on the same file, but it runs twice as slow in production, which is why I wanted to know whether there are any other configuration settings that need to be checked.

This is my query:

q -H -d ";" -e UTF-8 -Q UTF-8 -C readwrite "select *,
  iif(cs.adset_status = 'ARCHIVED' or cs.campaign_status = 'ARCHIVED', 'ARCHIVED',
    iif(cs.adset_status = 'DELETED' or cs.campaign_status = 'DELETED', 'DELETED',
      iif(cs.adset_status = 'PAUSED' or cs.campaign_status = 'PAUSED', 'PAUSED', 'ACTIVE'))) as status,
  domain || '_' || lang || '_' || slug as url,
  sum(clicks) as r_clicks,
  ROUND(sum(spend), 1) as r_spend,
  ROUND(sum(ay_revenue), 1) as r_ay_revenue,
  ROUND(sum(ay_revenue) - sum(spend), 1) as r_profit,
  ROUND(avg(cpc), 3) as r_cpc,
  ROUND(avg(roas), 1) as r_roas,
  ROUND(avg(cpr), 2) as r_cpr,
  sum(impressions) as r_impressions,
  sum(ay_impressions) as r_ay_impressions,
  sum(ay_sessions) as r_ay_sessions,
  ROUND(COALESCE(sum(ay_impressions) / sum(ay_sessions), 0), 1) as r_ads_per_session,
  ROUND((COALESCE((sum(clicks) * 1.0) / sum(impressions), 0) * 100), 1) as r_ctr,
  ROUND((COALESCE((sum(ay_revenue) * 1.0) / sum(spend), 0) * 1000), 1) as r_ay_roas,
  ROUND((COALESCE((sum(ay_revenue) * 1.0) / sum(clicks), 0)), 3) as r_rpc,
  ROUND((COALESCE((sum(ay_revenue)-sum(spend) * 1.0)/sum(ay_revenue), 0) * 100), 1) as r_profit_margin
from /var/www/fb-tool/public/storage/reports/campaigns/2023_09.csv as cr
left join /var/www/fb-tool/public/storage/reports/adset-statuses/adset-statuses.csv as cs
  on cr.campaign_id = cs.c_id
where date >= '2023-09-01' and date <= '2023-09-30'
group by campaign_id
order by r_profit desc
limit 30 offset 0" -E UTF-8 -O
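Because the first `-C readwrite` run on each machine also writes the .qsql file, a fair laptop-vs-droplet comparison should separate that first run from the later, cached runs. The following is a minimal Python sketch of such a measurement, assuming Python 3 is available on both machines; the script and function names (`time_cmd.py`, `time_command`, `summarize`) are hypothetical, and the script simply times whatever command line it is given.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv `runs` times and return each run's wall-clock time in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard stdout so terminal printing does not distort the timing;
        # check=True raises on failure, so a broken run is never counted as fast.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (first_run, median_of_later_runs).

    The first run is kept separate because, with -C readwrite, it may
    include building the .qsql cache file.
    """
    first, rest = durations[0], durations[1:]
    return first, (statistics.median(rest) if rest else first)


if __name__ == "__main__":
    # Hypothetical usage: python time_cmd.py q -H -d ";" -C readwrite "select ..."
    if len(sys.argv) < 2:
        print("usage: python time_cmd.py <command> [args...]")
    else:
        first, later = summarize(time_command(sys.argv[1:]))
        print(f"first run: {first:.3f}s, median of later runs: {later:.3f}s")
```

Running the same command line on both machines and comparing the "median of later runs" figures compares cached runs with cached runs, rather than one machine's cache-building run with the other's cached run.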