Open matyaskopp opened 8 years ago
Can you run the sql queries with EXPLAIN
? It should show the execution plan - which in this case might be the problem.
Also, did you run the tests several times to eliminate DB cold start/caching?
Runned with EXPLAIN (ANALYZE TRUE)
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=51359.79..51359.81 rows=1 width=8) (actual time=7842.010..7842.011 rows=1 loops=1)
-> Aggregate (cost=51359.79..51359.80 rows=1 width=0) (actual time=7842.008..7842.008 rows=1 loops=1)
-> HashAggregate (cost=51359.77..51359.78 rows=1 width=12) (actual time=7840.322..7841.502 rows=4899 loops=1)
-> Nested Loop (cost=4.01..51359.76 rows=1 width=12) (actual time=700.981..7833.317 rows=4899 loops=1)
-> Nested Loop (cost=3.58..51353.38 rows=1 width=8) (actual time=691.348..7395.597 rows=59755 loops=1)
-> Nested Loop (cost=3.15..51347.07 rows=1 width=12) (actual time=691.336..6793.372 rows=167105 loops=1)
-> Nested Loop (cost=2.73..51340.77 rows=1 width=16) (actual time=691.223..4373.315 rows=690388 loops=1)
-> Merge Left Join (cost=2.30..51334.46 rows=1 width=12) (actual time=691.152..1596.471 rows=738953 loops=1)
Merge Cond: (n1.functions = "n1/2"."#idx")
Filter: ((("n1/2"."#value")::text = ''::text) OR ("n1/2"."#value" IS NULL))
Rows Removed by Filter: 243469
-> Index Scan using "#i_pennmrg_9" on nonterminal n1 (cost=0.42..39897.52 rows=978873 width=16) (actual time=0.011..959.929 rows=978873 loops=1)
-> Index Scan using "#i_pennmrg_2" on "pennmrg##9" "n1/2" (cost=0.42..7656.45 rows=243469 width=8) (actual time=0.007..76.722 rows=243469 loops=1)
-> Index Scan using nonterminal_pkey on nonterminal n0 (cost=0.42..6.30 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=738953)
Index Cond: ("#idx" = n1."#parent_idx")
Filter: ("#idx" <> n1."#idx")
-> Index Scan using "#i_pennmrg_3" on "nonterminal/cat" "n0/0" (cost=0.42..6.30 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=690388)
Index Cond: ("#idx" = n0.cat)
Filter: (("#value")::text = 'VP'::text)
Rows Removed by Filter: 1
-> Index Scan using "#i_pennmrg_3" on "nonterminal/cat" "n1/1" (cost=0.42..6.30 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=167105)
Index Cond: ("#idx" = n1.cat)
Filter: (("#value")::text = 'NP'::text)
Rows Removed by Filter: 1
-> Index Scan using "#i_pennmrg_17" on terminal n2 (cost=0.43..6.37 rows=1 width=8) (actual time=0.006..0.007 rows=0 loops=59755)
Index Cond: ("#parent_idx" = n1."#idx")
>>>FORM>>>>>> Filter: (((pos)::text = 'DT'::text) AND ((form)::text = 'the'::text))
Rows Removed by Filter: 1
Total runtime: 7842.210 ms
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=48426.30..48426.32 rows=1 width=8) (actual time=1837.483..1837.484 rows=1 loops=1)
-> Aggregate (cost=48426.30..48426.31 rows=1 width=0) (actual time=1837.481..1837.482 rows=1 loops=1)
-> HashAggregate (cost=48426.27..48426.28 rows=1 width=12) (actual time=1835.799..1836.997 rows=4902 loops=1)
-> Nested Loop (cost=2.12..48426.27 rows=1 width=12) (actual time=0.119..1829.437 rows=4902 loops=1)
-> Nested Loop (cost=1.70..48419.96 rows=1 width=16) (actual time=0.110..1803.340 rows=4909 loops=1)
-> Nested Loop (cost=1.27..48413.65 rows=1 width=20) (actual time=0.073..1566.176 rows=44169 loops=1)
-> Nested Loop Left Join (cost=0.84..48407.34 rows=1 width=16) (actual time=0.064..1334.293 rows=44171 loops=1)
Filter: ((("n1/2"."#value")::text = ''::text) OR ("n1/2"."#value" IS NULL))
Rows Removed by Filter: 15234
-> Nested Loop (cost=0.42..45239.32 rows=509 width=20) (actual time=0.053..1209.099 rows=59395 loops=1)
-> Seq Scan on terminal n2 (cost=0.00..41143.73 rows=509 width=8) (actual time=0.030..732.458 rows=59395 loops=1)
>>>FORM>>>>>> Filter: (((pos)::text = 'DT'::text) AND (lower((form)::text) = 'the'::text))
Rows Removed by Filter: 1193618
-> Index Scan using nonterminal_pkey on nonterminal n1 (cost=0.42..8.04 rows=1 width=16) (actual time=0.007..0.007 rows=1 loops=59395)
Index Cond: ("#idx" = n2."#parent_idx")
-> Index Scan using "#i_pennmrg_2" on "pennmrg##9" "n1/2" (cost=0.42..6.21 rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=59395)
Index Cond: ("#idx" = n1.functions)
-> Index Scan using nonterminal_pkey on nonterminal n0 (cost=0.42..6.30 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=44171)
Index Cond: ("#idx" = n1."#parent_idx")
Filter: ("#idx" <> n1."#idx")
-> Index Scan using "#i_pennmrg_3" on "nonterminal/cat" "n0/0" (cost=0.42..6.30 rows=1 width=4) (actual time=0.005..0.005 rows=0 loops=44169)
Index Cond: ("#idx" = n0.cat)
Filter: (("#value")::text = 'VP'::text)
Rows Removed by Filter: 1
-> Index Scan using "#i_pennmrg_3" on "nonterminal/cat" "n1/1" (cost=0.42..6.30 rows=1 width=4) (actual time=0.002..0.005 rows=1 loops=4909)
Index Cond: ("#idx" = n1.cat)
Filter: (("#value")::text = 'NP'::text)
Rows Removed by Filter: 0
Total runtime: 1837.597 ms
I would expect the second query to be slower because it contains Seq Scan
- obviously -> the lower(term)
values are not indexed.
Try to assign more memory to postgres - maybe the indexes are too large. The 1st query should be quicker. Look at: http://dba.stackexchange.com/questions/18484/tuning-postgresql-for-large-amounts-of-ram#18486
work_mem
is the variable to play with.If unsure try: http://pgtune.leopard.in.ua/
Dear LINDAT people ,
The following query, submitted by one of my students, leads to a timeout when the timeout is set to (the default value) 30. It works fine (but slowly) when the timeout is set to 200.
The following query, however, which is identical except for the application of the lower() function to the form attribute, runs without any problems within the default timeout limit of 30:
It looks somewhat counterintuitive to me that the first query would require more time than the second one. If anything, I would expect the first one to be faster, since one can directly use an index on the form attribute (if such an index exists).
Also, the progress bar gets to the end, and then starts again in the first query (but not in the second).
Both phenomena suggest to me that there might be some bug or undesirable feature, which is why I report it to you. Or if it is not a bug, can you explain to me why the first query requires more computing time than the second query?
Thanks Jan
Query speeds
Tested on The Penn Treebank 3, WSJ data set
Slower query
form='the'
nonterminal[cat='VP', nonterminal [cat='NP', functions='', terminal[pos='DT' and form='the']]] >> count()
result:
Time: 9732.483 ms
Faster query
lower(form)='the'
nonterminal[cat='VP', nonterminal [cat='NP', functions='', terminal[pos='DT' and lower(form)='the']]]>>count()
result:
Time: 1845.408 ms