uwescience / sqlshare

Documentation and help for the SQLShare project
escience.washington.edu/sqlshare
7 stars 2 forks source link

Issue downloading specific file #27

Closed sr320 closed 10 years ago

sr320 commented 10 years ago

This does not appear to be a system wide issue but I cannot download the file

https://sqlshare.escience.washington.edu/sqlshare#s=query/sr320%40washington.edu/BiGo_Larvae_joineddata

I have tried 1) joining half the number of files (3) 2) the python client

I do not think it is a file size issue, but maybe? Thanks for any insight

dhalperi commented 10 years ago

Hi Steven,

Digging in, I think it might just be that this query is really slow. Comparing based on the string chr_start is going to be asking the database to do a lot of work:

chr + '_' + (cast (pos as varchar)) as chr_start,
chr as chr,
pos as start,

To answer first.chr_start = second.chr_start, the database has to construct the string chr+'_'+start and then use string comparison. It should execute much faster if you use:

first.chr = second.chr AND first.start = second.start

Here it can re-order the checks and it can compare numbers (like pos) MUCH more efficiently than it can compare strings.

Dan

sr320 commented 10 years ago

Thanks Dan - that did the trick!

For those following at home...

this query was the problem

SELECT * 
FROM [sr320@washington.edu].[_BiGo_lar_nonred_ID.txt]id
LEFT JOIN [sr320@washington.edu].[_BiGo_lar_M1_hack]M1
ON id.[chr_start]=M1.[chr_start]
LEFT JOIN [sr320@washington.edu].[_BiGo_lar_T1D3_hack]T1D3
ON id.[chr_start]=T1D3.[chr_start]
LEFT JOIN [sr320@washington.edu].[_BiGo_lar_T1D5_hack]T1D5
ON id.[chr_start]=T1D5.[chr_start]
LEFT JOIN [sr320@washington.edu].[_BiGo_lar_M3_hack]M3
ON id.[chr_start]=M3.[chr_start]
LEFT JOIN [sr320@washington.edu].[_BiGo_lar_T3D3_hack]T3D3
ON id.[chr_start]=T3D3.[chr_start]
LEFT JOIN [sr320@washington.edu].[_BiGo_lar_T3D5_hack]T3D5
ON id.[chr_start]=T3D5.[chr_start]

and this was the solution

SELECT * 
FROM [sr320@washington.edu].[BiGo_lar_nonred_ID split]id
LEFT JOIN [sr320@washington.edu].[_BiGo_lar_M1_hack]M1
ON id.[chr]=M1.[chr] AND id.[start]=M1.[start]
LEFT JOIN [sr320@washington.edu].[_BiGo_lar_T1D3_hack]T1D3
ON id.[chr]=T1D3.[chr] AND id.[start]=T1D3.[start]
LEFT JOIN [sr320@washington.edu].[_BiGo_lar_T1D5_hack]T1D5
ON id.[chr]=T1D5.[chr] AND id.[start]=T1D5.[start]
LEFT JOIN [sr320@washington.edu].[_BiGo_lar_M3_hack]M3
ON id.[chr]=M3.[chr] AND id.[start]=M3.[start]
LEFT JOIN [sr320@washington.edu].[_BiGo_lar_T3D3_hack]T3D3
ON id.[chr]=T3D3.[chr] AND id.[start]=T3D3.[start]
LEFT JOIN [sr320@washington.edu].[_BiGo_lar_T3D5_hack]T3D5
ON id.[chr]=T3D5.[chr] AND id.[start]=T3D5.[start]

In short I created a new column (that asked a lot of SQLShare) that in fact was not even needed as I failed to take advantage of AND in the join.