segasai / q3c

PostgreSQL extension for spatial indexing on a sphere
GNU General Public License v2.0
76 stars 27 forks source link

Crash when q3c_join used for cone search #3

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1.  Create table, pos_large, of HEASARC observations with large areas or 
positional uncertainties.  This is a table of about 400K rows and has four 
columns: the original table the data is from, the ra, dec, and the default 
search radius (dsr) for that row.  Note that dsr can have values as large as 40 
degrees.
2. Using psql run a query of the form
  select count(*) 
     from pos_big 
     where q3c_join(99., 18., ra, dec, dsr)
3. Get an error message:
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or after processing the request.

The log includes the following:
LOG:  server process (PID 19413) was terminated by signal 11: Segmentation fault
DETAIL:  Failed process was running: select count(*) from master_table.pos_big
         where q3c_join(  99,  18, ra , dec , dsr) ;
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the 
current transaction and exit, because another server process exited abnormally 
and possibly corrupted shared memory.
--------------------
What is the expected output? What do you see instead?

If I run this query with 0.1*dsr rather than above it works fine.  As I 
increase the coefficient the query seems to crash when I'm getting about 7,000 
hits.

The count() doesn't make any difference.  A
   select * ...
fails too.

----------------------
What version of the product are you using? On what operating system?

Postgres 9.2.1
Q3C 1.4.13
Linux 2.6.18-308-13.1.el5PAE

--------------------------
Please provide any additional information below.

The q3c_radial_query function does work here, but it is a factor of 30 or more 
slower than q3c_join and not faster than more standard approaches.  In 
particular q3c_join is much,much faster when we do a cone search of our 
pos_small table (which has ~33 million rows).  The query takes less than a 
second so long as we put the constant values first and the row values second in 
the q3c_join function.  Q3c_radial_query takes about 30 seconds when querying 
our 33 million row table, about the same as a query using just a simple dec 
based index.

A workaround may be to use q3c_join for the pos_small table and not use q3c for 
the pos_big table, but this is inelegant, and it may simply be luck that we 
haven't seen the same problem in queries of pos_small (and indeed I just 
confirmed that it is possible to
set up queries that will fail there too if I specify a large search radius.  
The behavior suggests the problems is with the maximum radius, not the number 
of rows returned).

Original issue reported on code.google.com by taq...@gmail.com on 9 Oct 2012 at 6:50

GoogleCodeExporter commented 9 years ago
I should have noted that we created an index on q3c_ang2ipix(ra,dec) for the 
tables after the table was created and then clustered on that index.

Original comment by taq...@gmail.com on 9 Oct 2012 at 6:53

GoogleCodeExporter commented 9 years ago
Hi,

Thanks for the report.

I think I know what's the issue.

The main problem here is that handling of large angles is indeed a bit
problematic in q3c.  q3c_radial_query for example does switch to the whole
table scan in the case of very large angles. And q3c_join actually may not
have enough protection against big angles at all.

Another problem is that if you have the table with ra,dec and uncertainties
(or search radius), and uncertainty can be very large, then you pretty much
have to sequentially scan the whole table in order to find out whether the
uncertainty circle covers a given point or not (unless the index also
includes the uncertainty).

So, I'll take a look at q3c_join to prevent the crash.
In the meantime you can just use q3c_dist(99,18,ra,dec)<sr instead -- it
will be the exactly the same (because in the way you are using q3c_join, the
index won't be used anyway).
Could you also show me how did you use q3c_radial_query ?

Also, one possible recommendation for dealing with tables with large 
uncertainies
would be (assuming that the fraction of objects with really big uncertainties is
small) is the following:

## Create the test dataset  with uncertainties from 0 to 40deg with 10%
## >10deg
CREATE  TABLE testbig as select  random()*360 as ra ,random()*180-90 as
    dec,random()**2*40 as sr  from generate_series(0,33000000);
CREATE  TABLE testbig_sorted as select * from testbig order by
    (sr<10)::int,q3c_ang2ipix(ra,dec);
## sort the data in such a way that large uncertainty objects are clustered

## Create the 33mil test dataset  with uncertainties from 0 to 40deg,  10% of 
them 
## >10deg
CREATE  TABLE testbig as select  random()*360 as ra ,random()*180-90 as
    dec,random()**2*40 as sr  from generate_series(0,33000000);
CREATE  TABLE testbig_sorted as select * from testbig order by
    (sr<10)::int,q3c_ang2ipix(ra,dec);
## sort the data in such a way that large uncertainty objects are clustered

## create indices
create index on testbig_sorted(sr);
create index on testbig_sorted(q3c_ang2ipix(ra,dec));
# select objects covering a given point ra=180 dec=30
select * from testbig_sorted where
    (q3c_join(180, 30, ra, dec, 10) and sr< 10 and q3c_dist(ra, dec, 180, 30) < sr ) OR 
    ( sr > 10 and q3c_dist(ra, dec, 180, 30) < sr);
# it takes <~4 seconds on my machine.

The idea is that you group objects which have large uncertainty -- those you
better check one by one, and for the rest do the normal cone search. I used 
10degrees as a cutoff, but it could be different depending on the distribution 
of uncertainties.

    Sergey                       

Original comment by koposov on 9 Oct 2012 at 8:14

GoogleCodeExporter commented 9 years ago
https://code.google.com/p/q3c/source/detail?r=88a448a1a9bd27cddc8172f20671ecae08
ebf3c6

I've committed the q3c_join() fix. But I didn't have time yet to check it. 

Original comment by koposov on 9 Oct 2012 at 8:38

GoogleCodeExporter commented 9 years ago
Okay,  the problem was deeper than I expected, but i've hopefully fixed it in 
the following commit:
http://code.google.com/p/q3c/source/detail?r=e42285cdc04f96b73cee635fdd0160a36ee
4c7cf
I'll probably release the 1.4.14 version soon.

Regarding the queries in my comment #2 (I mistyped, I meant random()^10*40 
instead of random()**2*40 )

Original comment by koposov on 10 Oct 2012 at 3:21

GoogleCodeExporter commented 9 years ago
Hi Sergey,

Thanks for the quick response.   We'd already adopted the approach you
suggested below and separated our  data into two tables pos_big and
pos_small where the size/errors for pos_big were > 1 degree while pos_small
had all of the rows with size/errors <= 1 degree.  Only about  1% of the
data has large errors, so pos_big is (contrary to its name!) a much smaller
table than pos_small.   As you point out, regardless of the approach used,
we'll need to essentially do a full table scan of the pos_big table, but it
is small enough that that is tolerable.  While we could put all of these
into a single table (as you do below) and use multiple indexes on the table
when we were looking at doing this using other schemes it seemed easiest to
split it into distinct tables.

Operationally it would be convenient if I could use the same syntax in
querying both tables (i.e., have a standard syntax for doing cone searches)
and I'd certainly be a bit leery of using a library that can crash for a
perfectly valid (if not especially efficient) query.  I don't mind if q3c
is inefficient in doing this large angle query -- just that it doesn't
fail. We also allow users to enter SQL directly, so I'd also prefer not to
have a known vector for malicious users to easily crash our system.

If you'd like to see what we're doing, I've got a little blog post up at

http://skyview.gsfc.nasa.gov/xaminblog/index.php/2012/10/01/positional-matching-
in-postgres/
that describes what I'm doing and various approaches we've looked at.  When
I first started with Q3C I did both correlations and cone searches using
q3c_join, where I found that cone searches were efficient when I used
   q3c_join(187,2,ra,dec,1)
but slow with
   q3c_join(ra,dec,187,2,1)
I then reread the documentation and saw that you recommended
q3c_radial_query for cone searches.  and tried it, but it was much slower
than q3c_join.  I tried it in both orientations
  q3c_radial_query(ra,dec,187,2,1) and q3c_radial_query(187,2,ra,dec,1)
even though I think the documentation strongly suggests the first.  However
I didn't investigate this too thoroughly since q3c_join worked just fine
(until I ran into the bug).

A typical cone search query that we're trying looks like:

    select table_name,count(*) from pos_small
      where q3c_join(187,2,ra,dec,dsr)='t' and
               q3c_join(187,2,ra,dec,1)='t'
      group by table_name
   union
     select table_name,count(*) from pos_big
      where q3c_join(187,2,ra,dec,dsr)='t'
      group by table_name
    order by 1;

The ='t' is superfluous but is needed by our SQL security checker scan
which doesn't know that functions can have logical values.  It doesn't seem
to affect indexing (or I'd dig into our JSQLParser implementation to add
logical functions).

I've tried putting the ra,dec as the first arguments in q3c_join and that
is much slower.
Using q3c_radial_query instead of q3c_join seems to be much slower in both
orientations.

The second q3c_join in the first element of the union matches what you had
suggested below.  It doesn't look like the Q3C uses the index efficiently
if the search radius is a field.  This seems common to all of the indexing
approaches we tried (see the blog article).

In any case thanks for responding so quickly.  I'll put in your patches
today and try things out and let you know if it works.

   Tom McGlynn

Original comment by taq...@gmail.com on 10 Oct 2012 at 12:01

GoogleCodeExporter commented 9 years ago
Hi Tom,

First, regarding the order of arguments issue, It is very important for the
index usage indeed, so for example if instead of
q3c_radial_query(ra,dec,180,30,4) you'll write
q3c_radial_query(180,30,ra,dec,4) that'll guarantee that the index won't be
used. I thought that the documentation was clear enough about that, but
apparently it wasn't.
The same applies to q3c_join.

Regarding the slowiness of q3c_radial_query vs q3c_join, it is to some
degree a known issue and is by design, but maybe it's time to revisit.
Because q3c_radial_query is optimized for the large catalogues, where you
really want to do the smallest amount of I/O possible. So it is much more
CPU heavy, but it is very I/O efficient: it really tries to approximate the
query circle with the quad-tree squares and retrieve from the disk the
smallest number of blocks. So if you'll try to compare q3c_radial_query
vs q3c_join for very large catalogs (e.g. SDSS like or 2mass) q3c_radial_query
will in general outperform q3c_join. But when all the data small and/or is
fully cached q3c_join will win due to smaller CPU overhead.

Regarding the stability of q3c, I have been using it actively for ~ 7 years
and I know a lot of other people does. I have probably tens of billions of
sources in total indexed and queried by q3c. So I'm pretty sure that it is very
stable in the standard use case of not too large search radii <~ 20-30deg.
(that what the regime q3c was designed for and used). Regarding the very
large queries, I'm pretty sure that cone and ellipse searches with very big
radii should be fine as well. Very large polygon queries are disabled (e.g
with diameters >~ 40-50 degrees). Regarding the crossmatches with very large
radii, I've put protection against very large radii just now, so I think it
should be fine as well, but obviously I can't be 100% sure that it's rock
stable. I also know that polygon queries are used much less than other
kind of queries, so the likelihood of bugs in that part of the codebase is 
higher.

Thanks for the blog post. It is very helpful. I may write some more
comments later.  I can say that I'm using PGsphere sometimes when I have to 
work with
image/survey footprints and their overlaps, or when I have to find out which
images cover a given point on the sky. In those cases PGSphere is definitely
very good (and especially in those cases usually you don't have large number
of objects/images, so the slowness of R-trees doensn't manifest itself.

PS I'll be happy to discuss anything related to q3c or in general PG spatial 
queries by email koposov@ast.cam.ac.uk (it's a bit of pain to use 
code.google.com form for that), because I'm very interested in the subject and 
happy to hear and fix any possible deficiencies of q3c.

Original comment by koposov on 10 Oct 2012 at 2:43

GoogleCodeExporter commented 9 years ago
Hi Tom, 

I've just released the new version of q3c, which should perform much better 
with q3c_radial_query() queries, so I would suggest to try it instead of 
q3c_join, if you have time. 
It also includes some bug fixes with regards to polygon queries, so it's 
probably a good idea to update anyway.

Regards,
    Sergey 

Original comment by koposov on 17 Dec 2012 at 6:12

GoogleCodeExporter commented 9 years ago
Don't use this account much so I just saw this.  Thanks for the update and
we've grabbed the lastest version.
    Tom

Original comment by taq...@gmail.com on 9 Feb 2013 at 5:15