pat / thinking-sphinx

Sphinx/Manticore plugin for ActiveRecord/Rails
http://freelancing-gods.com/thinking-sphinx
MIT License
1.63k stars 468 forks source link

PGSQL timestamps are rounded but range filters always floor #361

Closed henrik closed 12 years ago

henrik commented 12 years ago

This is a bit of a lazy bug report. Haven't tried on latest TS (we're on 3fcecd4ab738c6a52717fcce89a6ad363f6c579e) and I don't have the time to try to submit a patch right now.

Still, it's a sneaky bug, so I wanted to write it down.

Postgres has float timestamps with sub-second precision. If I index timestamp 123.7 in Sphinx, it seems it's indexed rounded, as 124.

But if I then filter by a range of timestamps starting one second later, like 124.7..999, the item I just indexed will unexpectedly be included. Seems like the filter will round down (floor), searching for 124..999.

Took me a while to figure out why my tests would fail intermittently. Turned out they failed with timestamps that would round up.

pat commented 12 years ago

The SQL involved is cast(extract(epoch from table.column) as int) - I guess if we could switch that cast out for a rounding down to the floor instead it'd do the trick?

henrik commented 12 years ago

Yeah, that should work:

SELECT FLOOR(EXTRACT(epoch FROM now())), CAST(EXTRACT(epoch FROM NOW()) AS int), EXTRACT(epoch FROM now());
   floor    | date_part  |    date_part     
------------+------------+------------------
 1339149503 | 1339149504 | 1339149503.59423
henrik commented 12 years ago

And I would guess that always flooring, as you proposed, is probably more accurate than always rounding, which would be another option. I expect DB engines without sub-second precision effectively floor the value.

pat commented 12 years ago

Well, this is just for Thinking Sphinx's PostgreSQL adapter - I'm not sure if MySQL even stores timestamps down to millisecond level.

henrik commented 12 years ago

Yeah, I meant that if you don't have millisecond precision, you are probably effectively flooring. I expect MySQL will say it's 123 o'clock when PG would say it's 123.999 o'clock.

pat commented 12 years ago

Okay, patched. I kept the cast call wrapping the floor call, as floor returns the same datatype that it's given (and besides, sometimes I want to return ints, sometimes bigints).

henrik commented 12 years ago

@freelancing-god Awesome, thank you!