pat / thinking-sphinx

Sphinx/Manticore plugin for ActiveRecord/Rails
http://freelancing-gods.com/thinking-sphinx
MIT License
1.63k stars 469 forks source link

Delta indexes sql_joined_field missing where table.delta = true #1262

Closed bkd closed 1 month ago

bkd commented 2 months ago

We have an index

__I am using apartment gem and modified thinking_sphinx/configuration.rb (sphinxmultiplier is merged again at config generation as this was causing issues of not hitting the right index multiplier - not the issue here)

indexes 'SELECT people.id sphinx_multiplier AS sid, positions.job_title AS person_position_job_title FROM people INNER JOIN positions ON positions.person_id = people.id GROUP BY sid, job_title ORDER BY sid', as: :person_position_job_title, source: :query

produces for core and delta

sql_joined_field = person_position_job_title from query; SELECT people.id * 66 + 2  AS sid, positions.job_title AS person_position_job_title FROM people  INNER JOIN positions ON positions.person_id = people.id GROUP BY sid, job_title ORDER BY sid

where as this should be for the _delta index definition:

sql_joined_field = person_position_job_title from query; SELECT people.id * 66 + 2  AS sid, positions.job_title AS person_position_job_title FROM people  INNER JOIN positions ON positions.person_id = people.id  **WHERE people.delta = true** GROUP BY sid, job_title ORDER BY sid

When running delta indexes it was taking forever as it was not having the WHERE people.delta = true clause added

fiddling around and not covering everything (I could use Arel also...not my bag though) - this worked for me

  def build_sql_fields
    fields.each do |field|
      @sql_field_string        << field.name if field.with_attribute?
      @sql_field_str2wordcount << field.name if field.wordcount?
      @sql_file_field          << field.name if field.file?

      if field.source_type
        property_query = PropertyQuery.new(field, self).to_s
        property_query = apply_delta_condition(property_query) if delta?
        @sql_joined_field << property_query
      end

    end
  end

  ### HACK
  def apply_delta_condition(sql_query)
    delta_condition = " #{model.quoted_table_name}.delta = TRUE"
    has_group_by = sql_query =~ /GROUP BY/i
    has_order_by = sql_query =~ /ORDER BY/i
    has_where = sql_query =~ /WHERE/i
    where_join_type = has_where ? 'AND' : 'WHERE'
    if has_group_by || has_order_by
      parts = has_group_by ? sql_query.split(/GROUP BY/i, 2) : sql_query.split(/ORDER BY/i, 2)
      sql_query = "#{parts[0]} #{where_join_type} #{delta_condition} #{has_group_by ? 'GROUP BY' : 'ORDER BY'} #{parts[1]}"
    else
      sql_query += " #{where_join_type} #{delta_condition}"
    end
    sql_query
  end

Now my delta indexing is less than 1 second again - coming down from 5 minutes a time....

I dont have LIMIT or HAVING in my sql_joined_fields - so this above does not consider this - so for my situation it works - but it will break otherwise with either of the aforementioned clauses

pat commented 1 month ago

Thanks for reporting this @bkd.

I'm not quite sure how to best approach this though, given the field is based on a custom SQL string - adding in the logic to reliably parse SQL into Thinking Sphinx feels like a significant jump in complexity for what my gut suggests is an edge-case. Not that your case isn't valid, just that I don't think it's common.

If yourself or others can come up with a solution that doesn't involve the SQL parsing/manipulation, I'd love to hear about it! But at least for now, I'm to close this issue given I'm not going to do anything about it myself at this point in time.