mrphlip / lrrbot

LoadingReadyRun Twitch chatbot
https://lrrbot.com/
Apache License 2.0

Make quote context searchable #210

Open mrphlip opened 8 years ago

mrphlip commented 8 years ago

As suggested in-chat by dialMforMara, the context for quotes should be searchable.

It probably should just be treated the same as the actual quote body. Fulltext-index it, and make the normal !findquote command (and the "Search quotes" function of the website) search where either quote or context contains the search term.

(I think making it so !findquote foo bar can find a quote where quote contains "foo" and context contains "bar" would be a bit overcomplicated, so I wouldn't be worried about that. Just (pseudocode) quote contains 'foo bar' or context contains 'foo bar'.)

((Unless it's possible to make a single full-text index over both quote and context? I don't know enough postgresql to know if that's a thing...))

andreasots commented 8 years ago

Index over quote || context should work.

danieljcrabtree commented 7 years ago

If you want to make context searchable, does it make sense to include attrib_name as well? It should be possible to index over quote || attrib_name || context.

I realise that !quote and the website already have ways to search on attribution but this way a query like 'paul fine' could return something like 'Everything Is Fine! - Paul'.
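
To make the difference between the two proposals concrete, here is a minimal sketch in Python. It uses plain word matching as a stand-in for PostgreSQL full-text search (no stemming, no ranking), and the function names are hypothetical, not taken from LRRbot. The first function follows the "quote contains 'foo bar' or context contains 'foo bar'" pseudocode; the second treats quote, attrib_name and context as one combined document, roughly what an index over quote || attrib_name || context would search.

```python
import re


def words(text):
    """Lower-case a string and split it into a set of words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def matches_per_field(query, quote, context):
    """All query words must appear in the quote, or all in the context."""
    terms = words(query)
    return terms <= words(quote) or terms <= words(context)


def matches_combined(query, quote, attrib_name, context):
    """All query words must appear somewhere across the three fields."""
    terms = words(query)
    return terms <= words(" ".join([quote, attrib_name, context]))


# The example from the comment above: 'Everything Is Fine! - Paul'
print(matches_per_field("paul fine", "Everything Is Fine!", ""))         # False
print(matches_combined("paul fine", "Everything Is Fine!", "Paul", ""))  # True
```

With per-field matching, 'paul fine' finds nothing because no single field holds both words; with the combined document it matches.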

mrphlip commented 7 years ago

Hmm... I know it's certainly been the case that people tend to get !quote and !findquote confused, and use one when they mean the other... we could fold the two together and it'd save a fair amount of confusion.

It'd mean we'd lose the ability to find quotes that are by eg Paul rather than about Paul or vice versa, though, and... maybe that's worth it?

Also relevant that the search on attrib_name is a simple "contains" query, not a word search... which is relevant since it lets people search for eg "Cam" and find quotes attributed to "Cameron". But maybe we can still handle that... it looks like postgres supports making custom synonym dictionaries... I don't really understand the details, but looks like that would let us put in alternate forms of people's names so the search can still find them.
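
As a rough illustration of that concern, the Python sketch below contrasts a "contains" match with a whole-word match, and shows how a table of alternate name forms could recover the "Cam" case. The ALIASES table and function names are hypothetical, and this is plain Python standing in for the idea, not a PostgreSQL synonym dictionary configuration.

```python
import re

# Hypothetical table of alternate forms of people's names.
ALIASES = {"cam": {"cameron"}}


def contains_match(needle, attrib_name):
    """Simple substring ("contains") search, case-insensitive."""
    return needle.lower() in attrib_name.lower()


def word_match(needle, attrib_name):
    """Whole-word search: the needle must equal one of the words."""
    return needle.lower() in re.findall(r"\w+", attrib_name.lower())


def word_match_with_aliases(needle, attrib_name):
    """Whole-word search that also accepts known alternate forms."""
    key = needle.lower()
    candidates = {key} | ALIASES.get(key, set())
    return any(w in candidates for w in re.findall(r"\w+", attrib_name.lower()))


print(contains_match("Cam", "Cameron"))           # True
print(word_match("Cam", "Cameron"))               # False
print(word_match_with_aliases("Cam", "Cameron"))  # True
```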

RebelliousUno commented 7 years ago

Would it not make some sense to have some switches on findquote?

!findquote by (author)
!findquote about (quote body)
!findquote why (context)

The last one feels like it's not the right switch for contextual quotes.

andreasots commented 7 years ago

Oh boy, syntax bikeshedding. But first, a request from the chat:

11:45 Briars_the_fox: is there a way to implement key word AND person as a search option for the quotes?
11:46 Briars_the_fox: so if i wanted to look for alex AND butts
11:46 qrpth: Yes. There is a way to implement quote search so that you can look at Alex's butt.

I wrote this thing a long time ago so it needs some work before it can be added to LRRbot. The syntax:

query ::= disjunction
disjunction ::= conjunction '|' disjunction
conjunction ::= expr conjunction
expr ::= '(' disjunction ')' | atom
atom ::= quoted-string | token | token op (quoted-string | token)
quoted-string ::= '"' (<any character not '"'>)* '"'
token ::= (<any character not a whitespace, '(', ')', '|', '=', '>', '<' or ':'>)+
op ::= ':' | '=' | '>=' | '>' | '<=' | '<' 

This gist seems to be the parser code. Context, game and show tags need to be added and it should generate a SQLAlchemy query and not a SQL string.

Examples:

An alternative to this would be the !addquote syntax ((NAME) [DATE] QUOTE | CONTEXT). The advantages being that it's somewhat more familiar and simpler to describe. The disadvantages being that the queries are very limited and simple (but maybe it's fine?), being very strict on ordering of components and not being able to filter by game or show.

danieljcrabtree commented 7 years ago

@rebelliousuno, did you imagine being able to chain switches together?

e.g. !findquote by (author) about (quote body)

Or did you see them working in the same way as the game and show switches on the !quote command?

@andreasots, I don't have a lot of experience with context-free grammars so pardon me if I'm just confused. There doesn't seem to be a way to terminate these rules:

disjunction ::= conjunction '|' disjunction
conjunction ::= expr conjunction

andreasots commented 7 years ago

> I don't have a lot of experience with context-free grammars so pardon me if I'm just confused. There doesn't seem to be a way to terminate these rules.

You are correct. It should be


disjunction ::= conjunction '|' disjunction | conjunction
conjunction ::= expr conjunction | expr
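
For reference, the corrected grammar can be sketched as a small recursive-descent parser in Python. This is not the code from the gist; it is a minimal illustration that returns nested tuples rather than a SQLAlchemy query, and the tuple shapes ("or", "and", "term", "field") and all function names are assumptions made for the example.

```python
import re

# One token per match: quoted string, operator, punctuation, or bare word.
TOKEN_RE = re.compile(
    r'\s*(?:(?P<string>"[^"]*")|(?P<op>>=|<=|[:=<>])'
    r'|(?P<punct>[()|])|(?P<word>[^\s()|=<>:]+))'
)


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at position {pos}")
        kind, value = m.lastgroup, m.group(m.lastgroup)
        if kind == "string":
            value = value[1:-1]  # strip the surrounding double quotes
        tokens.append((kind, value))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def disjunction(self):  # conjunction '|' disjunction | conjunction
        left = self.conjunction()
        if self.peek() == ("punct", "|"):
            self.take()
            return ("or", left, self.disjunction())
        return left

    def conjunction(self):  # expr conjunction | expr
        left = self.expr()
        kind, value = self.peek()
        if kind in ("string", "word") or (kind, value) == ("punct", "("):
            return ("and", left, self.conjunction())
        return left

    def expr(self):  # '(' disjunction ')' | atom
        if self.peek() == ("punct", "("):
            self.take()
            inner = self.disjunction()
            if self.take() != ("punct", ")"):
                raise ValueError("expected ')'")
            return inner
        return self.atom()

    def atom(self):  # quoted-string | token | token op (quoted-string | token)
        kind, value = self.take()
        if kind == "string":
            return ("term", value)
        if kind != "word":
            raise ValueError(f"unexpected {value!r}")
        if self.peek()[0] == "op":
            op = self.take()[1]
            vkind, vvalue = self.take()
            if vkind not in ("string", "word"):
                raise ValueError("expected a value after the operator")
            return ("field", value, op, vvalue)
        return ("term", value)


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.disjunction()
    if parser.pos != len(parser.tokens):
        raise ValueError("unexpected trailing input")
    return tree


print(parse("alex butts"))
# ('and', ('term', 'alex'), ('term', 'butts'))
```

Because each recursive rule now has a non-recursive alternative, the parser stops as soon as the next token cannot start another expr (for conjunction) or is not '|' (for disjunction).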

danieljcrabtree commented 7 years ago

To go back for just a moment: was the original request to search on just the context column or to search on context and other columns? If it’s as simple as searching on just context, then !quote context <query> or !findquote context <query> might suffice.

Combining !quote and !findquote would seem to make sense, especially as there are 4 quote related commands with subtly different syntaxes. If searching on multiple columns is required then Andreas’ syntax and parser look good. But, as Andreas suggests, it’s harder to explain. The need for double-quoted strings strikes me as something that could easily catch people out.

But is there much demand for searching on more than one column? If there isn’t, I’d be inclined not to risk complicating things. If the aim is just to combine !quote and !findquote into a single command and search on context, then how about something like this?

| New command                                        | Current command     |
|----------------------------------------------------|---------------------|
| !quote                                             | !quote              |
| !quote id <int>                                    | !quote <int>        |
| !quote <int> (alias for !quote id <int>)           | !quote <int>        |
| !quote name <query>                                | !quote <query>      |
| !quote <query> (alias for !quote name <query>)     | !quote <query>      |
| !quote quote <query>                               | !findquote <query>  |
| !quote context <query>                             |                     |
| !quote game <query>                                | !quote game <query> |
| !quote show <query>                                | !quote show <query> |

!findquote would be deprecated.
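
As a rough sketch of how the argument handling for such a combined command might look, the Python below maps the text after !quote to a (mode, argument) pair. The function name, the "random" mode label and the exact fallback rules are assumptions for illustration, not LRRbot's actual behaviour.

```python
import re

# Subcommands from the proposal above.
SUBCOMMANDS = ("id", "name", "quote", "context", "game", "show")


def parse_quote_command(args):
    """Return (mode, argument) for the text following '!quote'."""
    args = args.strip()
    if not args:
        return ("random", None)  # bare !quote

    head, _, rest = args.partition(" ")
    head, rest = head.lower(), rest.strip()
    if head in SUBCOMMANDS and rest:
        if head == "id":
            if not re.fullmatch(r"[0-9]+", rest):
                raise ValueError("quote id must be a number")
            return ("id", int(rest))
        return (head, rest)

    # Aliases: a bare number is an id, anything else is a name search.
    if re.fullmatch(r"[0-9]+", args):
        return ("id", int(args))
    return ("name", args)


print(parse_quote_command("42"))                  # ('id', 42)
print(parse_quote_command("Paul"))                # ('name', 'Paul')
print(parse_quote_command("context desert bus"))  # ('context', 'desert bus')
```

In this sketch a bare number is always read as an id, so a name consisting only of digits would need the explicit !quote name form.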

This is similar to what Uno seems to be suggesting and could be updated to use Andreas’ syntax at a later stage.

RebelliousUno commented 7 years ago

The two aliases for quote id and quote name could add extra logic that might be a bit of a hassle, in case a name ends up being confused with an id. Personally I'd drop one of the aliases (likely the quote name alias).