Open ryanjgallagher opened 3 years ago
One thing to keep in mind: a tweet can end up in multiple quote levels. For example, say a search returns a quote of a tweet where both the original tweet and the quote match the given query. Then the quote tweet is quote_level 0 because it's in the original search. However, when getting quotes (quote_level 1), that quote tweet will be returned again because it quotes the original tweet (which was itself a query match). Further, because the quote tweet is also quote_level 0, any tweets that quote it will be returned at quote_level 1 as well. This leads to an inefficiency: the quote tweet becomes quote_level 1 along with all of the tweets that quoted it, even though those tweets should be one level up (quote_level 2). If this isn't handled, it can defeat the purpose of adding a quote_level column for efficiency.
At the time of doing the search, you can't efficiently check if a quoted tweet has already been seen; similarly, during a stream you don't know if the quoted or the quoting tweet is the match. So before you start a quote search (even just a regular one), you should first update the quote levels of any quote tweets whose quoted tweet also came directly from the search or stream.
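The relabeling step described above could be sketched roughly as follows. This is only an illustration, not the project's code: the function name `normalize_quote_levels` and the `quoted_by_id` mapping (tweet ID to the ID it quotes, or `None`) are hypothetical stand-ins for whatever the tweets database actually stores.

```python
from typing import Dict, Optional


def normalize_quote_levels(quoted_by_id: Dict[str, Optional[str]]) -> Dict[str, int]:
    """Assign a quote level to every tweet collected directly from a search or stream.

    quoted_by_id (hypothetical structure) maps each collected tweet ID to the ID
    of the tweet it quotes, or None if it is not a quote tweet.
    """
    levels: Dict[str, int] = {}
    for tweet_id in quoted_by_id:
        depth = 0
        seen = {tweet_id}  # guards against circular quote chains
        parent = quoted_by_id[tweet_id]
        # Walk up the quote chain for as long as the quoted tweet was also collected.
        while parent in quoted_by_id and parent not in seen:
            depth += 1
            seen.add(parent)
            parent = quoted_by_id[parent]
        levels[tweet_id] = depth
    return levels
```

With this, a matching tweet that quotes another matching tweet is moved to quote_level 1 before any quote search starts, so tweets that quote it are later collected at quote_level 2 rather than 1.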
Similar issues may come up when people reply and quote in circles, so think about whether that could lead to circular or infinite queries.
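One common way to guard against circular quote chains is to track which tweets have already been assigned a level and skip them on later passes. The sketch below assumes a hypothetical `fetch_quotes` callable that returns the IDs of tweets quoting a given tweet; it stands in for the real API call and is not part of the existing code.

```python
from typing import Callable, Dict, Iterable, List


def collect_quotes(
    seed_ids: Iterable[str],
    fetch_quotes: Callable[[str], Iterable[str]],  # hypothetical API wrapper
    up_to_quote_level: int = 1,
) -> Dict[str, int]:
    """Breadth-first walk over quotes that records each tweet's level exactly once."""
    levels: Dict[str, int] = {tweet_id: 0 for tweet_id in seed_ids}
    frontier: List[str] = list(levels)
    for level in range(1, up_to_quote_level + 1):
        next_frontier: List[str] = []
        for tweet_id in frontier:
            for quote_id in fetch_quotes(tweet_id):
                if quote_id in levels:
                    continue  # already seen: never re-query or relabel it
                levels[quote_id] = level
                next_frontier.append(quote_id)
        if not next_frontier:
            break  # nothing new was found, so deeper levels are empty
        frontier = next_frontier
    return levels
```

Because every tweet is queried at most once, a cycle in the quote graph ends the walk instead of looping forever.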
Can get rid of get_quotes_of_quotes and just let the user specify the quote_level. Default to quote_level 1.
Currently, you can get quotes of quote tweets, but there's no efficient way to continue iterating that process because everything gets labeled as from_quote_search. Two changes can be made:

1. Add a quote_level column to the database, so that you can subset quote tweets by which iteration of a quote search they were returned from. For example, tweets retrieved from a search are quote_level 0. Quotes of those tweets are quote_level 1. Quotes of quote_level 1 tweets are quote_level 2. And so on. This involves updating config.py and the insertions into the tweets database in search.py and helper.py.
2. Let the user specify something like up_to_quote_level=6, and the search automatically gets quotes from levels 1 to 6.
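A minimal sketch of how the two changes might fit together, using the standard-library `sqlite3` module. The table layout, the function names, and the `fetch_quotes` callable are assumptions made for illustration; the real schema lives in config.py and the real insertions in search.py and helper.py.

```python
import sqlite3
from typing import Callable, Iterable


def create_tweets_table(conn: sqlite3.Connection) -> None:
    # Hypothetical minimal schema; the real table has more columns.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id TEXT PRIMARY KEY, "
        "quoted_id TEXT, "
        "quote_level INTEGER NOT NULL DEFAULT 0)"
    )


def search_quotes(
    conn: sqlite3.Connection,
    fetch_quotes: Callable[[str], Iterable[str]],  # hypothetical API wrapper
    up_to_quote_level: int = 1,
) -> None:
    """Collect quotes level by level, querying only the previous level each time."""
    for level in range(1, up_to_quote_level + 1):
        rows = conn.execute(
            "SELECT id FROM tweets WHERE quote_level = ?", (level - 1,)
        ).fetchall()
        for (tweet_id,) in rows:
            for quote_id in fetch_quotes(tweet_id):
                # OR IGNORE keeps the first level recorded for a tweet already stored.
                conn.execute(
                    "INSERT OR IGNORE INTO tweets (id, quoted_id, quote_level) "
                    "VALUES (?, ?, ?)",
                    (quote_id, tweet_id, level),
                )
    conn.commit()
```

Subsetting on quote_level means each pass only asks for quotes of tweets found in the previous pass, rather than re-querying everything labeled from_quote_search.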