Closed lemon24 closed 3 years ago
There are multiple changes that can be done, with different levels of impact to current users. This is kinda covered in the first comment, but it's worth discussing them individually.
make_reader(..., search_enabled: Optional[bool] = None)
, where "search_enabled: Whether to enable or disable search. None means do nothing."
reader[search]
.update_feeds() calls update_search() automatically.
Always install the reader[search]
extra.
Regardless of the default, make_reader(..., search_enabled=True)
must fail fast for unsupported versions of SQLite (<3.18). It should also fail if the dependencies were not installed – if I'm requesting search explicitly, I don't want update_feed() to fail.
Open questions:
make_reader(..., search_enabled=True)
, or whenever is_search_enabled() is True
?search='none:'
make_reader() argument. Presumably, it could help make a reader that isn't disabling the search, but isn't doing automatic update_search() calls either.
make_reader(..., search_enabled=True)
, search_enabled=None
achieves the same thing.update_feed...()
calls?
make_reader(..., search_enabled=True)
was used.I'm getting a feeling I may be solving the wrong problem.
Initially, we didn't know when updating search should happen, and the search API wasn't (isn't) stable, so we just exposed everything. It seems we've now converged to "updating search happens offline" (like updating feeds). Also, the pluggable search design (#168) came after we added enable/disable_search() (#122), but because I didn't actually implement it, I didn't notice the overlap.
So maybe we shouldn't burden the user with details about that – most of the time, people just care about search_entries(). If we expose the Search object directly, we don't need to expose enable/disable_search(), is_search_enabled(), and update_search().
Here's a proposal for a new search API:
def make_reader(url, search=None):
# TBD if it's none:/default: or :none:/:default:
if search is None or search == 'none:'
search_obj = None
elif search == 'default:':
search_obj = Search(...)
else:
raise ValueError("bad search")
if search_obj:
search_obj.check_dependencies()
return Reader(..., search_obj)
# Search remains as-is
class Reader:
search: Search
def _update_feeds(self, feed, updates_enabled, search):
results = ...
for result in results:
# also runs plugin hooks
rv = self._update_feed(feed)
# not ideal, since we do many small updates instead of a big one;
# but if we move it after the loop, search needs to be able to do the same kind of filtering
if search and self.search:
try:
self.search.update(feed)
except SearchNotEnabledError:
self.search.enable()
self.search.update(feed)
yield rv
def update_feeds_iter(self, feed=None, updates_enabled=True, search=True):
yield from self._update_feeds(feed, updates_enabled, search=search)
# maybe update search for disabled feeds too here,
# if feed is None and updates_enabled was not provided explicitly;
# also not ideal, since we get a big pause at the end of the iterator
def update_feeds(self, ...):
for _ in self.update_feeds_iter(...): pass
def update_feed(self, feed, search=True):
return zero_or_one(self._update_feeds(feed=feed, search=search)
def search_entries(self, ...):
if not self.search:
raise NoSearchAvailableError
try:
yield from self.search.search_entries(...)
except SearchNotEnabledError:
# even if we enabled search here,
# we'd still get no result until an update()
pass
# removed: enable/disable_search(), is_search_enabled(), update_search()
# normal usage
reader = make_reader("db.sqlite", search="default:")
reader.update_feeds()
reader.search_entries(...)
# no search
reader = make_reader("db.sqlite", search="none:")
reader.update_feeds()
reader.search_entries(...) # raises NoSearchAvailableError
# enable/update/disable search explicitly
reader = make_reader("db.sqlite", search="default:")
reader.search.enable()
reader.search.update()
reader.search.is_enabled()
reader.search.disable()
Update: I don't like this. On the surface it looks OK, but it makes things harder to implement correctly, OR still requires explaining why things behave different than expected. Better to have the primitives exposed, and let users mix and match.
Part of my unease with the search API has to do with its statefulness – some methods may or may not work depending on whether search is enabled, or if there is a search in the first place.
However, besides being connected to a specific search backend, part of this statefulness comes from outside a specific Reader object: e.g. one Reader expects search to be enabled, but another one disables it after the first was instantiated.
At least part of it is essential complexity (us doing migrations automatically in the Storage constructor is trying to hide that, but long-term it makes sense to have that as a separate, explicit operation, like we're doing with search). I don't know what the "correct" API to present this is, but it seems pretty hard/premature to get it right now, especially without other search/storage implementations. We should only solve the problem at hand.
Avoid having to explain search to users.
No one complained of this; it seems like an imagined problem.
Reduce boilerplate by doing enable_search(), update_search() automatically.
Same as above. Also, the granularity is useful; we can't get rid of it completely.
update_search(), search_entries() may raise SeachNotEnabledError.
Can't be solved entirely, but enabling it in make_reader() can help – at least it makes the issue more visible.
update_search() may fail late due to missing dependencies.
Enabling search in make_reader() and checking for dependencies there seems like an acceptable solution.
Always install the reader[search]
extra.
No. The granularity is good.
def make_reader(..., search_enabled=None)
Maybe.
Seems like a good idea, at least until we think of a better way of saying "give me a Reader with working search".
With search_enabled true, we can check for missing dependencies and fail early.
def make_reader(..., search_enabled=True)
(default true)No.
Requires having reader[search]
.
Also, for other search implementations, this may not be desirable (e.g. too slow).
We may consider enabling search by default if an overwhelming majority of users use search (I have no data for this).
FWIW, for SQLite, enable_search() takes ~50ms for 12k entries, which seems acceptable (if search is already enabled, it's much less). I don't know how other methods are impacted by search being enabled.
update_feeds() calls update_search() automatically.
No. Hard to implement right, breaks expectations.
The table below shows timings on how search affects other methods (for my database with ~160 feeds and 13K entries).
Some notes:
entries_search_sync_state
. These must be cleaned up by calling _delete_from_search() regularly (likely in update_feeds()). cold | hot | |
---|---|---|
make_reader(), migration | 24 ms | |
make_reader(), no migration | 1.4 ms | |
enable_search(), not enabled | 34 ms | |
enable_search(), enabled | 0.2 ms | |
is_search_enabled() | 0.2 ms | 0.2 ms |
set_feed_user_title(), not enabled | 60 ms | 16 ms |
set_feed_user_title(), enabled | 140 ms | 16 ms |
set_feed_user_title(), enabled, updated | 150 ms | 16 ms |
delete_feed(), not enabled | 1.2 s | |
delete_feed(), enabled | 1.7 s | |
delete_feed(), enabled, updated | 1.7 s | |
_delete_from_search(), enabled | 1.7 ms | |
_delete_from_search(), enabled, updated | 2.2 ms | |
_delete_from_search(), enabled, feeds deleted | 163 ms | |
_delete_from_search(), enabled, updated, feeds deleted | 4.7 s |
Always install the reader[search]
extra.
Maybe. Currently, this adds only beautifulsoup4
and soupsieve
, which seems acceptable.
def make_reader(..., search_enabled=None)
Yes.
Seems like a good idea, at least until we think of a better way of saying "give me a Reader with working search".
With search_enabled true, we can check for missing dependencies and fail early.
def make_reader(..., search_enabled=True)
(default true)Maybe.
Requires having reader[search]
.
While for other search implementations this may not be desirable (e.g. too slow), we don't have any planned at the moment. It's better to be friendly to the users now.
This affects the other methods in acceptable ways speed-wise. If the user never calls update_search(), disk usage is only ~2% higher.
_delete_from_search() needs to be called regularly.
def make_reader(..., search_enabled=auto)
(enable search on first use)Maybe.
Same as above, but enable search on the first update_search() call.
Zero-cost until enabled. Doesn't require _delete_from_search() to be called regularly (the user obviously knows about update_search()).
TODO: We may need a better argument name.
update_feeds() calls update_search() automatically.
No. Hard to implement right, breaks expectations.
To do:
Spent 14h on this, 8 of them thinking; that's arguably too much...
Enable search by default, because there's less explaining to do for new users: no explicit
pip install reader[search]
, enable_search(), or update_search() needed.Thoughts:
We needmake_reader(..., search='default:')
(default) andmake_reader(..., search='none:')
, per https://github.com/lemon24/reader/issues/168#issuecomment-642002049.reader[search]
dependencies required (keep the extra for backwards compatibility, though).none:
and enabled.update_feed(..., search=False)
; we likely needupdate_search(feed=...)
for this.