Am I interacting with this feed?

lemon24 commented 3 years ago

I'd like to know if I'm interacting with a feed, i.e.:

when is the last time I read an entry?
am I still reading new issues as they are published?
(maybe) how did that change over time?

Open issues:

How do we answer the questions above?
There are two kinds of "read": "read", and "not read, but I don't care". How do we distinguish between the two? Does having both change the result?
How do we handle entries from before we started recording read timestamps?
Duplicates should be excluded. #140 will take care of this, but entry_dedupe needs to be able to copy/change any timestamps we store.

lemon24 commented 3 years ago

Rough order of things, based on what blocks what:

[x] decide how to store flag change timestamps
[x] find a way to model "not read, but I don't care" (we could skip this, but it seems like the right time to think about it)
[x] store timestamps; we can expose them now, or after we collect data
[ ] collect data (1-3 months)
[x] implement #140 (we do not want to think about duplicates in the logic for this issue)
[ ] having data, try and answer the questions in the web app
[ ] fit answers to the questions into the existing API

lemon24 commented 3 years ago

I don't know what other combinations mean, but this is enough to show "not read, but I don't care" == read and not important and important_changed is not None.

Using a new flag or an entry tag makes things more complicated, and there's no other meaning to associate with "marked as not important by hand, not by default" anyway (I think).

lemon24 commented 3 years ago

Regarding how we store timestamps:

We could do it with entry metadata (#253), but that's not implemented yet, and using it in queries would be a pain. It makes more sense to do the simplest thing possible now, and reconsider re-unification later.

The "simplest possible thing" seems to be to have another Entry attribute (and table column), so we'll do that.

lemon24 commented 3 years ago

The minimal API:

# Reader
mark_as_read(...)  # use the real "now", backwards compatible
mark_as_read(..., now: Optional[datetime])  # use a custom "now"; mainly for plug-ins
# TODO: find a better argument name than now... timestamp? last_modified?

class Entry:
    ...
    read_last_modified: Optional[datetime]

# Storage
mark_as_read_unread(..., now: Optional[datetime])

# idem for important

Notes:

read_last_modified is Optional because new entries or entries that predate this won't have a last modified.
- Because of this, entry_dedupe needs to be able to set it to None; maybe users do too.
It would be nice if read is True implied read_last_modified is not None. However, I don't know what to use for entries that predate this (something between entry first_updated_epoch and now); it's easier (and likely more correct) to leave it as None, and coalesce() it at query time on a case-by-case basis.
We could delay user-facing changes (the Entry attributes, and mark_as_read(..., now=...)), but we need to test timestamps are actually set anyway, and it's best to do so through the same API the user will use.
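
A minimal sketch of the notes above, not the actual implementation. The Entry class here is a simplified stand-in, and effective_read_modified() and its fallback parameter are hypothetical names used only to illustrate coalescing a missing timestamp at query time:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Entry:
    # Simplified stand-in for illustration; not reader's actual Entry class.
    read: bool = False
    read_last_modified: Optional[datetime] = None


def mark_as_read(entry: Entry, now: Optional[datetime] = None) -> None:
    """Set the flag and record when it changed; `now` defaults to the real now."""
    if now is None:
        now = datetime.now(timezone.utc)
    entry.read = True
    entry.read_last_modified = now


def effective_read_modified(entry: Entry, fallback: datetime) -> datetime:
    """Coalesce a missing timestamp when querying, instead of storing a guess."""
    if entry.read_last_modified is not None:
        return entry.read_last_modified
    return fallback
```

Leaving the stored value as None keeps "we don't know" distinguishable from a real timestamp; each caller can pick the fallback that suits its question.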

lemon24 commented 3 years ago

To do (for storing stuff):

[x] migration
[x] Entry attribute (will break some tests)
[x] set attribute in mark_as_...
[x] expose modified argument in mark_as_...
- maybe YAGNI, plugins can use Storage directly
- postpone at least until #256 (?)
- also, exposing means we have to think about the kind of datetime we receive (naive/aware), and I don't want to do that now
[x] read attribute on get_entries() (will break some tests)
[x] round-trip tests
[x] make entry_dedupe copy modified
[x] facility for "not read, but I don't care" (either in the API or in the web app only, TBD)
[x] changelog
[x] docstrings
[x] user guide

lemon24 commented 3 years ago

We're mostly done with the "store modified" part.

The UX for "don't care" needs a bit of work, though... In 4278939 (and 59004b3) I had to make the "unimportant"/"unread" buttons set important_modified=None, because otherwise making a read, important entry not important makes it "don't care", which is not what we want (not all the time, anyway – sometimes you just want to undo a "mark as important").

... this tri-state of important (true, false and modified, false and not modified) is a bit confusing (or at least, the way we use it to infer another flag is) ... I should likely make a diagram to make more sense of it.
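
A small sketch of why the button has to clear the timestamp, assuming "don't care" is inferred as read and not important and important_modified is not None. The Flags class and the clear_modified switch are hypothetical and exist only to contrast the two behaviors:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Flags:
    # Hypothetical container for illustration only.
    read: bool = False
    important: bool = False
    important_modified: Optional[datetime] = None


def is_dont_care(f: Flags) -> bool:
    return f.read and not f.important and f.important_modified is not None


def mark_as_unimportant(f: Flags, now: datetime, clear_modified: bool = True) -> None:
    f.important = False
    # Clearing the timestamp makes this an "undo", not an explicit "don't care".
    f.important_modified = None if clear_modified else now
```

With clear_modified=True, undoing "mark as important" on a read entry leaves it plain read; with False, the same click turns it into "don't care".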

lemon24 commented 3 years ago

Spent about 12 hours on this until now.

lemon24 commented 1 year ago

We have conflicting interests, so it's likely time to address https://github.com/lemon24/reader/issues/254#issuecomment-938146589:

... this tri-state of important (true, false and modified, false and not modified) is a bit confusing (or at least, the way we use it to infer another flag is) ... I should likely make a diagram to make more sense of it.

Currently, "don't care" == read and not important and important_modified is not None.

Here are all the possible combinations and their meanings:

read	read_modified	important	important_modified	#	interpretation
unread	none	unimportant	none	1	initial state
unread	date	unimportant	none	1	read then unread
unread	*	unimportant	date	2	important then unimportant
read	date	unimportant	date	1	don't care
read	none	unimportant	date	1	don't care (unreachable from the web app)
read	date	unimportant	none	1	read
read	none	unimportant	none	1	read (pre-#254)
*	*	important	*	8	important
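
The table can be read as a small decision function. This is only a sketch of the interpretation as tabulated here; interpret() is a hypothetical name, not part of Reader or the web app:

```python
from datetime import datetime
from typing import Optional


def interpret(
    read: bool,
    read_modified: Optional[datetime],
    important: bool,
    important_modified: Optional[datetime],
) -> str:
    """Map one flag combination to the interpretation in the table above."""
    if important:
        return "important"
    if important_modified is not None:
        # Unimportant, but explicitly set at some point.
        return "don't care" if read else "important then unimportant"
    if read:
        return "read" if read_modified is not None else "read (pre-#254)"
    return "read then unread" if read_modified is not None else "initial state"
```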

Importantly, the interpretation is done outside of Reader, in the web app; because of this, there are two places where the web app needs to do extra handling:

"mark as unread" for a "don't care" entry has to set important_modified to None; otherwise, "don't care" -> "mark as unread" -> "mark as read" would (likely unintentionally and unexpectedly) result in the entry being "don't care" again (note the user can make these transitions months apart)
"mark as unimportant" has to set important_modified to None too; otherwise, "mark as unimportant" on a read entry would set it as "don't care" (but then, how would the user undo marking an entry as important by accident?)

The web app allows the following transitions (source):

The conflicting interests are as follows:

"don't care" is expressed as read and unimportant with important_modified set.
The mark_as_read plugin should mark entries as "don't care" (it does so since 2.4).
For #294, we need to tell apart if it was a plugin or the user that set a specific status. Specifically, mark_as_read should not count as an interaction.
Pre-#254 (*_modified) entries have *_modified set to None.

On top of this, if the "don't care" logic implemented in the web app is useful, it should end up in Reader (we can probably repurpose the mark_entry_as_... methods for this, as part of #291). But, if it becomes stable, it should be easy to explain.

lemon24 commented 1 year ago

As described above, the current state makes it impossible to satisfy all the interests.

Adding a third flag for "don't care" would complicate things even further. Even worse, its meaning would overlap with unimportant with important_modified set. But...

We can translate the interests in terms of actual requirements:

User "don't care" is expressed unambiguously.
Plugin "don't care" is expressed unambiguously (i.e. is different from user "don't care").
(for #294) Only user-derived flags should have *_modified set.
~Pre-#254 entries have *_modified set to None.~ This is not a strong requirement.

Proposal 1: unimportant with modified == don't care

We can express user "don't care" as just unimportant with important_modified set. This is simple to explain ("user explicitly set unimportant"), and is not coupled to read in any way.

important	important_modified	interpretation
unimportant	none	never set
unimportant	date	don't care
important	*	important

For plugins, we can free up read-none by backfilling pre-#254 read to entry.added; for consistency, we should do the same for important.

This can also be explained more easily ("plugin explicitly set as read"), and is not coupled to important in any way.

(The unimportant-with-modified entries already marked by mark_as_read as user "don't care" are an acceptable loss.)

The full table becomes:

read	read_modified	important	important_modified	#	interpretation
unread	none	unimportant	none	1	initial state
unread	date	unimportant	none	1	read then unread
read	none	unimportant	none	1	plugin don't care
read	date	unimportant	none	1	read
read	none	unimportant	date	1	user don't care and plugin don't care
*	*	unimportant	date	3	user don't care
read	none	important	*	2	important and plugin don't care
*	*	important	*	6	important
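
Under Proposal 1, the two kinds of "don't care" become independent predicates, which is why some rows carry both labels. A sketch, assuming the backfill of pre-#254 timestamps has already happened; the function names are hypothetical:

```python
from datetime import datetime
from typing import Optional


def user_dont_care(important: bool, important_modified: Optional[datetime]) -> bool:
    """The user explicitly set the entry as unimportant."""
    return not important and important_modified is not None


def plugin_dont_care(read: bool, read_modified: Optional[datetime]) -> bool:
    """A plugin marked the entry as read, so no timestamp was recorded."""
    return read and read_modified is None
```

Neither predicate looks at the other flag pair, which matches the "not coupled" claim for each half of the proposal.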

The UI logic becomes (a bit) simpler too, the tri-state involves only important:

if not important:
    important_button()
if important or important_modified:
    unimportant_button()  # modified=none
if not important_modified or important:
    dont_care_button()  # modified=date

This doesn't really clear things up from a Reader perspective, though.

lemon24 commented 1 year ago

Proposal 2: important: bool|None

Change important to be bool|None. Users set modified, plugins don't.

Pros:

easy to explain and understand
models real world accurately (including user/plugin divide)
no modified hacks
backwards-compatible when using Entry.important
- as in, it may not be a bool, but it has the same semantics when used in an if statement

Cons:

need to migrate current don't care entries
backwards-incompatible Entry.important if someone checks the type
what should be the type of get_entries(important=...)? we've essentially encountered https://github.com/lemon24/reader/issues/177#issuecomment-674786498
- hard to make it backwards compatible (?)
- you have easy values for true, false, never set
- how about "all"?
- how about combos (true or false, true or not set, false or not set)?
- how about modified set or not? (arguably, this is not possible now either)
- propagate all the way up to web app, etc.

Backwards-compatible proposal for get_entries(important=...):

value	predicate	notes
True	important	works like before
False	not important	works like before
None	True	"all"; works like before
'unset'	important is None	needs better name?
'explicitly false'	important is False	"don't care"; needs better name!
'set'	important is not None	needs better name?
'not explicitly false'	important is not False	needs better name!

The last two aren't required, but it's still a good idea to find names for them.

Update: Here's a full typed proposal for the get_entries() API – instead of mixing bool|None and string literals, we have a set of literals, and map bools and None to some of the values (note BoolFilter is a subset of OptionalBoolFilter):

BoolFilter = Literal[
    'true',  # value is True (equivalent to filter=True)
    'false',  # value is False (equivalent to filter=False)
    'all',  # equivalent to filter=None
    'default',  # equivalent to filter='all'
]

OptionalBoolFilter = Literal[
    'true',  # value is True (equivalent to filter=True)
    'false',  # value is False
    'unset',  # value is None
    'not true',  # equivalent to filter=False
    'not false',
    'all',  # equivalent to filter=None
    'default',  # equivalent to filter='all'
]

def get_entries(
    read: BoolFilter | bool | None = None,
    important: OptionalBoolFilter | bool | None = None,
) -> None: ...

Of course, we don't have to go with the full thing; initially, we can go with the minimum needed, and add more values later:

OptionalBoolFilter = Literal[
    'true',  # ~= True
    'false',  
    'unset',
    'not true',  # ~= False
    'all',  # ~= None
]

def get_entries(
    read: bool | None = None,
    important: OptionalBoolFilter | bool | None = None,
) -> None: ...

We could also model BoolFilter and OptionalBoolFilter as enums, but:

it wouldn't fit with other APIs where we use literals (e.g. sort)
important would have to be OptionalBoolFilter | BoolFilter | bool | None = None (an enum cannot subclass another enum)
- on one hand, it's more explicit: BoolFilter.FALSE is different from OptionalBoolFilter.FALSE (in behavior)
- this is confusing, maybe they shouldn't be different
- on the other hand, BoolFilter.TRUE is exactly the same as OptionalBoolFilter.TRUE (but we're suggesting they're not)

Update: EntryCounts should also likely be updated to count "important == None".

Update: The UI logic (note "unimportant" becomes "clear important" so it's harder to confuse with "don't care"):

if not important:             # False, None
    important_button()        # -> True
if important is not None:     # True, False
    clear_important_button()  # -> None
if important is not False:    # True, None
    dont_care_button()        # -> False
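
The same logic as a runnable function, to make the three states easy to check. available_buttons() and the button labels are illustrative names, not the web app's actual code:

```python
from typing import List, Optional


def available_buttons(important: Optional[bool]) -> List[str]:
    """Return the buttons the UI logic above would show for one entry."""
    buttons = []
    if not important:            # False, None
        buttons.append("important")        # -> True
    if important is not None:    # True, False
        buttons.append("clear important")  # -> None
    if important is not False:   # True, None
        buttons.append("don't care")       # -> False
    return buttons
```

Each state offers exactly the two buttons that lead to the other two states.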

lemon24 commented 1 year ago

Going with proposal 2.

To do:

[x] schema update (do read as well, just in case)
[x] "don't care" migration
[x] set_entry_important(), mark_entry_as_...
[x] Entry.important
[x] mark_as_read plugin (no migration needed)
[x] entry_dedupe plugin
[x] get_entries(important=...) (also counts and search)
[ ] EntryCounts
[x] docs
- [x] API
- [x] guide
- [x] changelog
[x] web app
- [x] should "don't care" (unimportant) be hidden by default? no way to exclude them with the current search

lemon24 commented 1 year ago

After a lot of deliberation, here's a naming scheme for get_entries(important: bool|None|TristateFilter):

Entry.important	optional bool	enum	str
True	True	IS_TRUE	istrue
False		IS_FALSE	isfalse
None		NOT_SET	notset
False, None	False	NOT_TRUE	nottrue
True, None		NOT_FALSE	notfalse
True, False		IS_SET	isset
True, False, None	None	ANY	any
True, False, None	None	DEFAULT	default
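
One way to read the table is as an enum plus a coercion from the old optional bool argument. This is a sketch of the naming scheme only; ACCEPTED, coerce() and matches() are hypothetical helpers, and the real implementation may differ:

```python
from enum import Enum
from typing import Optional, Union


class TristateFilter(Enum):
    # Values follow the "str" column of the table above.
    IS_TRUE = "istrue"
    IS_FALSE = "isfalse"
    NOT_SET = "notset"
    NOT_TRUE = "nottrue"
    NOT_FALSE = "notfalse"
    IS_SET = "isset"
    ANY = "any"
    DEFAULT = "default"


# Which Entry.important values each filter accepts (first table column).
ACCEPTED = {
    TristateFilter.IS_TRUE: {True},
    TristateFilter.IS_FALSE: {False},
    TristateFilter.NOT_SET: {None},
    TristateFilter.NOT_TRUE: {False, None},
    TristateFilter.NOT_FALSE: {True, None},
    TristateFilter.IS_SET: {True, False},
    TristateFilter.ANY: {True, False, None},
    TristateFilter.DEFAULT: {True, False, None},
}


def coerce(value: Union[TristateFilter, str, bool, None]) -> TristateFilter:
    """Map the old optional bool argument and the string form onto the enum."""
    if isinstance(value, TristateFilter):
        return value
    if value is None:
        return TristateFilter.ANY
    if value is True:
        return TristateFilter.IS_TRUE
    if value is False:
        return TristateFilter.NOT_TRUE  # old False meant "not important"
    return TristateFilter(value)  # raises ValueError for unknown strings


def matches(important: Optional[bool], value) -> bool:
    """Check whether an Entry.important value passes the given filter."""
    return important in ACCEPTED[coerce(value)]
```

Note how matches(None, False) is true while matches(None, 'isfalse') is not, which is the distinction the naming scheme is meant to keep visible.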

Requirements:

Should express all filtering combinations.
Should be reasonably concise.
Should be explicit.
Should be hard to confuse with the old optional bool version.
- Specifically:
- get_entries(important=None) != get_entries(important='notset')
- get_entries(important=False) != get_entries(important='isfalse')
- I may end up deprecating the optional bool argument (but it won't be removed soon).
- Or I might keep it, in a boolean context it still makes a lot of sense.
(ideally) Should be language-agnostic (so, no none or null, which are overloaded already).
(ideally) The string version should unambiguously work in places like YAML.

Notes:

UNSET could be an alias for NOT_SET, but it's more consistent like this.
- "UN" as a prefix is problematic... UNTRUE works, but UNFALSE is weird.
SET could be an alias for IS_SET, but consistency again.
IS_NOT_TRUE is more consistent than NOT_TRUE, but it's too long.
ANY, not ALL:
- "all the entries where entry.important is true"
- "all the entries where entry.important is any" (clunky, but makes sense)
- "all the entries where entry.important is all" (?!)

lemon24 commented 1 year ago

A quick note on what additional stats should look like.

While EntryCounts.averages provides a decent answer for e.g. all the entries in a feed, it does not provide the same answer for all the read/important/etc. entries (not without calling get_entry_counts() multiple times).

Furthermore, there are additional questions averages cannot easily answer, e.g. read but not important, or unimportant but not set by the user (no modified).

For maximum flexibility, we could instead provide a dataframe-like collection containing the values of all interesting fields for each entry (published, updated, added, read, read_modified, ...).
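
A rough sketch of what such a collection could allow, using plain records in place of a real dataframe. EntryRow and count() are hypothetical names, and the fields are a subset of those listed above:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class EntryRow:
    # Hypothetical per-entry record; field names follow the list above.
    added: datetime
    read: bool
    read_modified: Optional[datetime]
    important: Optional[bool]
    important_modified: Optional[datetime]


def count(rows: Iterable[EntryRow], predicate: Callable[[EntryRow], bool]) -> int:
    """Count the rows matching an arbitrary predicate."""
    return sum(1 for row in rows if predicate(row))


def read_not_important(row: EntryRow) -> bool:
    return row.read and not row.important


def unimportant_not_by_user(row: EntryRow) -> bool:
    # "Unimportant but not set by the user" means no modified timestamp.
    return row.important is False and row.important_modified is None
```

With per-entry rows available, questions like the two above become one pass over the data instead of several get_entry_counts() calls.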

lemon24 commented 1 year ago

https://gist.github.com/lemon24/93222ef4bc4a775092b56546a6e6cd0f

Feed scoring algorithm (and how I consume feeds)

This is an attempt to use the metrics added in lemon24/reader#254 "Am I interacting with this feed?" to determine a feed "usefulness" score based on how many entries I mark as read / important / don't care.
