On a URL query, time-bucketing groups the list of annotators unhelpfully

judell commented 8 years ago

The question answered by the list in other contexts, e.g. a tag or user query, is: “Who else annotated this doc.” And the answer comes back as a complete list:

For the uri: query, though, the list may split across time buckets:

@ajpeddakotla this might want a tag like "design"

seanh commented 7 years ago

As currently implemented, the list of users in a document bucket is actually all the users who annotated this document in this time bucket. Whether on a URI search or any other search, this is not necessarily a list of everyone who annotated this document. Regardless of the search query, the same document may also appear in a different time bucket (either further down on the same page, or on another page of the paginated results), and one or more of those other time buckets may contain users who aren't listed in the time bucket you're currently looking at.

Also keep in mind that the annotations shown are only those annotations that matched the search query. So even in the case when all annotations of a document occurred in the same time bucket (so you don't have the problem of the document appearing in multiple time buckets), the list of users in the document bucket is not everyone who annotated that document. Rather, it's everyone who has an annotation of that document that matches the current search query.

(This is also true of the list of tags, and the count of annotations)

My opinion: there is a problem with bucketing annotations by document and timeframe that is particularly acute on URI queries since the query results, by definition, only contain one document under which all of the matching annotations will always be collapsed, so most document search results will look like this:

screenshot from 2016-12-06 15-19-02

Collapsing all the annotations under a single document bucket doesn't seem to achieve anything except making the user do a pointless click to see the search results.

And if there is more than one page's worth of annotations, then the next page will look exactly the same, and the next one.

A simple solution might be: just expand document buckets by default on URI searches.

If the annotations of the document occurred in more than one timeframe then you can end up with multiple timeframes on the page, but each timeframe must by definition contain a single document bucket always for the same document:

screenshot from 2016-12-06 15-57-52

In this case, when there are multiple timeframes, it is arguably still useful to have the documents collapsed on page load, because the collapsed view gives you a quick overview of the times in which this document was annotated. Although I see Dan arguing that a single document shouldn't appear in multiple timeframes.

dwhly commented 7 years ago

There are two primary questions:

1. Should documents show up multiple times, in different buckets?
2. On single URL queries, should we expand by default?

For the first, I'm firmly in the "no" camp. The buckets were only ever a convenient guide to break what would otherwise be a monolithic list into some roughly helpful chunks to help people get a sense of when they were last annotated (since we don't print the timestamp anywhere that is visible)-- not to be a strict grouping of all documents that were annotated between this time and that time, and repeating them when they fell into multiple zones. (At some point, we probably should implement a search parameter and UX which allows folks to query between time X and time Y).

The current implementation needs work-- particularly for n < 7 days. Arguably we should skip the buckets altogether and just find a way to indicate the time / date of the 'most recent annotation' on the document line itself. But whatever we do would benefit from a smarter algorithm that says "On this page of results, based on the range of times present, bucket the documents in a helpful way". If the annotations were all made in the last 3 days then it might be "Today", "Yesterday", and "Two days ago". If the annotations were all made today it might be "In the last hour", "Two hours ago", "Six hours ago", etc. Or we could just skip buckets and print the timestamp of the last annotation next to the document title and sort them in that order.

Case in point, if you go to hypothes.is/search now it says Last 7 Days, when in fact all the documents there were annotated in the last 5 minutes.
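As a rough illustration of the "smarter algorithm" idea, here is a minimal Python sketch that picks the label granularity from the range of times present on the page. Everything in it is an assumption made for illustration: the function names `label_page` and `bucket_label`, the one-day threshold, and the exact label wording are hypothetical and not taken from the Hypothesis codebase.

```python
from datetime import datetime, timedelta


def bucket_label(annotated_at, now, hourly):
    """Return a plain-language label for one timestamp (hypothetical wording)."""
    if hourly:
        hours = int((now - annotated_at).total_seconds() // 3600)
        if hours == 0:
            return "In the last hour"
        return f"{hours} hour{'s' if hours != 1 else ''} ago"
    days = (now.date() - annotated_at.date()).days
    if days == 0:
        return "Today"
    if days == 1:
        return "Yesterday"
    return f"{days} days ago"


def label_page(timestamps, now):
    """Label every timestamp on a page, choosing granularity from the page's range."""
    if not timestamps:
        return []
    # Assumed rule: if everything on the page is at most a day old, use hours;
    # otherwise use days.
    hourly = (now - min(timestamps)) <= timedelta(days=1)
    return [bucket_label(ts, now, hourly) for ts in timestamps]


now = datetime(2016, 12, 6, 17, 0)
print(label_page([datetime(2016, 12, 6, 16, 30), datetime(2016, 12, 6, 10, 55)], now))
print(label_page([datetime(2016, 12, 6, 9, 0), datetime(2016, 12, 4, 8, 0)], now))
```

A page whose results all fall within the last few minutes would then be labelled "In the last hour" rather than "Last 7 Days".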

For the second, @seanh is exactly right. Having a single document collapsed isn't helpful. There is a control that @conordelahunty identified called "Expand All" which hasn't yet been implemented. For single document result sets, we should just invoke that parameter by default and expand.

jeremydean commented 7 years ago

Just a quick note that I too am firmly in the "no" camp on question one here.

seanh commented 7 years ago

@dwhly @jeremydean So let's say we change it so that each document only appears in one timeframe, and is never repeated in different timeframes. Then questions:

1. Should the annotations count, list of annotators, and list of tags in the top-right of document bucket be all the annotations, annotators and tags of that document? Or should it be only the ones that matched the search query? For example if you search for "foobar", it will only list the users who have an annotation of the document and that annotation contains the word "foobar".
2. What about the list of annotation cards themselves in the document bucket? Should it contain all annotations of that document? Or just those that matched the search query?

jeremydean commented 7 years ago

My vote in both cases would be "all." I think Dan said it better above but basically the time buckets are just a way to help know at a glance what docs were most recently annotated. It's really all about the docs themselves which are objects that exist across time.

seanh commented 7 years ago

> My vote in both cases would be "all." I think Dan said it better above but basically the time buckets are just a way to help know at a glance what docs were most recently annotated. It's really all about the docs themselves which are objects that exist across time.

@jeremydean In that case, is this a search page for searching for annotations? Or is it a page for searching for documents that were annotated (and then being able to see all the annotations of each document)?

If it only shows the documents that matched your search, but it shows all the annotations of those documents (whether the annotation itself matched the search or not), then it sounds to me more like a document search feature than an annotation search feature. You type in a search query, and it shows you the documents that matched your search.

seanh commented 7 years ago

@dwhly So you're saying that we should, potentially, remove the timeframe buckets and just show the search results as a list of documents? With the date of the most recent annotation somewhere near the top of each document. And when there is only one document on the page, then just auto-expand it and show the annotations on page load.

I like the idea of simplifying the display rather than making it more complicated by potentially adding more / cleverer time buckets (we could also go down that path but there are hidden intricacies)

dwhly commented 7 years ago

@seanh writes:

> Should the annotations count, list of annotators, and list of tags in the top-right of document bucket be all the annotations, annotators and tags of that document?

If there is a search query active which is limiting the annotations shown in a document view to the query terms, the tags and users surrounding that in document view should only relate to the query terms. This is actually the behavior of the outer tags module for instance. Narrow the user profile view by clicking a tag and you only get the tags shown which remain at the intersection of those terms.

To me, that goes doubly so inside the document view.

jeremydean commented 7 years ago

hmmm, i'm not certain it's an either/or situation. what are the stakes of your question and how does it affect the work to be done?

we know that an activity page of just annotations is not as helpful as it could be. that's what we have now and why we're doing something else. i have always said that annotations should essentially be collapsed by document so that what i see is a list of documents with annotations expandable.

IMO once i'm in the world of a particular doc, the chronological interest changes: where i might have at first been interested in a recently annotated doc, there's no reason for me to assume that all annotations on that doc are from that same moment in time. hence we have time stamps to further indicate when exactly the doc was annotated. the major takeaway IMO is that i've gone to a doc that i know has been recently annotated.

dwhly commented 7 years ago

> we should, potentially, remove the timeframe buckets and just show the search results as a list of documents?

Potentially, yes. I'd love to hear others' perspectives-- but it's obviously tricky to get the bucketing to be actually helpful, and to sensibly fashion "plain language" labels for them. It would be easier, and perhaps better (?) to just print the actual time stamp on each document. A quick scan of the documents would then tell you with much more fidelity than the buckets exactly when they were made. We have the room to do so-- on mobile we'll have to look at the layout to find the right place.

> With the date of the most recent annotation somewhere near the top of each document.

I was thinking-- for collapsed documents-- on the same line, after the title, before the annotation count, right justified. Use the same intelligent abbreviation that we do on annotation cards (which still needs to be implemented on the cards in activity pages).

> And when there is only one document on the page, then just auto-expand it and show the annotations on page load.

Yep.

seanh commented 7 years ago

@jeremydean I'm just trying to figure out whether we want to:

1. Find all the annotations that matched the search and then group those annotations by document. Under each document show only the annotations (and tags, annotators, etc) of that document and that matched the search. Or
2. Find all the documents that matched the search, and all the annotations of all those documents, and show all of them.

The current implementation is closer to 1, except that it doesn't even show all of the matching annotations of a document under that document, because of the timeframe bucketing they may be divided between multiple appearances of the same document in the search results.

I think 2 might be much more work to get to, from what we have now.

jeremydean commented 7 years ago

to me the time bucketing corrupts the logic of 1. it falsely cuts out the fact that other users have annotated a doc that might come up in search.

for example, in just searching for a recent activity, i expand annotations on a particular doc. the right sidebar indicates 1 annotator and 18 annotations.

in fact that doc has multiple annotators and many more annotations.

dwhly commented 7 years ago

I think we're on the same page. A proposal:

1- Kill the buckets
2- Print time stamps on each document
3- For single document results, expand to see all annotations
4- The users and tags shown around documents should be for the results set, not for all annotations on the document outside of the current query terms.

To expand on 4:

Limiting tags and users to the current results set specifically answers the question "who are the users that annotated this document for the query specified, and which tags have they used".

If we don't limit users/tags in this way, you can't easily know the answer to this question, particularly when the list of annotations matching the terms is reasonably long. By contrast, by removing query terms, you can easily know the larger question.

seanh commented 7 years ago

> 3- For single document results, expand to see all annotations

What should be the cut-off point after which we stop auto-expanding the document bucket(s) on page load? Two or more documents = show them collapsed initially?

So we would not show a search results page like this:

screenshot from 2016-12-06 16-59-44

But we would allow one that looks like this:

screenshot from 2016-12-06 17-03-02

Also even if there are many documents on the page, I'm not sure it ever makes any sense to collapse a document that only contains 1 annotation since you're hardly saving any vertical space by doing so. Again, seems like requiring pointless clicks from the user. In the most extreme case of a page showing many documents each with one annotation, the user has to do one click to reveal each one annotation they want to see (up to a potential maximum of 200 clicks per page of search results). That is extreme, but the case of a page showing many documents most of which have only a few annotations under them is probably common.

It might be better to always show the first X annotations of each document and then, if there are more than X annotations of that document, have the rest collapsed under a "Show Y more annotations" (of this document) button.
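A minimal sketch of that "first X, then a button" idea, in Python. The function name `split_for_display`, the default of 3 visible annotations, and the exact button wording are all hypothetical choices made for illustration.

```python
def split_for_display(annotations, visible=3):
    """Split a document's annotations into those shown up front and a button label.

    Returns (shown, label); label is None when nothing is hidden.
    """
    shown = annotations[:visible]
    hidden = len(annotations) - len(shown)
    if hidden == 0:
        return shown, None
    plural = "s" if hidden != 1 else ""
    return shown, f"Show {hidden} more annotation{plural}"


shown, label = split_for_display(["a1", "a2", "a3", "a4", "a5"], visible=3)
print(shown, label)
```

With this rule a document that has a single annotation never needs a click, while a document with many annotations still takes up a bounded amount of vertical space.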

seanh commented 7 years ago

> 1- Kill the buckets

And I assume that you don't want the same document to appear on multiple pages of the search results, either? That is, given a search query with 10 pages of results, I wouldn't want to see document Foo on page 1 only to page through and find document Foo again on page 5 (with older annotations of the document)

seanh commented 7 years ago

> To expand on 4:
>
> Limiting tags and users to the current results set specifically answers the question "who are the users that annotated this document for the query specified, and which tags have they used".
>
> If we don't limit users/tags in this way, you can't easily know the answer to this question, particularly when the list of annotations matching the terms is reasonably long. By contrast, by removing query terms, you can easily know the larger question.

Actually, I'm not sure that you can know the larger question ("Who are all the users who annotated this document? And what are all the tags that they used?"). How do you do a search for all annotations of a given document? When a document appears on a user's page, it's showing only that user's annotations of that document. On a group's page, only the group's. On the /search page it would be showing all annotations, but you can only conveniently find the globally most recently annotated documents that way. I think the only other way to answer that question currently is to manually enter a url:... query for the document's URL? This could be fixed by, when showing only the matching things under a document as Dan wants, adding some sort of "See all annotations of this document" link that links to the url:... query for the document (a.k.a the document's page). AFAIK we don't currently have any convenient links to that.

dwhly commented 7 years ago

> What should be the cut-off point after which we stop auto-expanding the document bucket(s) on page load? Two or more documents = show them collapsed initially?

To keep it simple for now, I'd argue yes. We can tweak later.

> Also even if there are many documents on the page, I'm not sure it ever makes any sense to collapse a document that only contains 1 annotation since you're hardly saving any vertical space by doing so.

I think that there is a strong benefit for the collapsed document view in keeping the elements consistent so that the eye can scan them easily. So, don't selectively expand singles.

> In the most extreme case of a page showing many documents each with one annotation, the user has to do one click to reveal each one annotation they want to see (up to a potential maximum of 200 clicks per page of search results).

Remember we have a handy "expand all" button coming in the interface post-launch. :)

> It might be better to always show the first X annotations of each document and then, if there are more than X annotations of that document, have the rest collapsed under a "Show Y more annotations" (of this document) button.

The benefit of the document view is density. "Let me first know the documents that have been annotated." I think by selectively expanding singles or the first n annotations for each, we lose that advantage-- as well as the advantage of the ease of scanning something that has a consistent look.

dwhly commented 7 years ago

> This could be fixed by, when showing only the matching things under a document as Dan wants, adding some sort of "See all annotations of this document" link that links to the url:... query for the document (a.k.a the document's page). AFAIK we don't currently have any convenient links to that.

+1

seanh commented 7 years ago

> 1- Kill the buckets
>
> And I assume that you don't want the same document to appear on multiple pages of the search results, either? That is, given a search query with 10 pages of results, I wouldn't want to see document Foo on page 1 only to page through and find document Foo again on page 5 (with older annotations of the document)

The reason I asked this is that, assuming the answer is no we don't want to see the same document appear twice on different pages of a paginated search result list, then it raises the question of how many annotations to show under a single document when it's expanded?

A single document in the list could have any number of annotations that match the search. 3000, say. We can't show 3000 annotations when you expand the document.

But I think we could just show the first N matching annotations of that document and then a "See all matching annotations of this document" link that links to a page with the same search query but with a url:... facet for the document added in. That would be a (paginated if necessary) list of all annotations of that one document that match the search. We would need to do this, I think.

(Potential confusion between the two different links, "See all annotations of this document" and "See all matching annotations of this document".)
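To make the difference between the two links concrete, here is a small Python sketch that builds both. It is only an illustration under assumptions: the base address `https://hypothes.is/search` and the `q` query parameter are assumed for the example, the function names are hypothetical, and the `url:` facet is simply appended to the user's existing query text.

```python
from urllib.parse import urlencode

SEARCH_PAGE = "https://hypothes.is/search"  # assumed base address for the sketch


def all_annotations_link(document_url):
    """Link for "See all annotations of this document": only the url: facet."""
    return f"{SEARCH_PAGE}?{urlencode({'q': f'url:{document_url}'})}"


def matching_annotations_link(query, document_url):
    """Link for "See all matching annotations of this document":
    the current query plus a url: facet for the document."""
    narrowed = f"{query} url:{document_url}".strip()
    return f"{SEARCH_PAGE}?{urlencode({'q': narrowed})}"


print(all_annotations_link("https://example.com/a"))
print(matching_annotations_link("foobar", "https://example.com/a"))
```

The first link drops the user's search terms and the second keeps them, which is exactly the distinction that the two similar labels would need to convey.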

dwhly commented 7 years ago

> The reason I asked this is that, assuming the answer is no we don't want to see the same document appear twice on different pages of a paginated search result list ...

If we no longer bucket, but only list each document once, in reverse time order of the last annotation made against it then I assume by definition it would only show once, on one results page, and not on subsequent pages.

> ... then it raises the question of how many annotations to show under a single document when it's expanded? A single document in the list could have any number of annotations that match the search. 3000, say. We can't show 3000 annotations when you expand the document.

Yep, and this is a problem we have now. Not a new one. (Like if a document has 3000 annotations, all made in the last 24 hours). So, it's something we need to address, but is not necessarily a blocker on the questions we're dealing with above.

> But I think we could just show the first N matching annotations of that document and then a "See all matching annotations of this document" link that links to a page with the same search query but with a url:... facet for the document added in. That would be a (paginated if necessary) list of all annotations of that one document that match the search. We would need to do this, I think.

Stated another way, we could, in the future, choose to paginate expanded annotations (regardless of whether it's for a single url: query).

seanh commented 7 years ago

> If we no longer bucket, but only list each document once, in reverse time order of the last annotation made against it then I assume by definition it would only show once, on one results page, and not on subsequent pages.

Unfortunately no, that's not how it works. How it works currently, when you search for "foobar":

1. We get from Elasticsearch a list of all the annotations (not a list of documents) matching "foobar", in chronological order, not bucketed by timeframe or document.
2. We paginate that list: the most recent 200 annotations go into page 1, the next 200 go into page 2, etc.
3. For the first page (or whatever page you're on) we bucket the annotations within that page into timeframes.
4. For each timeframe bucket within the page, we bucket the annotations within that timeframe into documents.

So the annotations are first ordered chronologically, then bucketed into pages, then each page is bucketed into timeframes, then each timeframe is bucketed into documents.
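The order of operations described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual Hypothesis code: annotations are plain dicts with hypothetical `created`, `uri` and `user` keys, the labels produced by `timeframe_label` are made up, and an in-memory list stands in for the Elasticsearch results.

```python
from datetime import datetime, timedelta

PAGE_SIZE = 200  # annotations per page, as stated in the thread


def timeframe_label(created, now):
    """Hypothetical timeframe labels: one recent bucket, then one per month."""
    if now - created <= timedelta(days=7):
        return "Last 7 days"
    return f"{created.year}-{created.month:02d}"


def build_page(annotations, page, now, page_size=PAGE_SIZE):
    """Return {timeframe: {document uri: [annotations]}} for one results page."""
    # Order all matching annotations, newest first.
    ordered = sorted(annotations, key=lambda a: a["created"], reverse=True)
    # Paginate the flat list of annotations.
    start = (page - 1) * page_size
    on_page = ordered[start:start + page_size]
    # Bucket the page into timeframes, then each timeframe into documents.
    timeframes = {}
    for annotation in on_page:
        label = timeframe_label(annotation["created"], now)
        documents = timeframes.setdefault(label, {})
        documents.setdefault(annotation["uri"], []).append(annotation)
    return timeframes


now = datetime(2016, 12, 6)
annotations = [
    {"uri": "A", "user": "x", "created": datetime(2016, 12, 5)},
    {"uri": "B", "user": "z", "created": datetime(2016, 12, 4)},
    {"uri": "A", "user": "y", "created": datetime(2016, 11, 1)},
]
for label, documents in build_page(annotations, 1, now).items():
    print(label, {uri: [a["user"] for a in anns] for uri, anns in documents.items()})
```

In this toy data, document A appears under two timeframes, and each appearance lists only the annotator whose annotation fell in that timeframe. With a smaller `page_size` the same document also shows up on more than one page, because pagination happens before any grouping by document.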

It is easy to just remove step 3, but then you will still have the same document appearing on different pages. Reversing the order of steps 2 and 4, so that the same document never appears on multiple pages, I think is likely to be much more difficult.

seanh commented 7 years ago

> > ... then it raises the question of how many annotations to show under a single document when it's expanded? A single document in the list could have any number of annotations that match the search. 3000, say. We can't show 3000 annotations when you expand the document.
>
> Yep, and this is a problem we have now. Not a new one. (Like if a document has 3000 annotations, all made in the last 24 hours). So, it's something we need to address, but is not necessarily a blocker on the questions we're dealing with above.

No, we actually don't have this problem now, because we allow the same document to appear on multiple pages of the same search results. So if a document has 3000 annotations, all made in the last 24 hours, then on the first page you will get a document bucket with 200 annotations in it, on the second page you will get a second document bucket for the same document with the next 200 annotations in it, and so on, for 15 pages. That's how it works currently.
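The arithmetic behind the 15 pages can be checked with a short Python sketch. It assumes, as in the example above, that the document's matching annotations are consecutive in the ordered results and start at the top of a page; the function names are hypothetical.

```python
import math

PAGE_SIZE = 200  # annotations per results page, as stated in the thread


def pages_spanned(matching_annotations, page_size=PAGE_SIZE):
    """Number of results pages a single document's matching annotations occupy."""
    return math.ceil(matching_annotations / page_size)


def bucket_sizes(matching_annotations, page_size=PAGE_SIZE):
    """Size of that document's bucket on each successive page."""
    full_pages, remainder = divmod(matching_annotations, page_size)
    return [page_size] * full_pages + ([remainder] if remainder else [])


print(pages_spanned(3000))
print(bucket_sizes(450))
```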

seanh commented 7 years ago

> > But I think we could just show the first N matching annotations of that document and then a "See all matching annotations of this document" link that links to a page with the same search query but with a url:... facet for the document added in. That would be a (paginated if necessary) list of all annotations of that one document that match the search. We would need to do this, I think.
>
> Stated another way, we could, in the future, choose to paginate expanded annotations (regardless of whether it's for a single url: query).

Do we want to paginate them in-place? Or do we want to link to a URI search page which is a paginated list of just the annotations of that document? (Which is what our current uri:... search pages are, although they look silly because they have the timeframe and document bucketing.)

If we make use of the separate document search pages, as we do for users and groups, we could potentially add a document sidebar like the user and group sidebars with some useful metadata in it as well... Though I'm not sure exactly what. It feels right for me for each document to have its own page and URL, though.

seanh commented 7 years ago

My thoughts on how to break this down, not necessarily in priority order:

1. Don't bucket by timeframe on URI search pages. Should be easy to do.
2. Auto-expand the document bucket on URI search pages (once we're no longer bucketing by timeframe on URI pages, there can only ever be one document bucket per page for URI searches). Should be easy to do.

1 and 2 together should suffice to make the presentation of URI search results much better, which I think was what @judell was originally asking for by opening this issue.

Although I'm not sure that @judell actually wanted to remove the timeframes from URI search pages. He just wanted a list of all annotators of the URI in one place. That could potentially be provided by a document sidebar that appears on document search pages, like the user and group sidebars we have, and (if we think it's useful) we could leave the timeframe bucketing in place.

3. Auto-expand the document bucket on all search pages, whenever there is only one document on the page. Should be easy to do.

The reason I've listed this separately from (2) is that, even though I think there are benefits to doing this, I also think it might be confusing to the user if search results are sometimes collapsed and sometimes not. Might not be obvious why. If they always expanded on URI searches but never on any other searches that should be fairly obvious so I think (2) is safe, but I think (3) would be worth implementing separately and with a separate feature flag so that it can be tested and if necessary backed out separately.
4. Add some sort of "Show all annotations of this document" link to document buckets. Should be easy to do. Design needed.

@jeremydean wanted to see all users and tags of a document, not just the matching ones. We're not going to do that, but this at least gives him a link to the page where he can see that. Also we have this URI search capability but currently no easy way to access it, this gives one way.

Probably worth doing 1 and 2 before 4, so that the URI search pages we'll be linking to are better.
5. Don't bucket by timeframe on any search pages. I think this might be hard to do.

The reason I've listed this separately from (1) is that I think (1) should be quick and easy to do but (5) could be hard:

a. The same document can also appear on different pages of a single paginated search query, I don't think we want that, but I think it will be harder to fix. On URI pages this isn't a problem.

b. We can't show all matching annotations of a document under the document, because that could be a lot of annotations, so we would need to truncate that list and add a "Show all matching annotations of this document" link. Potential confusion with the "Show all annotations of this document" link in (4). Design needed.

Note there is an easier version of (5) which is to simply remove the timeframe buckets from within search pages, but still allow the same document to appear on multiple different pages of the same search results. But at least the same document won't appear twice within one page. We could implement this without needing to do a or b above. This would be easy to do. But is it worthwhile without a and b?
6. Add timestamp of most recent annotation to each document bucket. Easy to do. Doesn't seem to make sense without (5) first.

Did I miss anything? Any changes?

judell commented 7 years ago

Thanks @seanh. I agree with your analysis and proposed sequence: do 1, 2 and 3 sooner, let user feedback guide the rest later.

dwhly commented 7 years ago

@seanh: As we discussed, I'm strongly in favor of taking the simple approach with 5, and allowing a document to show in results in subsequent pages.

So, my modified list is:

1. Don't bucket by timeframe at all.
2. Instead, add timestamp of most recent annotation to each document bucket.
3. When there is only one document in a result-- auto expand it.
4. Consider a better way to show "All annotations, and all annotators" on a given document.
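The first three items of this list could look roughly like the Python sketch below. It is a sketch under assumptions rather than the real implementation: it takes the annotations already selected for one results page (so, per the simple approach, a document may still reappear on a later page), and the dict keys `uri`, `created`, `latest` and `expanded` as well as the function name `document_rows` are hypothetical.

```python
from datetime import datetime


def document_rows(page_annotations):
    """Group one page of annotations by document, with no timeframe buckets.

    Each row carries the timestamp of its most recent annotation, and the row
    is expanded only when it is the sole document on the page.
    """
    rows = {}
    for annotation in page_annotations:
        row = rows.setdefault(
            annotation["uri"],
            {"uri": annotation["uri"], "latest": annotation["created"], "annotations": []},
        )
        row["annotations"].append(annotation)
        row["latest"] = max(row["latest"], annotation["created"])
    # Most recently annotated document first.
    ordered = sorted(rows.values(), key=lambda r: r["latest"], reverse=True)
    single_document = len(ordered) == 1
    for row in ordered:
        row["expanded"] = single_document
    return ordered


page = [
    {"uri": "A", "created": datetime(2016, 12, 5)},
    {"uri": "A", "created": datetime(2016, 12, 6)},
]
for row in document_rows(page):
    print(row["uri"], row["latest"], row["expanded"], len(row["annotations"]))
```

On a URI search the page contains a single document, so the same rule also gives the auto-expanded view discussed for URL queries.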

klemay commented 3 years ago

We are working through many of these questions and problems as we build a capability for LMS users to see and filter annotations made in a course (See: https://github.com/hypothesis/product-backlog/issues/1161).

The plan is to replace our activity pages with the work that comes out of the LMS project. I'm going to close this issue and propose that further discussion on how we display annotations happens as part of the iteration process for the new project.

hypothesis / product-backlog

On a URL query, time-bucketing groups the list of annotators unhelpfully #20