invoke-ai / InvokeAI

Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry leading WebUI, and serves as the foundation for multiple commercial products.
https://invoke-ai.github.io/InvokeAI/
Apache License 2.0

[enhancement]: reintroduce infinite scroll of images #6632

Open nutspiano opened 3 months ago

nutspiano commented 3 months ago

What should this feature add?

The pagination of the image library is a terrible idea. I want to be able to quickly find images in a long list of images. Scroll wheels were invented for this, let's not disable them.

Also, nobody asked for infinite scroll to be removed, including the person that asked for pagination in #5710.

Reintroduce infinite scroll as the default and make pagination an option.

psychedelicious commented 3 months ago

Infinite scroll was not removed in response to any user request. It was a memory leak and one of the most technically complex parts of the client making it full of edge cases and bugs (a number of which we never fully figured out). We aren't going to restore it.

We can explore ways to improve the user experience with pagination. Do you have any ideas for that?

ufuksarp commented 3 months ago

Wouldn’t the ability to change the page with scroll wheel solve most of the problem here?

nutspiano commented 3 months ago

Making the scroll wheel change the page is a start, but still makes things paginated, which in itself is bad UI. Being able to put the images you want to review all in the "viewport" that is the limited space of the images part of the window is what I think is necessary, especially for visual work like reviewing images. For this, one needs to be able to scroll to an exact image in the top row, not be at the mercy of some arbitrarily generated "page".

Without having looked at how the pagination was implemented, my suggestion is this: on a page with 5 rows of 3 images, images 1-15 are shown. When using the scroll wheel, pop the top 3 images off the page and make a new "virtual page" consisting of images 4-18, shifting things one row up. This should be indistinguishable from infinite scroll.

This messes slightly with the concept of "pages" as it presents currently, so until one clicks on an actual page box below, they could be greyed out. But they can still be there if one wants to use them as shortcuts. Once one has scrolled far enough for the images displayed to be images 16-30, one is effectively on page 2, and the boxes below can change to reflect this as if page 2/next had been clicked from page 1.
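
A minimal sketch of the row-shifting idea above, written as pure functions over an image offset. The names `GridWindow`, `shiftRows`, and `pageNumber` are hypothetical, and the 5x3 grid comes from the example in the comment; none of this is taken from Invoke's code.

```typescript
// Hypothetical sketch of the "virtual page" idea: the visible window moves
// one row at a time instead of one page at a time.
interface GridWindow {
  offset: number; // zero-based index of the first visible image
  rows: number;   // visible rows
  cols: number;   // images per row
}

// Move the window by `rowDelta` rows (negative scrolls up), clamped so the
// window never runs past either end of the list.
function shiftRows(win: GridWindow, rowDelta: number, total: number): GridWindow {
  const totalRows = Math.ceil(total / win.cols);
  const maxOffset = Math.max(0, (totalRows - win.rows) * win.cols);
  const next = win.offset + rowDelta * win.cols;
  return { ...win, offset: Math.min(Math.max(next, 0), maxOffset) };
}

// A "real" page number exists only when the window sits on a page boundary;
// otherwise the page boxes could be greyed out, as suggested above.
function pageNumber(win: GridWindow): number | null {
  const pageSize = win.rows * win.cols;
  return win.offset % pageSize === 0 ? win.offset / pageSize + 1 : null;
}
```

Starting at offset 0, one wheel step gives offset 3 (images 4-18) with no page number, and five steps give offset 15, which `pageNumber` reports as page 2.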

Gabrielmtn commented 3 months ago

I don't know much about the inner workings of this project, but I did see a comment on this on Reddit, and have spent some time with problems like this as both a frontend and UX designer so figured I could add perspective.

Scrolling has far lower user interaction lift, and thus UX cost than clicking/tapping via pagination. This is making UX worse to accommodate a challenging technical problem, which of course has to happen sometimes, but I don't think this is the way to fix it.

If you don't have anyone within your core group of contributors advocating for balancing UX patterns with memory management, and instead defer to what's technically more accessible or knocks out many bugs at once, you'll end up with many tabs, and many other patterns (similar to pagination) which will just be trading technical debt and bugs for design debt and poorer UX.

Perhaps it would be acceptable to trade loading time, for the scroll mechanic. That is to say, remove out of frame DOM elements as they're scrolled past, so the DOM doesn't become clogged with hundreds of elements. Of course scrolling back up would have a loading time associated with it, but you'd retain the ability to scan through many items without requiring many clicks from the user, just waiting.

Just sharing a complete outsider's opinion. I understand sometimes past architectural decisions limit possible paths forward, but perhaps there's a balanced way to accommodate the memory issue.

Here's a related article I found while looking into people who have written about this topic, possibly helpful: https://dannysu.com/2012/07/07/infinite-scroll-memory-optimization/
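
The idea of dropping out-of-frame elements is usually called windowing or virtualization. A rough sketch of the core calculation, assuming a fixed row height; `visibleRows`, `RowRange`, and the `overscan` buffer are illustrative names, not part of any particular library:

```typescript
// Hypothetical windowing helper: given the scroll position, decide which
// rows need real DOM nodes. Rows outside the range are simply not rendered.
interface RowRange {
  first: number; // inclusive row index
  last: number;  // inclusive row index
}

function visibleRows(
  scrollTop: number,      // pixels scrolled from the top of the list
  viewportHeight: number, // visible height in pixels
  rowHeight: number,      // fixed height of one row in pixels
  totalRows: number,
  overscan = 2            // extra rows kept above and below as a buffer
): RowRange | null {
  if (totalRows <= 0 || rowHeight <= 0) return null;
  const firstVisible = Math.floor(scrollTop / rowHeight);
  const lastVisible = Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1;
  const first = Math.max(0, firstVisible - overscan);
  const last = Math.min(totalRows - 1, lastVisible + overscan);
  return first <= last ? { first, last } : null;
}
```

With this approach the DOM holds only a handful of rows no matter how long the list is. It does not by itself bound the data kept in memory.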

ufuksarp commented 3 months ago

> Scrolling has far lower user interaction lift

For small libraries that might be true.

Gabrielmtn commented 3 months ago

@ufuksarp Wouldn't the interaction lift for large libraries increase linearly as the library size grew? Every page would require another click

ufuksarp commented 3 months ago

> @ufuksarp Wouldn't the interaction lift for large libraries increase linearly as the library size grew? Every page would require another click

This is not an argument against pagination. It’s an argument against large libraries, for which infinite scrolling has no solution.

Without a scroll wheel implementation, non-linearly scaling page buttons, and the ability to jump to a specific page, the pagination system is only slightly better. But I think they are testing the waters for now before implementing new things on it.

Gabrielmtn commented 3 months ago

To me, it's a clear argument against pagination, because large libraries are to be expected and designs should accommodate for them.

Perhaps I'm not following, maybe you could explain it a different way? My best attempt to interpret this is that users would be 'wrong' for having large libraries as it would cause a lot of clicking within a pagination pattern?

Neither scrolling nor pagination will change the existence of large libraries, but only one of the patterns will cause user interactions to grow linearly with the size of the library, that being pagination requiring more and more clicks to traverse the many pages.

I'm certainly open to other ideas, but I'm not following the nuance you're trying to add.

ufuksarp commented 3 months ago

Refer to the second paragraph of my latest reply please. Gotta leave for work!

Gabrielmtn commented 3 months ago

> When using the scroll wheel, pop the top 3 images off the page and make a new "virtual page" consisting of images 4-18, shifting things one row up. This should be indistinguishable from infinite scroll.

This seems like a happy medium, a very solid piece of interaction design that will appeal to every perspective. Allowing scrolling to the bottom of a page to load and append the next page, while removing the previous to manage memory effectively. This way the interaction won't require users to move their mouse and click to load every next page, while retaining the control over where they are in the list, as well as the ability to jump to the end.

This seems to be a related pattern that could be looked to in this context: https://www.nngroup.com/articles/skeleton-screens/

ufuksarp commented 3 months ago

Okay, I have some time.

> To me, it's a clear argument against pagination, because large libraries are to be expected and designs should accommodate for them.

And that is pagination. How does infinite scroll help with navigating large libraries?

> Perhaps I'm not following, maybe you could explain it a different way? My best attempt to interpret this is that users would be 'wrong' for having large libraries as it would cause a lot of clicking within a pagination pattern?

You click to the page you want to go to, and if that's not it you click 1-2 times more. How does infinite scrolling help you go 60% of the way? For example 1500th image out of 2000?

> Neither scrolling nor pagination will change the existence of large libraries, but only one of the patterns will cause user interactions to grow linearly with the size of the library, that being pagination requiring more and more clicks to traverse the many pages.

User interactions grow more than linearly with infinite scrolling with the addition of laggy UI. Pagination helps you get there faster.

> I'm certainly open to other ideas, but I'm not following the nuance you're trying to add.

You ignore my points. Try not to.

Gabrielmtn commented 3 months ago

> You ignore my points. Try not to.

Why would I ask clarity if I was ignoring your points. Perhaps you could make them more clearly, and spare me your twatty snark?

ufuksarp commented 3 months ago

> > You ignore my points. Try not to.
>
> Why would I ask clarity if I was ignoring your points. Perhaps you could make them more clearly, and spare me your twatty snark?

Very cool.

Gabrielmtn commented 3 months ago

What’s ironic is that you made good points about a very large scroll area that I hadn’t originally considered, or understood that you were making, but in a snide way that ensures I have zero interest in further contributing to this conversation, or project as a whole.

someaccount1234 commented 3 months ago

this is because it was asked if any ideas about the thumbs....

additional dedicated pagination tab with group creation? (to see 10x more), mem info and garbage collect button wouldn't be bad. cull there and carry groups over to the group and pagination dock

smaller thumbnails? (real size images may be loading as thumbs). I thought that's what it was using +20 more button anyway

better drag drop support for external image managers folders multi file and maybe .zip projects. I could completely use external manager off to the side, something MADE for viewing and sorting images. (xnview or ImageSorterV4 if it had dragdrop)

I like pagination and its fine if massive libraries aren't built without organization. so groups of groups and more group options can help with that. and draggin and droppin externally.

psychedelicious commented 3 months ago

Y'all - chill. Feedback is valuable but let's please be respectful.

Since there seems to be interest, I'll explain in detail the reasoning for the decision to remove infinite scroll. I hope it is clear that this is a passion project and we are dedicated to making a beautifully usable tool. A great gallery UX is obviously very important.

Why infinite scroll in the first place

Way back in ye olden days, the app was much simpler and the gallery was a small panel on the right side with a continually extended list of images. As the app has evolved, and the feature set evolved, this pattern was never re-evaluated. _Scope creep!_ Pagination is an attempt to resolve technical issues and find a more functional UX.

Technical challenges with infinite scroll

The memory issue is not DOM-related. We used a virtualized/windowed list, as @Gabrielmtn described, which renders only the visible elements plus some buffer. That part was fine. We also used skeleton loaders and tiny WEBP thumbnails. These are standard practices.

The issue is that we don't have a fully normalized query cache of image records in the browser and our image records are frequently mutated. Ensuring we keep the infinite scroll's internal data updated proved to be rather challenging. We needed to manually implement a lot of caching.

For better or worse (well, definitely for worse), we ended up with a normalized and unbounded cache. Every image loaded in a particular browsing session was in the cache. That means as you scroll down, we load hundreds and thousands of images... While we only ever render however many are visible, there's a steadily growing list in memory.

Well duh just do ref counting

Ok, there is a clear solution - do some reference counting and only keep relevant images cached. While obvious, this gets really complicated with the normalized cache, lists of tens of thousands of images, boards, different image categories, and a gallery that is constantly having new images pushed into it. Like multiple images every second. We also let you mutate multiple images at once (e.g. drag an arbitrary number of images into a board).
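
For readers unfamiliar with the term, a bare-bones ref-counted cache could look like the sketch below. `RefCountedCache` and its methods are hypothetical names. It deliberately ignores the hard parts listed above (boards, categories, bulk mutations, constant insertions), which is where the real complexity lies.

```typescript
// Hypothetical ref-counted cache: an entry lives only while at least one
// consumer (e.g. a mounted gallery cell) holds a reference to it.
class RefCountedCache<V> {
  private entries = new Map<string, { value: V; refs: number }>();

  // Store or refresh a value and take one reference to it.
  retain(key: string, value: V): void {
    const entry = this.entries.get(key);
    if (entry) {
      entry.value = value;
      entry.refs += 1;
    } else {
      this.entries.set(key, { value, refs: 1 });
    }
  }

  // Drop one reference; evict the entry once nobody uses it any more.
  release(key: string): void {
    const entry = this.entries.get(key);
    if (!entry) return;
    entry.refs -= 1;
    if (entry.refs <= 0) this.entries.delete(key);
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    return entry ? entry.value : undefined;
  }

  get size(): number {
    return this.entries.size;
  }
}
```

The cache stays bounded by what is currently referenced, but every code path that shows or hides an image has to pair `retain` with `release` correctly, which is easy to get wrong.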

In short, we have complicated image functionality across huge numbers of images that are constantly being mutated and created and the implementation was not maintainable. Of course, some of these issues are knock-on effects from bad technical decisions made earlier.

That's the brief technical spiel. There are other technical issues I'm sparing from this discussion.

User-facing issues

Our infinite scroll implementation was not "correct" or "good". If you think it was, you just haven't used the app enough or used all its features.

The tangled mess of manual caching and cache invalidation was excessively complicated and we never successfully found all the edge cases. There were many ways gallery data ended up stale in one way or another that we weren't able to address. You can imagine attempting to troubleshoot a bug that only pops up after you've scrolled X images in, and have used Y and Z feature in a particular order, and so on.

Also as @ufuksarp mentioned, traditional infinite scroll isn't particularly useful when you have a ton of images. If your image is 3000 down in the list, you have to scroll a looooong way to get to it.

Why we moved to pagination now

These issues were mounting and we needed to fix them. Problems moving images between boards were particularly impactful. Some features (like metadata search) were difficult to implement on top of infinite scroll.

We decided to eliminate a few thousand lines of suffering and instead use straightforward pagination rather than waste time attempting to fix an inherently flawed implementation. Pagination took about 1 hour to implement, compared to at least a week for infinite scroll (and it was still buggy). Yes, I am a bad programmer.

Pagination Pain Points

  1. It's harder to find a specific image by changing pages, than it is to scroll. We're attempting to improve this by adding metadata search.
  2. Similar to 1, it's hard to just browse through your images aimlessly.
  3. The page size is small and dynamically calculated based on available space. I don't like this, but we needed to start somewhere. Maybe it's better if each page has a set size, but there is still a scroll area within each page? This looks very awkward though.
  4. Because the page size is dynamically calculated and because new images are constantly being added, there's no "page stability" for images, unless you are in a board that images aren't being added to.
  5. The pagination UI is limited to first, prev, current page plus a few siblings, next, and last. There's no way to go to a specific page or jump forward by larger numbers of pages.
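
Points 4 and 5 can be made concrete with a little arithmetic. The sketch below uses hypothetical helper names (`countPages`, `pageOfIndex`, `siblingPages`) and made-up numbers; it is not how the Invoke gallery computes its pages.

```typescript
// Number of pages for a given page size. It changes whenever the page size
// or the total changes.
function countPages(total: number, pageSize: number): number {
  return pageSize > 0 ? Math.ceil(total / pageSize) : 0;
}

// One-based page that holds the image at a zero-based list index.
function pageOfIndex(index: number, pageSize: number): number {
  return Math.floor(index / pageSize) + 1;
}

// Page numbers to show for "current page plus a few siblings". The
// first/prev/next/last buttons would be rendered separately.
function siblingPages(current: number, pageCount: number, siblings = 2): number[] {
  if (pageCount <= 0) return [];
  const page = Math.min(Math.max(current, 1), pageCount); // clamp to 1..pageCount
  const first = Math.max(1, page - siblings);
  const last = Math.min(pageCount, page + siblings);
  const pages: number[] = [];
  for (let p = first; p <= last; p++) pages.push(p);
  return pages;
}
```

With a page size of 15, the image at index 60 sits on page 5. After 30 new images are added ahead of it, its index is 90 and it sits on page 7, which is the lack of "page stability" described in point 4.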

Any others?

Desired functionality for gallery

Any others?

Alternative infinite scroll-ish pattern

There's another approach to infinite scroll - instead of extending the list when you get to the bottom, you start out with a scroll area big enough to hold all images, and load pages of images depending on the scroll position. I think this would allow us to address the complicated caching issues, but it is a bit more involved than pagination.
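
A rough sketch of that mapping, assuming fixed-size rows and a known total. `Layout`, `containerHeight`, and `pagesInView` are hypothetical names chosen for illustration:

```typescript
// Hypothetical mapping from scroll position to the pages of data to fetch,
// assuming fixed-size rows in a container tall enough for every image.
interface Layout {
  total: number;     // total number of images in the list
  cols: number;      // images per row
  rowHeight: number; // pixels per row
  pageSize: number;  // images fetched per request
}

// Full height of the scroll area, so the scrollbar reflects the whole list.
function containerHeight(l: Layout): number {
  return Math.ceil(l.total / l.cols) * l.rowHeight;
}

// Zero-based page indices that cover the images currently in view.
function pagesInView(l: Layout, scrollTop: number, viewportHeight: number): number[] {
  if (l.total <= 0) return [];
  const firstRow = Math.floor(scrollTop / l.rowHeight);
  const lastRow = Math.ceil((scrollTop + viewportHeight) / l.rowHeight) - 1;
  const firstImage = Math.min(l.total - 1, Math.max(0, firstRow * l.cols));
  const lastImage = Math.min(l.total - 1, Math.max(firstImage, (lastRow + 1) * l.cols - 1));
  const pages: number[] = [];
  const lastPage = Math.floor(lastImage / l.pageSize);
  for (let p = Math.floor(firstImage / l.pageSize); p <= lastPage; p++) {
    pages.push(p);
  }
  return pages;
}
```

Because the pages to load depend only on the scroll position, data for pages that have scrolled out of view can be dropped without affecting the scrollbar.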

hipsterusername commented 3 months ago

> Y'all - chill.

I will reiterate this a bit more forcefully.

The world is increasingly a volatile, divisive place - But it won't be tolerated here. Be nice, and follow the rules for critical debate.


Invoke is a tool to primarily solve problems for enthusiasts and professional creatives -- Namely, people who are creating a lot over the course of working in the tool.

The trade-off for seamless narrow discovery (scrolling through a document) is broad discovery (using bookmarks/an outline) -- Broad discovery is needed more often as gallery sizes increase. We are prioritizing that need, recognizing that there may be some minor friction in narrow discovery in the near term, which can be resolved with further iteration.

Infinite scroll is a single solution. We won't be returning to it.

If you present the problems you're experiencing - "I want an easier way to scan the entire contents of a board", "I want to listlessly skim through my old generations", "I'd like more consolidation of images in pages, so that I can navigate sets of generations more easily", etc. -- We can solve those other ways.

Stating that ideas are terrible (and implying they haven't been thought through by a team that spends nearly every waking hour thinking about how to make the product meet the needs of its many users) offers little context into what you're actually trying to do, and casts the entire suggestion into a poor light.

Please help us help you by offering better feedback.

Thanks.

(Gold stars for apologizing if you could have treated someone else nicer in this thread -- ⭐)

Gabrielmtn commented 3 months ago

> Be nice, and follow the rules for critical debate.

Being nice and creating an environment of mutual respect are important values, and challenging to stay on top of as well, I imagine. With the limited time in the day I don't suppose it's something you'd ideally spend time focusing on.

I do apologize to any of the Invoke team for my reactive comment earlier towards ufuksarp, I could've been nicer about it. I'll be more respectful of your space in the future.

clsn commented 3 months ago

I understand [there are] reasons against infinite scroll. But if we could just have the mousewheel events scroll by windows, I think that would be something that would help a lot. Should that be a separate issue, or is it sufficiently covered/brought up here?

psychedelicious commented 3 months ago

While I'm open to exploring wheel events, they feel like a can of worms.

One unit of "scroll" equates to one page increment or decrement. How do you threshold it to determine what a "unit of scroll" is? Different mice have different scroll magnitudes per "click" (event) of the scrollwheel.

Ok, one "click" (event) per page. But that doesn't work for trackpads, which continually emit tiny magnitude clicks.

Ok, well different handling for trackpads then. Unfortunately, there is no straightforward, reliable method to differentiate between mouse and trackpad.

I mean, it could just be fairly dumb and do a single page for any scroll while the gallery is focused, and stop listening until the next page is loaded, but that still has issues:

There are good reasons why the scroll wheel isn't often used for discrete things like changing the page. It's meant for continuous things.

How about hotkeys for +1/+5/-1/-5/first/last page?
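
The hotkey idea reduces to a small clamped step function. A sketch with hypothetical names (`PageAction`, `applyPageAction`); the actual key bindings are left out because the thread does not specify them:

```typescript
// Hypothetical hotkey actions for page navigation.
type PageAction = "+1" | "+5" | "-1" | "-5" | "first" | "last";

// Returns the one-based page to show after applying the action.
function applyPageAction(current: number, pageCount: number, action: PageAction): number {
  const lastPage = Math.max(1, pageCount);
  switch (action) {
    case "first":
      return 1;
    case "last":
      return lastPage;
    default: {
      const target = current + Number(action); // "+5" -> 5, "-1" -> -1
      return Math.min(Math.max(target, 1), lastPage); // clamp to 1..lastPage
    }
  }
}
```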

nutspiano commented 3 months ago

Thank you @psychedelicious for taking the time to write such a detailed answer. No reason for any coding ability self deprecation though, it sounds like a great idea to start fresh as was done.

It seems to me like with what you built for the pagination, it is already flexible enough to navigate the gallery in other ways. You have offsetChanged in galleryOffsetChanged.ts that listens for arrow keys. If the selection hits the edges and changes the page, what is displayed in the gallery is gotten with a call to /api/v1/images/?board_id=X&limit=X&offset=X, which merrily hands you the necessary json for any number of images from any offset.

My suggestion: listen for scroll wheel events, increment/decrement the offset by the width of the gallery, while retaining the limit, refresh. In effect, this would make the top/bottom line of images "scroll off", everything shift up/down, and a new line of images appear on the opposite side of the gallery.

As for your last comment with troubles with determining the size of a scroll "click"/trackpads, it sounds like a can of worms indeed. I suggest just making one click equal changing the offset one width of the gallery, scrolling one line off, with a delay (I see 40/50 ms came up on SO), and see how that feels.
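
A sketch of that "one row per wheel event, with a delay" behaviour. `RowScroller` and `imagesUrl` are hypothetical names; the URL shape is the one quoted earlier in this comment, and the 50 ms default is just the figure mentioned above, not a tested value. Timestamps are passed in explicitly so the logic can be exercised without a browser.

```typescript
// Hypothetical row scroller: each accepted wheel event moves the offset by
// one gallery row; events arriving within `delayMs` of the last step are ignored.
class RowScroller {
  private lastStepAt = Number.NEGATIVE_INFINITY;
  private cols: number;
  private total: number;
  private delayMs: number;

  constructor(cols: number, total: number, delayMs = 50) {
    this.cols = cols;
    this.total = total;
    this.delayMs = delayMs;
  }

  // Returns the new offset, or the unchanged offset if the event is ignored.
  onWheel(offset: number, deltaY: number, nowMs: number): number {
    if (deltaY === 0 || nowMs - this.lastStepAt < this.delayMs) return offset;
    this.lastStepAt = nowMs;
    const next = offset + Math.sign(deltaY) * this.cols;
    return Math.min(Math.max(next, 0), Math.max(0, this.total - 1));
  }
}

// Builds the list URL in the shape quoted in the comment, with the limit
// kept constant and only the offset changing.
function imagesUrl(boardId: string, limit: number, offset: number): string {
  return `/api/v1/images/?board_id=${encodeURIComponent(boardId)}&limit=${limit}&offset=${offset}`;
}
```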

There was some discussion above about the pains of scrolling in a gallery with thousands of images. For this I suggest the humble scroll bar, which can similarly set a new offset of (window.scrollY / window.innerHeight) * total_gallery_length or similar.
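
One way to read the "or similar" above is to use the scrolled fraction of the scrollable range rather than `scrollY / innerHeight`. The sketch below takes plain numbers named after the DOM properties `scrollTop`, `scrollHeight`, and `clientHeight`; `offsetFromScrollbar` itself is a hypothetical helper.

```typescript
// Hypothetical mapping from a scrollbar position to an image offset.
function offsetFromScrollbar(
  scrollTop: number,    // pixels scrolled
  scrollHeight: number, // total height of the scrollable content
  clientHeight: number, // height of the visible area
  total: number,        // number of images in the gallery
  cols: number          // images per row
): number {
  const scrollable = scrollHeight - clientHeight;
  if (scrollable <= 0 || total <= 0) return 0;
  const fraction = Math.min(Math.max(scrollTop / scrollable, 0), 1);
  const raw = Math.floor(fraction * (total - 1));
  return raw - (raw % cols); // snap to the start of a row
}
```

Snapping to a row boundary keeps the grid columns aligned when the resulting offset is used with a fixed `limit`.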

ufuksarp commented 3 months ago

> Ok, one "click" (event) per page. But that doesn't work for trackpads, which continually emit tiny magnitude clicks.

Can't a number of these tiny clicks equate to a mouse's?

> Ok, well different handling for trackpads then. Unfortunately, there is no straightforward, reliable method to differentiate between mouse and trackpad.

Switch in settings for trackpad handling?

Or a switch saying "Change pages with scroll wheel"? If a trackpad user complains about the feature, they should read what it says or should get better at it.

clsn commented 3 months ago

> While I'm open to exploring wheel events, they feel like a can of worms.

Ah, okay. I see what you mean. That kind of sucks. Yes, there might be ways out of it. Maybe some configurable thresholds as to how many events map to how much scrolling, and there are potentially good ideas here, and I hope something can be worked out. But your post informed me that this isn't so simple a thing and I shouldn't get my hopes up or expect it as "oh, sure. here's a quick fix." So I can now deal with this becoming a WONTFIX, and I'll work on retraining my scroll-reflexes to hit the PgUp/PgDn keys or whatever. Thanks!

> Switch in settings for trackpad handling?

Yeah, at least some of these "how can we tell/how should we handle different devices" can be worked around in settings, if need be.

psychedelicious commented 3 months ago

The wheel events have some data attached like a "distance" and yes, you can add the trackpad clicks up over time until they cross a defined threshold.
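
Adding up deltas until a threshold is crossed could be sketched like this. `WheelAccumulator` is a hypothetical name, the threshold of 100 is an arbitrary assumption, and `deltaY` stands for the value carried by a browser `WheelEvent`.

```typescript
// Hypothetical accumulator: small trackpad deltas are summed until they cross
// a threshold, so many tiny events and one large mouse "click" behave alike.
class WheelAccumulator {
  private sum = 0;
  private threshold: number;

  constructor(threshold = 100) {
    this.threshold = threshold;
  }

  // Feed one wheel delta. Returns how many whole steps to move: 0 if the
  // threshold has not been reached, negative for upward scrolling.
  push(deltaY: number): number {
    if (deltaY === 0) return 0;
    // A change of direction discards the partial progress.
    if (this.sum !== 0 && Math.sign(deltaY) !== Math.sign(this.sum)) this.sum = 0;
    this.sum += deltaY;
    const steps = Math.trunc(this.sum / this.threshold);
    this.sum -= steps * this.threshold;
    return steps;
  }
}
```

Three trackpad events of 30, 30, and 50 produce a single step on the third event, while one mouse event of 250 produces two steps at once.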

Yeah @ufuksarp @clsn good reminders about just making "scroll changes page" a setting - we'd do that if we added some kind of scroll-dependent pagination.

But the thing is, I don't think anybody actually wants the scroll wheel to change page numbers anyways. Have you ever used an application where scrolling changes pages (when presented via pagination)? I never have and I think it's for a good reason. What we all want is to just be able to scroll through the images.


Recently I tested out another approach that I had mentioned earlier in this thread, which you @nutspiano also described. Make the gallery container large enough to fit all images and map scroll position to a range of images to load. We need a list of all image names up-front, then grab the full image data for the visible images. It's super simple.

https://github.com/user-attachments/assets/c9a672d6-efc0-4b60-b592-7645f6220d89

Ignore the page numbers in this video, they are nonfunctional (it's loading images in a different way). I'm scrolling through a couple thousand images. I made it display as a list to make the rough draft easier, but in the actual implementation we'd have the grid.

It only loads images that are currently visible and cleans up after itself by letting unused data be garbage-collected.

There are some technical issues and I'm not sure how to fix them:

I brainstormed a few ways to resolve these issues but none stood out as the clear answer. Maybe I'll format those notes into something more readable if somebody wants to think about this problem.

nutspiano commented 3 months ago

That there seems very close! A promising prototype.

I just did a request for the full list of images from /api/v1/images/?board_id=X&limit=3000&offset=0 on a ~1k images board, the response was ~2 MiB + 185 ms round trip (ran locally). Is it this size/cpu load you are concerned about? Not saying I have maxed it out with this 1k board, but the scaling doesn't seem too bad.

Insertions and the like can get annoying quick. While insertions would be the best solution, I suggest just making full list refreshes less taxing. Maybe a new endpoint that just gives what is needed to populate the full list, then use the /api/v1/images endpoint to flesh out what is currently shown and get thumbnail paths etc?

And thinking about it a bit more, do you even need the filenames on the first pass? Isn't what you really need just the number of images on the board, then you can generate a skeleton list client side and populate it gradually through /api/v1/images?
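
The "count first, populate gradually" idea could be sketched as a sparse list. `SkeletonList` is a hypothetical name, and `undefined` stands in for a skeleton placeholder; how the pages are fetched is left out.

```typescript
// Hypothetical sparse list: the client knows only the total count up front
// and fills in slots as pages of data arrive.
class SkeletonList<T> {
  private slots: (T | undefined)[];

  constructor(total: number) {
    this.slots = new Array<T | undefined>(Math.max(0, total)).fill(undefined);
  }

  // Store one fetched page starting at `offset`; out-of-range items are dropped.
  fill(offset: number, items: T[]): void {
    items.forEach((item, i) => {
      const index = offset + i;
      if (index >= 0 && index < this.slots.length) this.slots[index] = item;
    });
  }

  // Returns the loaded item, or undefined while the slot is still a skeleton.
  get(index: number): T | undefined {
    return this.slots[index];
  }

  get length(): number {
    return this.slots.length;
  }

  loadedCount(): number {
    return this.slots.filter((slot) => slot !== undefined).length;
  }
}
```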

psychedelicious commented 3 months ago

The way I implemented it in that example is similar:

The end result is as displayed. The virtualization and caching is largely handled by external libraries - our code is very simple.

We need the virtualized list to have image names or full DTOs - as opposed to just a total count and index - to support selection of multiple images. The usual click-scroll-shift+click interaction selects all images between the clicks. This creates a draggable payload, which can be dropped on e.g. a board. That payload must include at least the selected image names, but preferably the DTOs.
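
The click, scroll, shift+click interaction amounts to slicing an ordered list of names, which is why every name between the two clicks has to be known. A sketch with a hypothetical `selectRange` helper:

```typescript
// Hypothetical range selection over an ordered list of image names.
// Returns every name between the two clicked images, inclusive.
function selectRange(names: string[], anchor: string, target: string): string[] {
  const a = names.indexOf(anchor);
  const b = names.indexOf(target);
  if (a === -1 || b === -1) return []; // one end is not in the list
  const from = Math.min(a, b);
  const to = Math.max(a, b);
  return names.slice(from, to + 1);
}
```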


One challenge not yet mentioned is network latency. Most users have their server running on their LAN, but many do not. We also have a cloud offering that uses the same frontend logic where possible, with a more complicated backend.

So while this feels like a lot of work compared to just caveman-ing it and sending over the full list of all image DTOs, we need to consider performance - both client and server. Hence the focus on optimistic updates by inserting images without re-fetching the whole thing.

(The paginated gallery doesn't bother with optimistic updates, since fetching a single page is pretty lightweight)

That said, the caveman route does make insertion easier, because with the full image DTO object, we get access to attributes like created_at and starred - the attributes with which the backend sorts the images. We can recreate that sorting logic in the client and figure out where to insert the images. Clientside performance of inserting into a large array remains to be seen, and measurements would need to be taken in the context of the app, with all its overhead.
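
Recreating the sort on the client might look like the sketch below. The ordering (starred first, then newest first) is an assumption made for illustration and may not match the backend; `ImageLike`, `compareImages`, and `insertSorted` are hypothetical names, and `image_name` is an assumed field name. Timestamps are compared as strings, which only works if they all share one ISO 8601 format.

```typescript
// Hypothetical DTO subset; only the fields needed for sorting are included.
interface ImageLike {
  image_name: string;
  created_at: string; // ISO 8601 timestamp, same format for every image
  starred: boolean;
}

// Assumed ordering: starred first, then newest first.
function compareImages(a: ImageLike, b: ImageLike): number {
  if (a.starred !== b.starred) return a.starred ? -1 : 1;
  if (a.created_at !== b.created_at) return a.created_at > b.created_at ? -1 : 1;
  return 0;
}

// Binary search for the insertion point, then insert in place.
// Returns the index at which the image was inserted.
function insertSorted(list: ImageLike[], image: ImageLike): number {
  let lo = 0;
  let hi = list.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (compareImages(list[mid], image) <= 0) lo = mid + 1;
    else hi = mid;
  }
  list.splice(lo, 0, image);
  return lo;
}
```

Finding the position takes a logarithmic number of comparisons, but `splice` still shifts every later element, so the cost for very large arrays would need to be measured.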

nutspiano commented 3 months ago

The cache keyed by image name sounds great. Premature/early optimization aside, I want to make sure I have gotten my point across about delaying sending image names (large for their purpose, high entropy/low compressibility) across the network before it is absolutely needed.

/api/v1/images is essentially a "get DTO by index" endpoint already. When switching to a board, as I understand the only possible UI response is to show the first images/first page of the board, so limit=[visible images in grid]&offset=0 is retrieved from /images. And the total size of the board is used to make the pagination, making pages with images that do not have their DTO loaded yet.

My point is that up until the full DTO is received, the names are unknown. And for the rest of the pages, they still are and that's fine. If you go to page 5, the DTOs for those images are loaded through /images, but DTOs for page 2, 3, 4 are still unknown. As far as I know there is no way to (with pagination) make selection payloads across pages? So with pagination, everything you are able to select must be displayed first, and thus the DTOs are retrieved and the names are known.

Onto scrolling. As I understand it, the problem arises when a selection is made across undisplayed images with unknown names, if one for example selects the first image on the board, and then clicks the scroll bar 50% down the page and shift-clicks another image to select half the board, expecting every image in between to get selected. Up until that selection is made, you still did not need the names (or any info) for images in between, since you are getting DTOs through /images based only on index. And what you have is full DTOs for the first page, and the images you are now seeing halfway down the board.

When the selection is made, one does indeed need metadata to make the payload. Then, one can either ask /images for an index based full DTO (probably bad/wasteful), or your endpoint get_image_names can be expanded to support index based retrieval through limit and offset if only the name is needed for the payload (probably better), or some new endpoint with a detail level in between the two satisfying the payload requirements. But even with this, can't any names (or other DTO data) below this 50% point still remain unknown without issue?

To me, it seems like the index is a great low-bandwidth way to point at the images you want, and then populate as late as possible and as little as possible through /images, get_image_names with indexing, and similar endpoints, for different use cases (displaying, selecting, future).
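
The "populate as late as possible" idea implies working out, at selection time, which index ranges are still unknown. A sketch with hypothetical names (`FetchRange`, `missingRanges`); which endpoint would serve the resulting offset/limit pairs is left open, as in the comment.

```typescript
// Hypothetical planner: given a selected index range and the set of indices
// whose names are already known, list the (offset, limit) requests still needed.
interface FetchRange {
  offset: number;
  limit: number;
}

function missingRanges(from: number, to: number, known: Set<number>): FetchRange[] {
  const ranges: FetchRange[] = [];
  let start: number | null = null; // start of the current run of unknown indices
  for (let i = from; i <= to; i++) {
    if (!known.has(i)) {
      if (start === null) start = i;
    } else if (start !== null) {
      ranges.push({ offset: start, limit: i - start });
      start = null;
    }
  }
  if (start !== null) ranges.push({ offset: start, limit: to - start + 1 });
  return ranges;
}
```

Indices past the end of the selection never appear in the result, so anything below the selected range can stay unknown.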

rollingcookies commented 2 months ago

I'll just say that the pagination is terrible and inconvenient. Immediately wanted to roll back to the previous version.

Trismegiste commented 2 months ago

Just a quick reminder: infinite scroll is a dark pattern that should be banned. Aza Raskin, the inventor of infinite scroll, regrets his invention. https://www.youtube.com/shorts/0UgJpdGmLlQ

I'm very pleased it was removed (though it was for technical reasons). It's time to discover the "board" feature built into InvokeAI.

nutspiano commented 1 month ago

Mr. Raskin is quite delusional indeed. Regardless, this issue seems to have stranded. We will see if the other Invoke features will convince people the UI is worth it.

psychedelicious commented 1 month ago

I think it makes sense for the scrollbar to represent the full list of images, as I had described earlier. Not exactly infinite scroll but similar. I'm not sure when we will be able to dedicate resources to implement this though.

rollingcookies commented 1 month ago

From the point of view of social networks, the infinite scrolling pattern really does work against the user: it makes it too easy and convenient to be overwhelmed by the flow of information from different sources. This is why Mr. Raskin condemned the pattern in the context of social media. The information reaches the user in a rapid flow that he may not be ready for, so in that scenario it is better to dose the content by separating it with a "Load more" button.

But the infinite scroll pattern is good in that it lets you get through a large amount of content quickly. For the task of constantly selecting options from a huge pile of similar images, infinite scroll is the ideal pattern for the user. The work stays maximally focused, with no need to be distracted by constant clicks to move through pages.

The horror of pagination in the current Invoke implementation is amplified by the fact that images are added at the top of the list, not the bottom. This confuses the user further, because what was on page 5 will end up on page 7 after a few generations. And to keep that from being annoying, you have to constantly mark things with a star, which worsens the gallery UX and forces even more useless clicks.

A great example of an infinite scroll gallery that also has boards is the Midjourney gallery (web version). Somehow, they had no difficulty with this. And, as an advanced AI tool that is constantly improving its UX to give users maximum comfort, they don't consider infinite scroll a terrible, unnecessary and wrong pattern, but use it in their interface along with other UX best practices.