web3-storage / web3.storage

DEPRECATED ⁂ The simple file storage service for IPFS & Filecoin
https://web3.storage
Other
503 stars 119 forks source link

File load & pagination issues for user with many files #1362

Closed mbommerez closed 2 years ago

mbommerez commented 2 years ago

User observations

Acceptance criteria

Notes

There are 2 parts to this:

https://images.zenhubusercontent.com/378893240/704a835d-8695-4f22-853d-85ad199a1058/screen_recording_2022_03_21_at_10_38_42_720p.mp4

joshJarr commented 2 years ago

Done some initial investigation on this, there's a few ways to approach this task and am keen to get some feedback before tackling it!

Look like right now 1000 results are requested by the frontend, Pagination, searching and sorting are completed on the Frontend.

If we want to keep all "sort by" and "search" functionality, we would have to replicate this in the API.

Pagination

Sorting

Searching

I could fix this all in a full fat solution in order keep parity with existing functionality, or I could implement an MVP hotfix for pagination using mainly existing API features but resulting in losing the sorting for file size and name.

Would love to hear your thoughts on this @dchoi27, @alanshaw & @orvn!

orvn commented 2 years ago

Look like right now 1000 results are requested by the frontend, Pagination, searching and sorting are completed on the Frontend.

Thanks for this @joshJarr! Yes, I agree, this is far from ideal and was really just implemented as a stopgap. It definitely needs to be replaced because:

  1. Response will be slow for large queries
  2. Once the uploads list is > 1000 objects, the search won't even address the entire collection

Looks like the API already supports sorting by Date and Name, meaning we'd also have to add sorting by size to this.

Since there's a split table design (not yet in production), where we show files that have been uploaded via the Pinning Services API separately, I think these features (sort, size, offset, etc.) should be applied to the /pins endpoint as well?

We could use the existing before and after parameters to paginate through the API. I'm not a huge fan of this approach since it means we cannot sort by size or name since we'd be chunking the uploads by date.

Agree as well, +1 to using offset and size in place of before and after dates.

I find searching a little odd, we currently allow searching by CID, but would a user ever search for results using a partial CID?

Most of the time, if my file is named correctly/semantically, I'll search by file name. But I'll also search by a fragment of a CID (to reliably find a specific file in a really large list).

joshJarr commented 2 years ago

Thanks @orvn! I've got a WIP PR for these changes: https://github.com/web3-storage/web3.storage/pull/1408

Since there's a split table design (not yet in production), where we show files that have been uploaded via the Pinning Services API separately, I think these features (sort, size, offset, etc.) should be applied to the /pins endpoint as well?

That's a great shout. I think we can perhaps make a new ticket for applying this logic to the /pins endpoint. I'd love to see the new split design to see how best to do this

Agree as well, +1 to using offset and size in place of before and after dates.

Cool, I've implemented this in my WIP PR, currently weighing up using an offset vs page, as I think the latter reduces the effort of calculating page & page size on the frontend.

Most of the time, if my file is named correctly/semantically, I'll search by file name. But I'll also search by a fragment of a CID (to reliably find a specific file in a really large list).

Cool - this is great to know. I may move adding a search endpoint/filtering by query to a new ticket. This could mean temporarily removing this functionality unless we block this ticket from being released until the search is implemented.

I also may move sorting by size to a new ticket. Currently the file size is not stored on the upload entity, so we may need to add this column via migration as it is currently denormalised on request, so I cannot sort the uploads by this efficiently. Keen to know the importance of sort by size and if this too can temporarily be removed to keep this fix small.

vasco-santos commented 2 years ago

hey @joshJarr

Thanks for going through these issues. The historical reason for no pages appearing is that this was not possible with FaunaDB (DB we were using before). Now we can add that if it is important.

The offset here will map to use https://supabase.com/docs/reference/dart/range#examples with the page, is it correct? I would call it size and page, given we already name like this in other APIs and prefer to keep things consistent.

joshJarr commented 2 years ago

Hey @vasco-santos, thanks for the context. Yep have it currently mapping to a .range in my PR. Yep that's a great point around offset vs page, I'll drop offset in favour for page and use page numbers instead of calculating the offset in the client. This approach keeps it more consistent and easier to integrate with. Thanks!

JeffLowe commented 2 years ago

@joshJarr @mbommerez Just let me or @drewdelano know once the pagination updates to the api lands so we can update the frontend to use it.

mbommerez commented 2 years ago

@JeffLowe Thanks, we are reviewing this internally and get a review from PL next and then get this ready for the next API release.

JeffLowe commented 2 years ago

To clarify, this ticket's scope should be on pagination for the upload api. The utilization of the api by the frontend is ticketed in #1501

joshJarr commented 2 years ago

Thanks, will do @JeffLowe.

@vasco-santos I've requested your review on the PR, would be great to get your input and ensure it's consistent with existing functionality before merging. Many thanks!

joshJarr commented 2 years ago

Hey @JeffLowe @drewdelano Just an FYI, the API pagination has been merged and deployed! There's a minor chore to update the HTTP documentation with the new params, that's awaiting a website deploy.

Pagination For now, if you add the page=1 param to the GET /user/uploads you should see a paginated response. You can still set page size with the size param.

I've added pagination metadata to the API response headers that you should be able to use to calculate page numbers and such. These are:

Sorting Also sorting can be achieved by adding sortBy param, this supports 'Date' | 'Name' for now. sortOrder can be used to order the sort, accepting 'Asc' | 'Desc'.

When running the Web documentation locally, there should be more info there.

Let me know if you have any questions or feedback!

JeffLowe commented 2 years ago

Thank you so much for this into @joshJarr !!

mbommerez commented 2 years ago

Reassigning to @joshJarr. PR #1699 needs completing and review.