researchstudio-sat / webofneeds

Finding people to cooperate with. Protocol, not platform. Decentralized. Linked Data. Open Source.
http://researchstudio-sat.github.io/webofneeds/
Apache License 2.0

Load data selectively #623

Closed fkleedorfer closed 6 years ago

fkleedorfer commented 8 years ago

Instead of loading everything on startup, determine what needs to be loaded and load only that.

pheara commented 8 years ago

I think we should really break this down into separate tasks and merge work on them into a feat_load_selectively branch.

A breakdown of steps might be as follows:

  1. Chat messages: only load when the connection is opened.
  2. Chat messages:
    • only load latest 10 messages
    • "see more"-action
    • spinning-wheel while it's loading
  3. in the list-views: load matches/requests/conversations of a post only when it's folded-out
  4. in the list-views:
    • only load the first 10
    • "see more matches/requests/conversations"-action
    • spinning-wheel while it's loading
  5. in the grid/tiles view for matches:
    • only load the 10 matches that are the latest across all needs (will need implementation on the owner-server as the node doesn't know which needs belong to the same user).
    • "see more"-action
    • spinning-wheel while it's loading
  6. post-overview
    • only load 10 posts with the newest updates (where unread(?) messages trump unread(?) requests trump matches). This probably will need to be implemented on the owner-server as the node doesn't know which needs belong to the same user.
    • "See more posts"-action
    • spinning-wheel while it's loading
  7. feed:
    • load 5 posts with the latest events – and 3 of the latter for each – with prioritization/sorting as specified above
    • "see more events"-action for a given post
    • "see more posts"-action
    • spinning-wheels in all of the above places while they're loading

These interactions should be introduced as new action-creators that load the data and then dispatch actions of those new action-types. Some of these actions are triggered by clicks on the "more"-links, some by scrolling (e.g. in the feed).
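A minimal sketch of such an action-creator, assuming a redux-thunk style setup; the names (showMoreMessages, fetchOlderMessages and the action-types) are illustrative only, not the final API:

// Illustration only: an async action-creator for the "see more" case.
// fetchOlderMessages stands in for the actual linked-data call.
const MORE_MESSAGES_REQUESTED = 'MORE_MESSAGES_REQUESTED';
const MORE_MESSAGES_RECEIVED = 'MORE_MESSAGES_RECEIVED';

function showMoreMessages(connectionUri, pagingSize = 10) {
  return dispatch => {
    // mark the connection as loading, so the UI can show the spinning-wheel
    dispatch({ type: MORE_MESSAGES_REQUESTED, payload: { connectionUri } });

    return fetchOlderMessages(connectionUri, pagingSize).then(messages =>
      dispatch({ type: MORE_MESSAGES_RECEIVED, payload: { connectionUri, messages } })
    );
  };
}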

A hook into the routing-change/stateGo action-creator (or, if that's not possible, a wrapper around it) would probably be the best place to trigger loading of the first items visible in a given view. A third option would be an agent that looks at the route, checks whether those first items are present, and dispatches via the respective action-creator if they aren't.

The spinning wheels can be created by tagging the ownNeeds/post/connection-objects in the state with an isFetchingMore boolean, which would also be a better way to handle the "Pending…" in create-post.
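A minimal sketch of that flag in a reducer, assuming plain JS objects in the state (the actual state shape, e.g. Immutable.js structures, may differ):

// Sketch only: toggling isFetchingMore on a connection in the state.
function connectionsReducer(state = {}, action) {
  switch (action.type) {
    case 'MORE_MESSAGES_REQUESTED':
      return {
        ...state,
        [action.payload.connectionUri]: {
          ...state[action.payload.connectionUri],
          isFetchingMore: true, // drives the spinning-wheel in the view
        },
      };
    case 'MORE_MESSAGES_RECEIVED':
      return {
        ...state,
        [action.payload.connectionUri]: {
          ...state[action.payload.connectionUri],
          isFetchingMore: false,
          // ...merge the newly received messages here
        },
      };
    default:
      return state;
  }
}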

pheara commented 8 years ago

The root-level items in the list above should be separate pull requests.

pheara commented 8 years ago

The bottleneck at the moment is the crawling on the server: deep requests

counterparts and events that aren't included in the deep request: the rest

pheara commented 8 years ago

Two different goals:

pheara commented 8 years ago

[in progress] Search for solutions:

pheara commented 8 years ago

Other thoughts:

pheara commented 7 years ago

It probably would make sense to start with the feed, as pain-points will most likely show up there first anyway.

pheara commented 7 years ago

Documentation for our linkeddata-paging API

pheara commented 7 years ago

#749 introduces the necessary code in won.fetch. However, several parameters don't work yet (e.g. deep for event-containers, type, timeof), but at least they can be tested more easily now.

Example usage that fetches page 2 with pages of seven events each:

// fetch page 2 (p: 2), with a page size of 7 events, resolving the container deeply
won.fetch(eventContainerUri, {
  requesterWebId: reqWebId,
  pagingSize: 7,
  queryParams: { p: 2, deep: true },
})
.then(args => {
  // extract the uris of the returned events from the JSON-LD result
  const uris = args['@graph'][0]['rdfs:member'].map(e => e['@id']);
  console.log(uris);
});
pheara commented 7 years ago

Discussion Notes

The notes I took while discussing this issue with @fkleedorfer:

Misc

The most difficult cases: "only load 10 posts with the newest updates" and "the last 10 matches over all needs".

The owner has to cache need info (e.g. what has been seen, when the last update of which type happened, etc.) to implement "only load 10 posts with the newest updates".

Does this ultimately require the owner to cache the entire node-state for that user? How can we avoid a huge startup-cost, i.e. loading needs for every user? Don't load data for users that haven't logged in for a while? Only load the delta since the last poll?

Client-to-server (marking things as seen): POST a list of eventUris. Server-to-client: a bloom-filter? A list of unread uris (might be long)? A list of the latest unread plus aggregated numbers? A special message if another client-instance marks the event as seen?
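A sketch of the client-to-server direction; the endpoint path and payload shape are assumptions for illustration, not the actual owner-server API:

// Hypothetical endpoint and payload – illustration only.
function markEventsAsSeen(eventUris) {
  return fetch('/owner/rest/events/markAsSeen', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    credentials: 'include', // send along the owner-session cookie
    body: JSON.stringify({ eventUris }),
  }).then(response => {
    if (!response.ok) {
      throw new Error('Marking events as seen failed: ' + response.status);
    }
  });
}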

When previous requests have finished and no new action has been triggered, the client could load preemptively, e.g. load connections with new unread events (as they might grab the user's attention), or start fetching the first few conversations once the list of conversations has finished loading. We just need to make sure that user-triggered actions always take precedence. The agent can publish multiple actions for each of these loaded data-packages.

Most of the loading should happen through the more declarative construct queries!!
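As an illustration of what such a declarative query could look like (the exact crawlableQuery format and vocabulary used by won.js may differ; rdfs:member is the only term taken from this discussion):

// Illustration only: a CONSTRUCT query pulling all members of an event-container.
function eventContainerQuery(eventContainerUri) {
  return `
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    CONSTRUCT { <${eventContainerUri}> rdfs:member ?event . }
    WHERE     { <${eventContainerUri}> rdfs:member ?event . }
  `;
}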

always use deep=true to resolve collections

Note that "the last 10 events" (N) isn't the same as "the last 10 events the user gets to see" (N'); the difference being, for example, success messages. Until the server-API reflects this, N = 3 * N' can be used as a heuristic.
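Concretely, building on the won.fetch example above (assuming eventContainerUri and reqWebId as there): to end up with roughly 10 user-visible messages, three times as many events would be requested:

// Heuristic over-fetch: request 3x the number of messages we want to display,
// since e.g. success messages are filtered out on the client afterwards.
const visibleMessageCount = 10;               // N'
const pagingSize = 3 * visibleMessageCount;   // N = 30
won.fetch(eventContainerUri, {
  requesterWebId: reqWebId,
  pagingSize,
  queryParams: { p: 1, deep: true },
});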

architectural approaches on the client:

Caching

the events received via the websocket should be pushed to the rdf-store, including rdfs:member entries, and they should be marked as cached.

If dirty: automatically load only beginning with the latest member, to avoid unnecessary server-load.

selective loading and caching:

But actually we shouldn't need this smart caching, as we get all necessary information through the web-socket. Everything should either be fetched initially at page-load or, in case the user clicks "more", be connections/events older than the previously oldest uri, and thus a cache-miss anyway. We could still implement checks like the ones above to detect non-well-behaved code, though. The only exception is connection-messages we send ourselves, which we need to invalidate and re-fetch after posting to get the timestamps from our owner. So the simplified variant is: "always reload if it's a mutable resource (i.e. a collection)".
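A minimal sketch of that simplified cache policy; isContainer is a hypothetical helper that decides whether a uri denotes a mutable resource:

// Simplified cache policy from above – sketch only.
function needsFetch(uri, cachedUris) {
  if (isContainer(uri)) {
    return true;                 // mutable resources (collections): always reload
  }
  return !cachedUris.has(uri);   // immutable resources: fetch only on a cache-miss
}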

pheara commented 7 years ago

Note: the won.deleteNode(uri) somewhere around linkeddata-service-won.js:~790 might be problematic, as in its current form it deletes the entire container when fetching a new page.

The delete is skipped if the loaded data was only partial, i.e. if paging was used. Thankfully the store doesn't add duplicate nodes, so we simply add the triples the usual way. The only possible tripping hazard is blank nodes: these are always given unique identifiers and thus always result in unique triples.

pheara commented 7 years ago

Here's an example redux-app that uses pagination (on github-API content): https://github.com/reactjs/redux/tree/master/examples/real-world

pheara commented 7 years ago

The following relay-example on paging might provide inspiration on how to embed crawlable queries in components: https://www.reindex.io/blog/redux-and-relay/#relay-4

pheara commented 7 years ago

For the decisions:

The Elm-Architecture would need too much refactoring right now, or would introduce another style and thus increase the complexity of the code-base. Implementing it as an agent would require specifying the data-dependencies twice, or traversing the currently visible component tree on every update. Also, an agent can only look at the state, in particular the routing parameters, not at actions (e.g. a person clicking on "Show more" or scrolling down).

Thus the loading will continue to happen in asynchronous action-creators. But instead of having one automatic call as part of the page-load, the components will call some kind of requestData action-creator every time (additional) data is required, e.g. when the component is initialized, when a critical routing parameter changes, or when a person requests more data. These action-creators should be kept as small as possible, though!

Uris will not be marked as cached if they were only fetched partially, i.e. using pagination, even if multiple partial fetches happen to cover the entirety of a collection.

Components will now not only know what data they need and where it is in the state (encoded in the select-statements they use), but also where the data lives on the server and in the rdf-graph. This information will be provided to the action-creator mentioned above. Ideally it's encoded purely declaratively, so that both the crawlableQuery and the select-statement can be derived from that info.

The order of operations is (see the sketch after the list):

  1. the component's constructor
  2. that calls actionCreators.ensureLoaded(<dependencies>)
  3. that calls:
    1. executeCrawlableQuery(<query>).then(data => { … dispatch(<dataRequestedAction>) … })
  4. state update
  5. the component's select, which is derived from the component's dependency declaration.
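A rough sketch of steps 2-4, using ensureLoaded and executeCrawlableQuery as named in the list above; the exact signatures, the dependency shape and the action-type are assumptions:

// Sketch only: an ensureLoaded action-creator that runs the crawlable queries
// for a component's declared data-dependencies and dispatches the results.
function ensureLoaded(dependencies) {
  return dispatch =>
    Promise.all(
      dependencies.map(dep =>
        executeCrawlableQuery(dep.query).then(data =>
          dispatch({ type: 'DATA_REQUESTED', payload: { uri: dep.uri, data } })
        )
      )
    );
}

// step 1: called from the component's constructor, e.g.
// this.ensureLoaded(MyComponent.dataDependencies);
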
pheara commented 7 years ago

#782 Spinners!

pheara commented 7 years ago

There's a spaghetti-code snippet in the branch that makes sure all events of the connection are loaded whenever the conversation is accessed (I'll refactor it to conform to the architecture specified above once all edge-cases have been found).

One of these problems: unless all queries to a connection pass along the paging-size, all event-uris get loaded into the rdfstore. Consequently, all queries that go on to load all referenced events will then load all of them, even if a paging-size was specified for the query. This will be a tough one to avoid:

pheara commented 7 years ago

The connectionMessages received via the websocket should be added to the rdf-store as well (including an rdfs:member triple).
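A sketch of that; addTriplesToStore stands in for whatever insert mechanism linkeddata-service-won.js actually exposes, and the triple layout is illustrative:

// Sketch only: cache an incoming connectionMessage locally, including the
// rdfs:member triple that links it to its event-container.
function cacheIncomingMessage(eventContainerUri, messageUri, messageTriples) {
  const memberTriple = {
    subject: eventContainerUri,
    predicate: 'http://www.w3.org/2000/01/rdf-schema#member',
    object: messageUri,
  };
  return addTriplesToStore([...messageTriples, memberTriple]);
}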

quasarchimaere commented 6 years ago

Pretty sure we can close this, since we implemented some of it with the skeleton screens and so on. I think this issue is obsolete now, or should be boiled down to something more specific (as its own issue). @fkleedorfer, if you agree please close this issue.

quasarchimaere commented 6 years ago

Closing this; if we need to do more, we will create separate issues for it.