oysterprotocol / oyster-streamable

Streamable implementation of the Oyster protocol. WIP!

Optimize Polling Logic #39

Open AaronVasquez opened 5 years ago

AaronVasquez commented 5 years ago

This will become inaccurate if a user comes back to check on an upload's progress. Polling starts from scratch every time the page is loaded.

There is also a delay of 4s between each poll, which can make the displayed progress lag behind the actual progress.

  1. Not sure what the best solution is, but at the very least we can store something on the client to remember the last index polled, so we don't start from the beginning each time.
  2. Instead of regular 4s intervals for polling, we can immediately poll again after a successful poll and only delay when a poll comes back empty. This doesn't solve the problem, but it makes polling "faster" (rough sketch below).
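
A minimal sketch of both ideas, assuming a browser context and a hypothetical `pollChunk(handle, index)` that resolves to true once the chunk at that index is found on the Tangle; none of these names are existing oyster-streamable API:

```ts
const POLL_DELAY_MS = 4000;

const lastPolledKey = (handle: string) => `oyster:lastPolledIndex:${handle}`;

async function pollProgress(
  handle: string,
  totalChunks: number,
  pollChunk: (handle: string, index: number) => Promise<boolean>,
): Promise<void> {
  // (1) Resume from the last index we saw on a previous page load, if any.
  let index = Number(localStorage.getItem(lastPolledKey(handle)) ?? 0);

  while (index < totalChunks) {
    const found = await pollChunk(handle, index);
    if (found) {
      localStorage.setItem(lastPolledKey(handle), String(index + 1));
      index += 1; // (2) poll the next index immediately on success...
    } else {
      // ...and only wait the 4s delay when a poll comes back empty.
      await new Promise((resolve) => setTimeout(resolve, POLL_DELAY_MS));
    }
  }
}
```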
rfornea commented 5 years ago

If we wanted, we could poll on all the remaining polling indexes each time, instead of working inwards and only doing 2 at once. For each index we find, we remove it from the list so we don't poll on it again in the next poll. We can also tinker with how many indexes we pick for each file.
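
A sketch of that bookkeeping, assuming a hypothetical batch lookup `findChunks(handle, indexes)` that returns whichever of the requested indexes already exist on the Tangle (not an existing function here):

```ts
async function pollRemaining(
  handle: string,
  remaining: Set<number>, // starts as every polling index for the file
  findChunks: (handle: string, indexes: number[]) => Promise<number[]>,
): Promise<Set<number>> {
  const found = await findChunks(handle, [...remaining]);
  // Drop found indexes so the next poll only asks about what's still missing.
  for (const idx of found) remaining.delete(idx);
  return remaining;
}
```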

AaronVasquez commented 5 years ago

We can probably do that the first time we poll, then figure out where we need to work inwards from.

rfornea commented 5 years ago

Yeah. For a file that's been uploading for a while, the first polling attempt will knock off a lot of the indexes.

EdmundMai commented 5 years ago

We could probably just poll a spread of the indexes (up to the max index) and use the highest index that is returned to determine where to start.

EdmundMai commented 5 years ago

I think saving the last successfully polled index would be easiest, but if we eventually want to support resuming from different browsers then that wouldn't work.

EdmundMai commented 5 years ago

Per our convo:

Similar to what I wrote above, I think we should take advantage of having the ability to query multiple indexes at once.

Currently we query single indexes incrementally, which is why increasing the frequency would help, since it's successive (poll idx 1, then idx 2, then idx 3, etc.). Instead, I propose querying a spread and using the number of returned values to determine the progress %.

Example: a file has 20 chunks, so we query [0, 1, 2, ..., 19]. If only [0, 1] exist, then we mark the progress as 10% (2 values returned / 20 total values). Progress then moves in increments somewhere between 5% and 100%, depending on file size. We can easily configure this to be something we feel is both user friendly and not burdensome to the Tangle. As Rebel suggested, we can even optimize it by removing the values we have already found, to reduce redundancy in our queries.

I think this is better than making n HTTP requests to the Tangle, but it will be slightly more complex.
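
A minimal sketch of the progress calculation from that example; the helper name is made up and would sit alongside a batch query like the one sketched above:

```ts
// Convert the number of chunk indexes found so far into a progress
// percentage, e.g. 2 found out of 20 total chunks -> 10%.
function progressPercent(foundCount: number, totalChunks: number): number {
  if (totalChunks === 0) return 100; // treat an empty file as complete
  return Math.min(100, (foundCount / totalChunks) * 100);
}

// progressPercent(2, 20) === 10
// progressPercent(20, 20) === 100
```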

As an easy win, let's just increase the polling frequency first, since this optimization would take time to implement.