Review dialogue for collecting/harvesting feed on Feed Manager UI

davidjennings commented 12 years ago

On the Feed Manager home page http://coursedata.k-int.com/FeedManager/home/index the Collect column has a button in it. Clicking this button does not seem (in our experience) actually to collect the data from the feed, but always generates a message "Feed x will be collected in next run". If this is always the case, a) would it be more accurate to rename this column "add to queue for collection" (accepting this is much wordier and therefore not ideal)? b) if there is a method to collect the data immediately, should users be made aware of it? (from the context of use, we are imagining that users will want to dip into this part of the system, check their data and move on; this won't be the kind of task they'll want to keep open, in a browser tab and on their to do list, indefinitely -- so they may have limited patience and want to tweak their data, recheck it and then finish)
It seems that it may be possible for the user to collect data immediately by first clicking on the name of the feed, and then clicking Edit, which allows you to change the Check Interval. Changing this to 0 days, 0 hours, 0 minutes (or 1 minute, if 0 would trigger continuous checking) should enable you to recheck the data quickly. If we do this, we get a message "Datafeed [sic] x updated", but is the system actually going to check (i.e. harvest/collect) the feed in the next x minutes (I tried setting x = 0, 1, 15)? There is no feedback to the user as to whether a check has actually been attempted or taken place. I could not seem to trigger a successful check (/harvest/collection?) because the Home screen always said "Never" under Last Check. Users need some feedback at this stage of the dialogue. The message "This feed has not been harvested yet. If this is a newly registered feed it should be picked up within the next 5 minutes. If this is an old feed you should check for errors which might be preventing the harvest process" is inadequate as feedback: once 5 minutes have elapsed, users will assume that there are errors that have prevented harvesting (might there be any other explanation why my feed is still Never checked?). But how and where should they check for errors? Is the feed aggregator not giving any feedback about the kinds of errors that it has encountered?

At the moment there seems to be a high chance of this dialogue leaving users stuck and without the necessary information to enable them to get themselves unstuck. I recognise that the above account may contain some misunderstandings of what the actual process is -- but if that's the case, I think the misunderstandings are a symptom of how confusing the dialogue is.

ianibo commented 12 years ago

The problem with (1) is that the speed issue is related to the host we are collecting the data from, and it can take a long time. That means doing what you suggest about "Immediate" simply defers the problem to how we write an HTML interface to represent a long-running transaction which might never terminate. So, yes, "Add to queue" is perhaps more accurate, but isn't actually accurate, in that the real problem here is that the system-to-system interface is potentially long-running and not particularly compatible with an HTML user session (given, for example, the OU course catalog).

I'll look at 2. Most of the issues around here seem to come from a perspective that harvesting a feed is an atomic transaction that can be completed almost instantly. The problem is that this isn't the case. The most common case is that the harvest will take 1-2 mins, and the edge cases can run to hours.

Because the checking is a background task, we don't really know when the next harvest will be, so it's hard to say "Will be harvested in HH:MM:SS". This is because we have to run the feeds in series, and due to the indeterminate time it takes to run each feed we can say "The next harvest will be in 5 mins", but if a particular job is 5th in the queue, and jobs 1, 2, 3 and 4 take 5 hours, job 5 will run in 5 hours and 5 mins.
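To make the arithmetic above concrete, here is a minimal sketch of how a serial queue delays later jobs. The function name and the individual job durations are hypothetical, chosen only to match the "5 hours and 5 mins" example; this is not the aggregator's actual scheduler.

```python
from datetime import timedelta


def estimated_start_delays(poll_delay, durations):
    """Return the delay before each queued job starts when jobs run in series.

    poll_delay: time until the background task next wakes up.
    durations:  how long each queued job takes, in queue order. In practice
                these are unknown in advance, which is the core problem.
    """
    delays = []
    elapsed = poll_delay
    for duration in durations:
        delays.append(elapsed)   # this job starts once everything before it is done
        elapsed += duration
    return delays


# Hypothetical queue: jobs 1-4 take 5 hours in total, job 5 is ours.
queue = [timedelta(hours=2), timedelta(hours=1), timedelta(hours=1),
         timedelta(hours=1), timedelta(minutes=2)]
delays = estimated_start_delays(timedelta(minutes=5), queue)
print(delays[0])   # job 1 only waits for the 5-minute poll delay
print(delays[4])   # job 5 waits for the poll delay plus jobs 1-4
```

The sketch only works because the durations are supplied up front; with real feeds they are not known until each harvest finishes, so any start time shown to users could only be a rough estimate.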

There are 2 potential solutions here: (1) Change the hardware and networking requirements so we can run all jobs in parallel. This will let us tell users when jobs will start, although the end time is still entirely dependent upon the system we are harvesting from. (2) Figure out the right words to explain this to users.

Maybe there's a middle ground, but that's the problem as I see it from this end.

davidjennings commented 12 years ago

This is all helpful explanation. The main point seems to be that if "users will want to dip into this part of the system, check their data and move on", as I suggested, the hard truth is that they won't be able to for sound technical reasons. Fair enough.

Bearing in mind that uncertainties remain in solution (1), I suspect solution (2) may be the only really practical one.

If there are uncertainties such as the variation between 1-2 mins and 5+ hours, then the dialogue with the users has to prepare them for this, rather than just leaving them to find out the hard way. Need to be careful about expectations set by statements like "it should be picked up within the next 5 minutes"; old adage about better to under promise and over deliver. So, in this case, a warning message along the lines of "This step depends on systems and data that are not part of this service, so it can take several hours (though in most cases it is completed in a matter of minutes)".

Users need to be given the feedback and info to determine things like (a) if there are errors that have prevented harvesting (b) how and where should they check for and diagnose errors (c) what feedback is available about the kinds of errors
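As one possible shape for that feedback, the sketch below keeps a small status record per feed and turns it into a single line of text. The `FeedStatus` fields, the wording of the messages, and the example error and date are all assumptions for illustration; nothing here reflects how the Feed Manager actually stores harvest results.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class FeedStatus:
    """Hypothetical per-feed record; the field names are assumptions."""
    name: str
    queued: bool = False
    last_attempt: Optional[datetime] = None   # None means "Never" checked
    last_error: Optional[str] = None          # None means the last attempt succeeded


def status_message(status: FeedStatus) -> str:
    """Turn the record into one line of user-facing feedback."""
    if status.last_attempt is None:
        if status.queued:
            return f"{status.name}: queued for collection, not yet attempted."
        return f"{status.name}: never collected and not currently queued."
    when = status.last_attempt.strftime("%Y-%m-%d %H:%M")
    if status.last_error:
        return f"{status.name}: last attempt at {when} failed: {status.last_error}"
    return f"{status.name}: last collected successfully at {when}."


# Illustrative values only.
print(status_message(FeedStatus("Feed x", queued=True)))
print(status_message(FeedStatus("Feed x",
                                last_attempt=datetime(2012, 5, 1, 9, 30),
                                last_error="connection timed out")))
```

Separating "attempted" from "succeeded" is the point of the sketch: it distinguishes a feed that is still waiting in the queue from one whose harvest was tried and failed, which the single "Never" value cannot do.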

The best way to communicate this is probably not via one long manual-style explanation, but through providing relevant prompts and guidance at each stage of the process...

ianibo commented 12 years ago

Could use some specific suggestions here.. not sure how to improve things..

neilsmi commented 12 years ago

I think we're agreed that the only way forward in the short term is to improve the feedback to users. I suggest that the message displayed to users after the 'collect' button has been pressed AND the message displayed when the feed interval is changed should read:

"Feed x has been added to the queue for collection. This step depends on systems and data that are not part of the aggregation service, so it can take several hours (though in most cases it is completed in a matter of minutes)".

davidjennings commented 12 years ago

Yes, I'd agree with that, with first a minor alteration: "Feed [or data feed if that is the term being used elsewhere] x has been added to the queue for collection. This step depends on systems and data that are not part of the Aggregator [for consistency and clarity - some users might not know what "aggregation service" is, but it says Aggregator in big letters at the top of the page], so it can take several hours (though in most cases it is completed in a matter of minutes)".

But, second, could we add a link to a pop-up FAQ or similar e.g.

How frequently are data feeds collected? We run collection requests in the order we receive them. When several users are making requests at once, this may mean that you have to wait until other users' collection requests have been completed. However, in many cases your request may be initiated almost immediately.

How long does collection take? Because this depends on factors that we can't anticipate in advance of your request (such as the amount of data and data transfer speeds) we cannot give a firm estimate of this. Exceptionally it may take hours, but more often it will be a matter of a few minutes.

How will I know when an attempt has been made to collect my data feed? The time (and date) of the most recent collection of your data feed will be shown on the home page under "Last Check".

How will I know when collection of my data feed has been successful? [I'm not sure what the correct answer to this is]

Any thoughts about changing the "collect" button to "request collection" or "add to queue for collection"? As with all these points, in the absence of being able to observe or talk to real users trying out the aggregator, I am not wedded to a particular solution. I'm just trying to anticipate areas where they might get confused or stuck, and this seemed like one such area to me - my suggestions are attempts to make the workings more transparent and thereby give users more agency in understanding what's going on and what they can do...
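The confirmation wording proposed earlier in this thread could be produced by one small helper, so that the 'collect' button and the Check Interval form show the same text. This is only a sketch: the function name is hypothetical, and the service name is a parameter because the thread has not settled between "the aggregation service" and "the Aggregator".

```python
def collection_queued_message(feed_name: str,
                              service_name: str = "the aggregation service") -> str:
    """Build the confirmation shown after a collection request is queued."""
    return (
        f"{feed_name} has been added to the queue for collection. "
        f"This step depends on systems and data that are not part of "
        f"{service_name}, so it can take several hours (though in most "
        f"cases it is completed in a matter of minutes)."
    )


# Same message with either of the two names discussed in the thread.
print(collection_queued_message("Feed x"))
print(collection_queued_message("Feed x", service_name="the Aggregator"))
```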

k-int / XCRI-Aggregator, issue #33