rectangle-dbmi / Realtime-Port-Authority

Realtime transit tracker for Pittsburgh's Port Authority buses, using the realtime PAT API and Google Maps to display the maps
GNU General Public License v3.0

Error messages are transient #221

Open sgdoerfler opened 9 years ago

sgdoerfler commented 9 years ago

When True Time has an outage, the app displays a series of messages like "61C is not currently being tracked." Each goes away after a second or two. Sometimes I'm not looking at my tablet continuously, since I know the app will take a few seconds to get and display the info I want, so I don't see the messages. Or I see just the last one, and I don't know if it's telling me one route is now done for the night, or that it can't get any data at all. Sometimes there's a connectivity problem, but I'm not looking at the tablet during the specific 2 seconds where the app says so. It would be more convenient if these messages weren't completely ephemeral.

I'd like to see some permanent indication on the screen whenever there's a problem (perhaps the familiar yellow !-in-triangle icon on the black bar at the top, or in the corner of the map, or even some graphical effect on all the affected routes, like making them dashed lines).

Then that problem indication, when tapped, would display a simple dialog with a complete list of all selected routes with no tracking data. "Not being tracked: 61A 61B 61C 61D". Or "Please check your data connection" if that's the issue.
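A rough sketch of that idea, in Python for brevity rather than the app's actual code. The class and method names (`ProblemIndicator`, `report_untracked`, and so on) are made up for illustration; the point is only that errors accumulate until they are resolved instead of fading away.

```python
class ProblemIndicator:
    """Hypothetical holder for problems that should stay visible until resolved."""

    def __init__(self):
        self.untracked = {}          # route -> True; dicts keep insertion order
        self.no_connection = False

    def report_untracked(self, route):
        # Remember the route instead of flashing a transient message.
        self.untracked[route] = True

    def report_tracked(self, route):
        # Tracking data arrived again, so the route is no longer a problem.
        self.untracked.pop(route, None)

    def has_problem(self):
        # Drives whether the warning icon is shown at all.
        return self.no_connection or bool(self.untracked)

    def dialog_text(self):
        # Text for the dialog shown when the warning icon is tapped.
        if self.no_connection:
            return "Please check your data connection"
        if self.untracked:
            return "Not being tracked: " + " ".join(self.untracked)
        return ""
```

With this shape, the icon is shown whenever `has_problem()` is true, and tapping it shows `dialog_text()`.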

epicstar commented 9 years ago

@VincentIII

epicstar commented 9 years ago

This is a really good idea. It's been the plan to change the error message. However, Port Authority gives us the same error message whether or not their backend is down, and their official site handles it the same way we do right now. It'll change.

epicstar commented 9 years ago

This is part of issue #215

sgdoerfler commented 9 years ago

I think there are two issues here: A. How each error message is shown (message that appears and fades, versus collecting them in a list and displaying them all when user taps something). B. What the error messages say.

As far as the latter issue, I think the useful distinctions that can probably be made, without any help from Port Authority, are:

  1. No connectivity at all (Now reported as "Please check your data connection.")
  2. Device says there's connectivity, but we can't establish a connection to the particular server we want.
  3. We seem to be connected to the server, but got some unexpected response we can't interpret, distinct from PAT's "not tracking that route" error message. (Could mean we're on some Wi-Fi connection that's asking us to log in, or that PAT's server is having issues, or some cache or proxy along the way is gumming up the works.)
  4. Port Authority says it's not tracking a route (but doesn't say why).

I know there are specific messages for 1 and 4. I'm not sure if there are distinct messages for 2 and 3 (or if they just reuse the messages from 1 or 4), but it would be useful if there were.
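The four cases can be sketched as a small classifier. This is Python for brevity, not the app's code, and every name here (`Failure`, `classify_failure`, the `not_tracked_text` parameter) is an assumption for illustration. How the app actually learns `device_online` and `connected` is left out.

```python
from enum import Enum


class Failure(Enum):
    NO_CONNECTIVITY = 1       # case 1: device has no connection at all
    SERVER_UNREACHABLE = 2    # case 2: online, but the server can't be reached
    UNEXPECTED_RESPONSE = 3   # case 3: got a reply we can't interpret
    NOT_TRACKED = 4           # case 4: PAT says the route isn't tracked


def classify_failure(device_online, connected, body, not_tracked_text):
    """Classify a request that did not yield vehicle data.

    device_online:    what the device reports about connectivity
    connected:        whether a connection to the server was established
    body:             response text, or None if nothing came back
    not_tracked_text: the marker text of PAT's "not tracking" error
    """
    if not device_online:
        return Failure.NO_CONNECTIVITY
    if not connected:
        return Failure.SERVER_UNREACHABLE
    if body is not None and not_tracked_text in body:
        return Failure.NOT_TRACKED
    # Anything else: captive Wi-Fi login page, proxy error, garbled reply...
    return Failure.UNEXPECTED_RESPONSE
```

Each `Failure` value could then map to its own message, so cases 2 and 3 no longer reuse the text from 1 or 4.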

epicstar commented 9 years ago

The explicit message for 1 is about to get harder: Retrofit has eliminated RetrofitError, so we're going to have to handle errors manually. I took the lazy way out for messages 2 and 3. There is another error message, already handled, for exceeding the API call limit, but I think this will be handled better. There are cases in number 3 where they don't know what the error is, and they will also give us a generic "not sure what happened...." message.

As for number 4.......

I figured out that each part -- vehicles, predictions, route lines, etc. -- of the API works on different server hardware and we just connect to a central server. When any of these servers goes down, it has to be reset manually (there was a time when the API went down late Friday and wasn't fixed until early Monday.....).

There is one big problem with case 4: Port Authority gives us only 1 message, whether the bus isn't being tracked or their servers are down, when we use the getvehicles GET request of their API. Usually when the server is down (which is pretty much 4x a week), this means that we can connect to the server, but it will give us this message: "No data found for parameter" (followed by the rt number). They give us this message once for each route (if 10 routes are selected, they will give us that message 10 times, each linked to an rt number). This same message will pop up if the bus isn't being tracked for the rest of the day.....

If Port Authority wanted to make things easy for us, they would give us an error message like "TrueTime Vehicle Server currently down" or something similar. Unfortunately for us, as said above, they give us that stupid generic message. I transformed the generic description to be as descriptive and truthful as possible, but obviously when people read it, they still don't know what is going on. To get a better description of why the bus isn't showing up, we will have to do this portion manually: the server sends the generic message, and the app needs the logic to translate what that message means.

And this logic is going to be a pretty big task IMO. Any time we have a false positive or false negative for the message, we have failed the user. That's why I am in favor of just using the generic message. However, from what I know (and based on the reviews), every user and person involved in the project disagrees with me. And of course, the user always beats the people developing the app.

One way @VincentIII and I could do it is by looking at what times the buses are scheduled to run, but if people select only one bus or a few buses, especially ones that aren't well tracked by PAT, we have to account for that. It will be very easy to get a false positive if we only use this approach.

Or we can poll buses that are always running (the 28X -- I think -- runs from like 5AM-3AM). When all selected buses are down, we make another call for a bus like the 28X to see if it is running. This is much less time-consuming to code, but it will add way more API calls for each person who tries to use the app.
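A minimal sketch of that probing logic, in Python for brevity and under stated assumptions: `selected_status` and `probe_canary` are hypothetical names, the return strings are placeholders, and which route serves as the "always running" one is left to the caller. The extra API call is only made when every selected route came back empty.

```python
def diagnose(selected_status, probe_canary):
    """Guess why no vehicles are showing.

    selected_status: dict mapping route -> True if the API returned data for it
    probe_canary:    zero-argument callable that makes ONE extra API call for a
                     route expected to be running, returning True if data came back
    """
    if not selected_status:
        return "nothing_selected"
    if any(selected_status.values()):
        # At least one selected route has data, so the server is answering.
        return "some_routes_not_tracked"
    # Everything selected is empty: spend one extra call on the canary route.
    if probe_canary():
        return "selected_routes_not_running"
    return "server_probably_down"
```

The result is still a guess: if the canary route itself is not in service at that hour, this reports the server as down when it is not.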

Another option is just to unselect every route that is not currently running, but I think that isn't good unless users are ok with losing the previously selected routes not running (I'll have to change the code... and add additional logic... I'm not ok with losing my last selected routes that I personally selected).

What other things can we look at?

sgdoerfler commented 9 years ago

Breaking down case 4 into subcases seems like it would be useful, and checking on a bus like 28X is a clever way of guessing the nature of the problem at PAT. But several buses run later than 28X, which ends every day at 1 AM, such as 61C and 61D (which runs until almost 3 AM on Saturday).

Personally, I think it's fine to just show the errors from PAT, if you get to case 4. When PAT says it's not currently tracking 61A, 61B, 61C and 61D, I can figure out for myself that TrueTime is busted. I'd just like the app to display all those errors in a more convenient way, not transiently. Using heuristics to guess why there's no TrueTime data isn't a priority for me.

Please don't skip tracking the 61A I selected just because the schedule says it's not supposed to be running at this hour. If my last bus home is running very late, I really want to know that! But it would be OK to display some different message, distinguishing "No tracking for the P10, as expected, because it's Sunday" from "No tracking for the 61A, but there should be" by using scheduling info.

VincentIII commented 9 years ago

I believe the problem with calling out to the API to see if a bus is running before displaying a corrected error message is that we are effectively wasting an API call, which could lead to the app hitting the API limit faster than it already does. I'm not sure if it's possible, but we could find out whether there are any other "slightly-related" services that run side by side with the bus tracking; if one of those is down, then in theory the bus tracking server is down too.

For splitting up the error messages, I completely agree, and as @epicstar can vouch, it's something I've been bringing up often in talks. I honestly believe it accounts for the majority of the negative feedback: nobody has any idea what is truly wrong when something breaks, so they blame the app instead of considering that the server might be down or that no buses are currently running.

As for the idea of not displaying buses when they are outside of their schedule, this would be optional (maybe opt-in instead of opt-out?). We are in the process of planning an options pane that will allow users to customize and optimize the selection of buses and presets. We plan on using a +/-1 hour offset with the filter since, like you said, the last bus can (and normally does) run very late. I do like the idea of using the bus schedule within the error message logic to tell the difference between a bus that never runs at that time/day and a bus that should be running, where further investigation is needed to find out why it's not being tracked.
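A sketch of how the schedule could feed the error message, using the +/-1 hour offset. This is Python for brevity, not the app's code. The function names and the idea of passing the day's service window as a pair of datetimes are assumptions, and where that schedule data comes from is not addressed.

```python
from datetime import datetime, timedelta


def expected_to_run(now, first_departure, last_arrival, slack=timedelta(hours=1)):
    """True if `now` is inside the service window widened by `slack` on both ends."""
    return first_departure - slack <= now <= last_arrival + slack


def no_tracking_message(route, now, service_window):
    """Pick a message for a route that returned no tracking data.

    service_window: (first_departure, last_arrival) as datetimes, or None if the
                    route does not run on this day at all.
    """
    if service_window is not None:
        start, end = service_window
        if expected_to_run(now, start, end):
            return f"No tracking for the {route}, but there should be"
    return f"No tracking for the {route}, as expected"
```

Using full datetimes rather than times of day means a last trip that ends after midnight needs no special wrap-around handling.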

sgdoerfler commented 9 years ago

Suppose someone wrote a simple web server script (on some web server somewhere) whose only job was to report if TrueTime was up. It would use a different API Key than the app. Once a minute, it would make a single request (for 5 long-running routes, say, just to be confident in saying TrueTime is really down -- I think that still counts as one request). That's only 1440 requests a day, well under PAT's limit.

Then the script would save that info (just a thumbs-up or thumbs-down, not the actual tracking info that came back) in a simple XML file that the app could then retrieve via a URL, whenever it had any trouble getting TrueTime info from PAT for a particular route.

If you wanted to get fancy, it wouldn't be very difficult to have the script keep a record of uptime and downtime, since it's getting the info anyway. Then it could return an XML file with more specific info, like "TrueTime has been down since 1:23 PM, 17 minutes ago" or "TrueTime has been up for 16 hours." It could even log downtime info for use in making a nice table, and stick it on a "TrueTime Downtime History" web page.
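A minimal sketch of the bookkeeping side of such a script, in Python. The XML element names and the `UptimeTracker` class are invented for illustration, and the actual once-a-minute request to PAT is left out; the tracker is simply told whether each check succeeded.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# One check per minute, as in the estimate above.
REQUESTS_PER_DAY = 24 * 60


class UptimeTracker:
    """Remembers whether TrueTime is up and when that state began."""

    def __init__(self):
        self.up = None      # unknown until the first check is recorded
        self.since = None

    def record(self, is_up, checked_at):
        # Only a change of state resets the "since" timestamp.
        if is_up != self.up:
            self.up = is_up
            self.since = checked_at

    def to_xml(self, now):
        if self.since is None:
            raise ValueError("no checks recorded yet")
        root = ET.Element("truetime-status")
        ET.SubElement(root, "up").text = "true" if self.up else "false"
        ET.SubElement(root, "since").text = self.since.isoformat()
        minutes = int((now - self.since).total_seconds() // 60)
        ET.SubElement(root, "minutes").text = str(minutes)
        return ET.tostring(root, encoding="unicode")
```

The app would fetch the resulting file only when it had trouble getting data for a route, and could turn `since` and `minutes` into text like "TrueTime has been down since 1:23 PM, 17 minutes ago".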

epicstar commented 9 years ago

@sgdoerfler - good news for you. I am combining like-messages. Expect this to be pushed to the beta channel either today or tomorrow. This will come with another big backend change since I have figured out how to actually use RxAndroid.

It's not exactly the thing that @VincentIII wants with the "the buses may possibly be down" message, but this is what you want, and much better than what it is right now. Thank you @mikeantonacci for pair programming with me on the bus vehicle module...

epicstar commented 9 years ago

@sgdoerfler - your request is officially in the beta release:

https://github.com/rectangle-dbmi/Realtime-Port-Authority/releases/tag/5.0

sgdoerfler commented 9 years ago

Looks good, thanks! (Though I should point out that what's there now only addresses a part of what I wanted. Mostly I was hoping for a non-transient error indication, and the improved message is still transient.)

epicstar commented 9 years ago

Acknowledged. I'll talk over details with @VincentIII.

As for error messages being in a list as opposed to being transient... I need suggestions since as far as I'm concerned, it's against Android Guidelines to have different errors pop up at the same time, and each Snackbar/Toast should have one line. Maybe it's better off having a toast appear and use notifications?