google / cap-library

Common Alerting Protocol Library
Apache License 2.0

CapXMLParser throws SAX-Error while parsing alerts from NOAA #26

Closed: sschiavoni closed this issue 9 years ago

sschiavoni commented 9 years ago

Original issue 27 created by eliteSchaf on 2012-02-19T16:44:03.000Z:

None of the CAP v1.1 messages published by NOAA can be parsed. The following code throws an exception:

URL url = new URL("http://alerts.weather.gov/cap/wwacapget.php?x=FL124C9A83022C.HighSurfAdvisory.124C9A85250CFL.TBWCFWTBW.d83d8b0c494fcc0bf8f67aafd571a291");
Alert alert = parser.parseFrom(new InputSource(url.openStream()));

Exception message:

org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Unexpected End of File.
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at com.google.publicalerts.cap.CapXmlParser.parseFrom(CapXmlParser.java:185)
    at Main.main(Main.java:24)

sschiavoni commented 9 years ago

Comment #1 originally posted by shakusa@google.com on 2012-03-02T14:23:28.000Z:

Hi,

I'd try:

Alert alert = parser.parseFrom(new InputSource(new BufferedInputStream(url.openStream())));

Arguably this is something that the library could do for you, and at the very least it should be documented, so I'll leave this bug open.

sschiavoni commented 9 years ago

Comment #2 originally posted by eliteSchaf on 2012-03-03T17:07:37.000Z:

Thanks for the reply.

When I use the code you've posted, I get a different exception:

Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Resetting to invalid mark
    at com.google.publicalerts.cap.CapXmlParser.parseFrom(CapXmlParser.java:187)

Code:

URL url = new URL("http://alerts.weather.gov/cap/wwacapget.php?x=AK124C9F851328.WindChillAdvisory.124C9F92C310AK.AFGWSWWCZ.8ddefc8e2ca79f53141ed450eaa37a50");
Alert alert = parser.parseFrom(new InputSource(new BufferedInputStream(url.openStream())));

I've tested the code with several different CAP alerts and get the same exception every time.

sschiavoni commented 9 years ago

Comment #3 originally posted by eliteSchaf on 2012-03-03T17:08:11.000Z:

Forgot the second part of the exception:

Caused by: java.io.IOException: Resetting to invalid mark
    at java.io.BufferedInputStream.reset(Unknown Source)
    at com.google.publicalerts.cap.CapXmlParser.getXmlns(CapXmlParser.java:215)
    at com.google.publicalerts.cap.CapXmlParser.parseFrom(CapXmlParser.java:177)
    ... 1 more

sschiavoni commented 9 years ago

Comment #4 originally posted by shakusa@google.com on 2012-03-04T00:30:34.000Z:

The following should get you past what you see now:

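import java.io.BufferedInputStream;
import java.net.URL;
import org.xml.sax.InputSource;
import com.google.publicalerts.cap.Alert;
import com.google.publicalerts.cap.CapException;
import com.google.publicalerts.cap.CapXmlParser;

URL url = new URL("http://alerts.weather.gov/cap/wwacapget.php?x=AK124C9F851328.WindChillAdvisory.124C9F92C310AK.AFGWSWWCZ.8ddefc8e2ca79f53141ed450eaa37a50");

// Buffer the stream and mark its start so the parser can read ahead and then
// reset() back to the beginning; the mark stays valid for the first 1000 bytes.
BufferedInputStream bis = new BufferedInputStream(url.openStream());
bis.mark(1000);

CapXmlParser parser = new CapXmlParser(true);
try {
  Alert alert = parser.parseFrom(new InputSource(bis));
} catch (CapException expected) {
  // See below.
}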
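For anyone wondering why the mark(1000) call matters: the trace in comment #3 shows CapXmlParser.getXmlns calling BufferedInputStream.reset(), and the JDK's BufferedInputStream throws "Resetting to invalid mark" when reset() is called without a valid mark. The standalone sketch below reproduces both cases with an in-memory stream standing in for url.openStream(); it does not use cap-library, and the class name, method names and sample XML are made up for illustration (readNBytes needs Java 11+).

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class MarkResetDemo {

    // Hypothetical helper: wraps a string the way url.openStream() would be wrapped.
    static BufferedInputStream open(String s) {
        return new BufferedInputStream(new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8)));
    }

    // Reads n bytes, then rewinds with reset() (a read-ahead-then-rewind pattern).
    static String peekThenReset(BufferedInputStream in, int n) throws IOException {
        byte[] prefix = in.readNBytes(n);
        in.reset(); // throws IOException if there is no valid mark
        return new String(prefix, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String xml = "<?xml version=\"1.0\"?><alert/>";

        // Case 1: no mark() call, so reset() has nothing to rewind to.
        BufferedInputStream unmarked = open(xml);
        try {
            peekThenReset(unmarked, 5);
        } catch (IOException e) {
            System.out.println("unmarked: " + e.getMessage()); // unmarked: Resetting to invalid mark
        }

        // Case 2: mark(1000) first; reading up to 1000 bytes keeps the mark valid.
        BufferedInputStream marked = open(xml);
        marked.mark(1000);
        String peeked = peekThenReset(marked, 5);
        String reread = new String(marked.readNBytes(5), StandardCharsets.UTF_8);
        System.out.println("marked: " + peeked + " then " + reread); // marked: <?xml then <?xml
    }
}
```

The second case is the same idea as the bis.mark(1000) workaround: once the start is marked, whatever reads ahead can rewind and the real parse still sees the document from byte 0.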

However, NOAA's CAP 1.1 alerts are not valid CAP (try pasting one into http://cap-validator.appspot.com). If you find an active alert (the one in this example may have expired), you will see an error because an element in the alert is not valid. Right now, the only way to work around that problem is to read the alert into a string (see http://stackoverflow.com/questions/309424/in-java-how-do-i-read-convert-an-inputstream-to-a-string), strip out the bad element, and then parse it.
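The read-into-a-string workaround can be sketched roughly as follows. This is an illustration, not part of cap-library: the class and method names are hypothetical, "badElement" is a placeholder because the offending element's name is not given here, the alert is assumed to be UTF-8, and a regex is used in place of a real XML transform, so it assumes the element is never nested inside another element of the same name. readAllBytes needs Java 9+.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

class AlertCleaner {

    // Reads the whole stream into a String; assumes UTF-8 encoding.
    static String readAll(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    // Removes every <name>...</name>, <name attr="...">...</name> and <name .../> occurrence.
    // Regex-based, so it assumes the element does not nest inside itself.
    static String stripElement(String xml, String name) {
        String n = Pattern.quote(name);
        // (?s) lets .*? span newlines; the lazy quantifiers stop at the first closing tag.
        String regex = "(?s)<" + n + "(\\s[^>]*?)?(/>|>.*?</" + n + "\\s*>)";
        return xml.replaceAll(regex, "");
    }

    public static void main(String[] args) throws IOException {
        // "badElement" is a placeholder: substitute whichever element the validator flags.
        InputStream in = new ByteArrayInputStream(
            "<alert><identifier>x</identifier><badElement>y</badElement></alert>"
                .getBytes(StandardCharsets.UTF_8));
        String cleaned = stripElement(readAll(in), "badElement");
        System.out.println(cleaned); // <alert><identifier>x</identifier></alert>
    }
}
```

The cleaned string could then be handed back to the parser, for example as new InputSource(new ByteArrayInputStream(cleaned.getBytes(StandardCharsets.UTF_8))). A ByteArrayInputStream supports mark() and reset() by itself, so no extra BufferedInputStream wrapping should be needed in that case.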

I know that's very unsatisfying. We're prepping a release 2 of the library that, among other things, will allow you to parse an alert with errors in a best-effort fashion and allow you to handle or ignore the errors as appropriate. But that's still a few weeks away.

As for this bug, I'll look into how to avoid requiring the caller to wrap the stream in a BufferedInputStream and call mark(). Thanks for the report!

sschiavoni commented 9 years ago

Comment #5 originally posted by eliteSchaf on 2012-03-04T11:38:14.000Z:

Thanks for the reply.

Setting the readLimit works :)

Can you confirm that the alerts at http://edis.oes.ca.gov/index.atom are valid?

sschiavoni commented 9 years ago

Comment #6 originally posted by shakusa@google.com on 2012-03-04T18:43:01.000Z:

Yes, you can enter "http://edis.oes.ca.gov/index.atom" in the text box at cap-validator.appspot.com to see for yourself. That site is built on the library you are using, and its source is at http://code.google.com/p/cap-library/source/browse/#hg%2Fvalidator%2Fsrc%2Fcom%2Fgoogle%2Fpublicalerts%2Fcap%2Fvalidator

sschiavoni commented 9 years ago

Comment #7 originally posted by eliteSchaf on 2012-03-07T20:34:13.000Z:

Nice, I thought the cap-validator could only validate single alerts :)

Thanks for your help

sschiavoni commented 9 years ago

Comment #8 originally posted by shakusa@google.com on 2012-04-30T18:19:15.000Z:

This is fixed now. You no longer need to buffer your input stream or mark it; the most recent version of the library should handle that for you.
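One general way a parser can take this burden off the caller is sketched below. This is not the library's actual fix, only an illustration under stated assumptions: the class and method names and the 1024-byte read-ahead limit are hypothetical. The idea is to wrap the incoming stream only when markSupported() returns false, then mark, read ahead and reset internally (readNBytes needs Java 11+).

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class PeekableInput {

    // Hypothetical read-ahead budget; the peek below never reads more than this.
    static final int PEEK_LIMIT = 1024;

    // Wraps the caller's stream only when it cannot mark/reset on its own.
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    // Reads up to n bytes (capped at PEEK_LIMIT) and rewinds, so the real parse starts at byte 0.
    static String peek(InputStream in, int n) throws IOException {
        in.mark(PEEK_LIMIT);
        byte[] head = in.readNBytes(Math.min(n, PEEK_LIMIT));
        in.reset(); // valid because at most PEEK_LIMIT bytes were read since mark()
        return new String(head, StandardCharsets.UTF_8);
    }
}
```

A parser following this pattern would call ensureMarkSupported() once on the stream taken from the InputSource, peek() at the head (for example to find the xmlns), and then hand that same wrapped stream to the SAX parser, so callers could pass url.openStream() directly.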