rometools / rome

Java library for RSS and Atom feeds
https://rometools.github.io/rome
Apache License 2.0

Receiving the following exception using Rome Tools 1.12.0 #687

Open luith2000 opened 9 months ago

luith2000 commented 9 months ago

Caused by: com.rometools.rome.io.ParsingFeedException: Invalid XML: Error on line 11: Attribute name "defer" associated with an element type "script" must be followed by the ' = ' character.
    at com.rometools.rome.io.WireFeedInput.build(WireFeedInput.java:236) ~[rome-1.12.0.jar:1.12.0]

antoniosanct commented 9 months ago

Hi, @luith2000

What is the source URL? Also, you're using a ROME version from 2018. Could you try a more recent version, for example 1.19 or 2.1.0 (the current release)?

Regards, Antonio.

luith2000 commented 9 months ago

Hi Antonio,

After updating to 2.1.0, I'm still receiving the following error:

Caused by: com.rometools.rome.io.ParsingFeedException: Invalid XML: Error on line 11: Attribute name "defer" associated with an element type "script" must be followed by the ' = ' character.
    at com.rometools.rome.io.WireFeedInput.build(WireFeedInput.java:236) ~[rome-2.1.0.jar:2.1.0]
    at com.rometools.rome.io.SyndFeedInput.build(SyndFeedInput.java:150) ~[rome-2.1.0.jar:2.1.0]
    at edu.rutgers.enterprise.portal.webscraperrss.service.RssScraperService.saveLastModifiedDate(RssScraperService.java:79) ~[classes/:?]
    at edu.rutgers.enterprise.portal.webscraperrss.WebScraperRssFeedGeneratorApplication.saveLastModifiedDate(WebScraperRssFeedGeneratorApplication.java:51) ~[classes/:?]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_392]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_392]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_392]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_392]
    at org.springframework.context.event.ApplicationListenerMethodAdapter.doInvoke(ApplicationListenerMethodAdapter.java:261) ~[spring-context-5.1.7.RELEASE.jar:5.1.7.RELEASE]
    ... 26 more
Caused by: org.jdom2.input.JDOMParseException: Error on line 11: Attribute name "defer" associated with an element type "script" must be followed by the ' = ' character.
    at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:232) ~[jdom2-2.0.6.jar:?]
    at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:303) ~[jdom2-2.0.6.jar:?]
    at org.jdom2.input.SAXBuilder.build(SAXBuilder.java:1196) ~[jdom2-2.0.6.jar:2.0.6]
    at com.rometools.rome.io.WireFeedInput.build(WireFeedInput.java:233) ~[rome-2.1.0.jar:2.1.0]
    at com.rometools.rome.io.SyndFeedInput.build(SyndFeedInput.java:150) ~[rome-2.1.0.jar:2.1.0]
    at edu.rutgers.enterprise.portal.webscraperrss.service.RssScraperService.saveLastModifiedDate(RssScraperService.java:79) ~[classes/:?]
    at edu.rutgers.enterprise.portal.webscraperrss.WebScraperRssFeedGeneratorApplication.saveLastModifiedDate(WebScraperRssFeedGeneratorApplication.java:51) ~[classes/:?]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_392]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_392]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_392]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_392]
    at org.springframework.context.event.ApplicationListenerMethodAdapter.doInvoke(ApplicationListenerMethodAdapter.java:261) ~[spring-context-5.1.7.RELEASE.jar:5.1.7.RELEASE]
    ... 26 more
Caused by: org.xml.sax.SAXParseException: Attribute name "defer" associated with an element type "script" must be followed by the ' = ' character.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:204) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:178) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:399) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:326) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1466) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanAttribute(XMLNSDocumentScannerImpl.java:413) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:250) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2783) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:601) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:504) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:841) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:770) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:642) ~[?:1.8.0_392]
    at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:217) ~[jdom2-2.0.6.jar:?]
    at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:303) ~[jdom2-2.0.6.jar:?]
    at org.jdom2.input.SAXBuilder.build(SAXBuilder.java:1196) ~[jdom2-2.0.6.jar:2.0.6]
    at com.rometools.rome.io.WireFeedInput.build(WireFeedInput.java:233) ~[rome-2.1.0.jar:2.1.0]
    at com.rometools.rome.io.SyndFeedInput.build(SyndFeedInput.java:150) ~[rome-2.1.0.jar:2.1.0]

This is essentially the code snippet I'm using:

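    import java.net.URL;
    import com.rometools.rome.feed.synd.SyndFeed;
    import com.rometools.rome.io.SyndFeedInput;
    import com.rometools.rome.io.XmlReader;

    URL feedSource = new URL(url);
    SyndFeedInput input = new SyndFeedInput();
    input.setAllowDoctypes(true);
    input.setPreserveWireFeed(true);
    SyndFeed feed = input.build(new XmlReader(feedSource)); // <-- exception thrown here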

Regards,
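Reading the trace bottom-up, the failure is raised by the XML parser, not by ROME's feed logic: the response at that URL is an HTML page, and HTML permits a boolean attribute with no value (<script defer src="...">), which XML 1.0 forbids. The sketch below reproduces the same SAXParseException using only the JDK's built-in SAX parser and a made-up one-line fragment; DeferAttributeDemo and parseError are illustrative names, not ROME APIs.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DeferAttributeDemo {

    // Parses the text with the JDK's default SAX parser (by default the same internal
    // Xerces parser that appears in the stack trace) and returns the fatal parse error
    // message, or null if the text is well-formed XML.
    static String parseError(String text) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(text)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // DefaultHandler rethrows fatal errors, so a malformed document ends up here.
            return e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // HTML allows a boolean attribute with no value; XML 1.0 requires name="value".
        String htmlStyle = "<script defer src=\"app.js\"></script>";
        String xmlStyle = "<script defer=\"defer\" src=\"app.js\"></script>";
        System.out.println(parseError(htmlStyle)); // error message mentioning "defer"
        System.out.println(parseError(xmlStyle));  // null
    }
}
```

In other words, no parser option will turn an HTML page into a feed; the fix is to point ROME at a URL that actually serves RSS or Atom.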

antoniosanct commented 9 months ago

Hi, @luith2000. Sorry, but you didn't include the 'url' value (the page containing the 'defer' attribute); I need it to create a new unit test. Regards, Antonio.

luith2000 commented 9 months ago

Hi Antonio,

Sample URLs would be the following:

https://www.newark.rutgers.edu/news/feed
https://news.camden.rutgers.edu/feed/
https://www.newark.rutgers.edu/feed
https://rutgersnewarkathletics.com/rss.aspx
https://it.rutgers.edu/feed/

Regards,

antoniosanct commented 9 months ago

Hi, @luith2000. The first 3 URLs are not valid RSS/Atom URLs (in my case, 1 and 3 return HTML error pages from the site), so you can't parse them as a ROME SyndFeed. The last 2 URLs are valid RSS 2.0 feeds.

Edit: I looked into the first 3 links, and they are Drupal sites. You need to append rss.xml to the site's base URL instead, for example:
https://www.newark.rutgers.edu/rss.xml
https://camden.rutgers.edu/rss.xml

Regards, Antonio.
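Antonio's suggestion can be sketched with a small helper. The name siteRootRssXml is hypothetical (it is not part of ROME), and the sketch assumes the Drupal convention described above, where the feed is served at /rss.xml on the site root. Resolving the absolute path "/rss.xml" against any page URL with java.net.URI keeps the scheme and host and replaces the path:

```java
import java.net.URI;

class DrupalFeedUrl {

    // Hypothetical helper: maps any page URL on a site to "<scheme>://<host>/rss.xml".
    // An absolute reference path replaces the base path (and drops any query/fragment).
    static String siteRootRssXml(String pageUrl) {
        return URI.create(pageUrl).resolve("/rss.xml").toString();
    }

    public static void main(String[] args) {
        System.out.println(siteRootRssXml("https://www.newark.rutgers.edu/news/feed"));
        System.out.println(siteRootRssXml("https://www.newark.rutgers.edu/feed"));
    }
}
```

Because the host is kept as-is, https://news.camden.rutgers.edu/feed/ maps to https://news.camden.rutgers.edu/rss.xml rather than the https://camden.rutgers.edu/rss.xml URL Antonio found, so the right host still has to be confirmed by hand.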

luith2000 commented 9 months ago

Hi Antonio,

Thank you again for checking. I was just curious: is there a common practice in ROME for checking whether a URL is a valid RSS endpoint?

Thanks.

antoniosanct commented 9 months ago

@luith2000 I don't think that's the main purpose of the ROME library. It's mainly for creating RSS/Atom feeds and for parsing and validating RSS/Atom feeds from a known, valid source. That said, by catching the exception thrown while reading or parsing, you can roughly check whether a URL serves an RSS/Atom feed, but that is rarely needed, because in normal situations you already know the feed in advance. Regards, Antonio.
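Following Antonio's answer, the most direct check is to attempt SyndFeedInput.build(...) and treat a FeedException (ParsingFeedException is a subclass) or an IOException as "not a feed". If a cheaper pre-check is wanted before handing content to ROME, the sketch below uses the JDK's StAX API instead of ROME, assumes the response body has already been read into a String, and looks only at the root element: rss for RSS 0.9x/2.0, an Atom-namespaced feed for Atom, and rdf:RDF for RDF-based RSS. FeedSniffer and sniff are illustrative names. Anything else, including an HTML page, is reported as "unknown".

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class FeedSniffer {

    static final String ATOM_NS = "http://www.w3.org/2005/Atom";
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Returns "rss", "atom" or "rdf" based on the root element only, and "unknown" for
    // anything else: other XML, HTML, or content the parser rejects before the root.
    static String sniff(String content) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process DTDs while sniffing untrusted content.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(content));
            try {
                // Skip the prolog (comments, DOCTYPE, whitespace) up to the root element.
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamReader.START_ELEMENT) {
                        String name = reader.getLocalName();
                        String ns = reader.getNamespaceURI();
                        if ("rss".equals(name)) {
                            return "rss"; // RSS 0.9x / 2.0 (no namespace)
                        }
                        if ("feed".equals(name) && ATOM_NS.equals(ns)) {
                            return "atom"; // Atom 1.0
                        }
                        if ("RDF".equals(name) && RDF_NS.equals(ns)) {
                            return "rdf"; // RDF-based RSS (0.90 / 1.0)
                        }
                        return "unknown"; // e.g. <html>
                    }
                }
                return "unknown";
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            return "unknown"; // not well-formed before the root element
        }
    }

    public static void main(String[] args) {
        System.out.println(sniff("<?xml version=\"1.0\"?><rss version=\"2.0\"><channel/></rss>"));
        System.out.println(sniff("<html><head><script defer src=\"a.js\"></script></head></html>"));
    }
}
```

This only looks at the root element and does not validate the rest of the document, so ROME can still reject a response that passes this check; the build call remains the real test.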