Open luith2000 opened 9 months ago
Hi, @luith2000
What is the source URL of the feed? Also, you're using a ROME version from 2018. Could you try running it on an updated version, for example 1.19 or 2.1.0 (the current release)?
Regards, Antonio.
Hi Antonio,
After updating to 2.1.0, still receiving the following error:
Caused by: com.rometools.rome.io.ParsingFeedException: Invalid XML: Error on line 11: Attribute name "defer" associated with an element type "script" must be followed by the ' = ' character.
    at com.rometools.rome.io.WireFeedInput.build(WireFeedInput.java:236) ~[rome-2.1.0.jar:2.1.0]
    at com.rometools.rome.io.SyndFeedInput.build(SyndFeedInput.java:150) ~[rome-2.1.0.jar:2.1.0]
    at edu.rutgers.enterprise.portal.webscraperrss.service.RssScraperService.saveLastModifiedDate(RssScraperService.java:79) ~[classes/:?]
    at edu.rutgers.enterprise.portal.webscraperrss.WebScraperRssFeedGeneratorApplication.saveLastModifiedDate(WebScraperRssFeedGeneratorApplication.java:51) ~[classes/:?]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_392]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_392]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_392]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_392]
    at org.springframework.context.event.ApplicationListenerMethodAdapter.doInvoke(ApplicationListenerMethodAdapter.java:261) ~[spring-context-5.1.7.RELEASE.jar:5.1.7.RELEASE]
    ... 26 more
Caused by: org.jdom2.input.JDOMParseException: Error on line 11: Attribute name "defer" associated with an element type "script" must be followed by the ' = ' character.
    at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:232) ~[jdom2-2.0.6.jar:?]
    at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:303) ~[jdom2-2.0.6.jar:?]
    at org.jdom2.input.SAXBuilder.build(SAXBuilder.java:1196) ~[jdom2-2.0.6.jar:2.0.6]
    at com.rometools.rome.io.WireFeedInput.build(WireFeedInput.java:233) ~[rome-2.1.0.jar:2.1.0]
    at com.rometools.rome.io.SyndFeedInput.build(SyndFeedInput.java:150) ~[rome-2.1.0.jar:2.1.0]
    at edu.rutgers.enterprise.portal.webscraperrss.service.RssScraperService.saveLastModifiedDate(RssScraperService.java:79) ~[classes/:?]
    at edu.rutgers.enterprise.portal.webscraperrss.WebScraperRssFeedGeneratorApplication.saveLastModifiedDate(WebScraperRssFeedGeneratorApplication.java:51) ~[classes/:?]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_392]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_392]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_392]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_392]
    at org.springframework.context.event.ApplicationListenerMethodAdapter.doInvoke(ApplicationListenerMethodAdapter.java:261) ~[spring-context-5.1.7.RELEASE.jar:5.1.7.RELEASE]
    ... 26 more
Caused by: org.xml.sax.SAXParseException: Attribute name "defer" associated with an element type "script" must be followed by the ' = ' character.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:204) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:178) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:399) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:326) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1466) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanAttribute(XMLNSDocumentScannerImpl.java:413) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:250) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2783) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:601) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:504) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:841) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:770) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213) ~[?:1.8.0_392]
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:642) ~[?:1.8.0_392]
    at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:217) ~[jdom2-2.0.6.jar:?]
    at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:303) ~[jdom2-2.0.6.jar:?]
    at org.jdom2.input.SAXBuilder.build(SAXBuilder.java:1196) ~[jdom2-2.0.6.jar:2.0.6]
    at com.rometools.rome.io.WireFeedInput.build(WireFeedInput.java:233) ~[rome-2.1.0.jar:2.1.0]
    at com.rometools.rome.io.SyndFeedInput.build(SyndFeedInput.java:150) ~[rome-2.1.0.jar:2.1.0]
This is essentially the code snippet I'm using:
URL feedSource = new URL(url);
SyndFeedInput input = new SyndFeedInput();
input.setAllowDoctypes(true);
input.setPreserveWireFeed(true);
SyndFeed feed = input.build(new XmlReader(feedSource)); // <-- Exception thrown here
Regards,
Hi, @luith2000 Sorry, you didn't include the 'url' value of the feed that contains the 'defer' attribute. I need it to create a new unit test. Regards, Antonio.
Hi Antonio,
Sample URLs would be the following:
https://www.newark.rutgers.edu/news/feed
https://news.camden.rutgers.edu/feed/
https://www.newark.rutgers.edu/feed
https://rutgersnewarkathletics.com/rss.aspx
https://it.rutgers.edu/feed/
Regards,
Hi, @luith2000 The first 3 URLs are not valid RSS/Atom URLs (in my case, 1 and 3 return HTML error pages from the site), so you can't parse them as a valid ROME synd feed. The last 2 URLs are valid RSS 2.0 URLs.
Edit: I looked into the first 3 links and they are Drupal sites. You need to append rss.xml to each link: https://www.newark.rutgers.edu/rss.xml https://camden.rutgers.edu/rss.xml
Regards, Antonio.
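To illustrate why an HTML page triggers this exact error, here is a minimal sketch that uses only the JDK's built-in SAX parser (no ROME). The class name `DeferAttributeDemo`, the helper `xmlError`, and both sample documents are made up for illustration; they are not taken from the affected sites. HTML allows valueless attributes such as `defer`, while XML requires every attribute to have a value, so an XML parser stops at that point.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DeferAttributeDemo {

    /**
     * Parses the text as XML.
     * Returns null if it is well-formed, otherwise a description of the parser error.
     */
    public static String xmlError(String text) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // DefaultHandler rethrows fatal errors, so a malformed document ends up in the catch below.
            factory.newSAXParser().parse(new InputSource(new StringReader(text)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // Hypothetical HTML page: "defer" has no value, which is legal HTML but not XML.
        String html = "<html>\n<head>\n<script defer src=\"app.js\"></script>\n</head>\n</html>";
        // Hypothetical minimal RSS 2.0 document.
        String feed = "<rss version=\"2.0\"><channel><title>t</title></channel></rss>";

        System.out.println("HTML page: " + xmlError(html));
        System.out.println("RSS feed:  " + xmlError(feed));
    }
}
```

For the HTML input the helper returns a message that names the `defer` attribute; for the RSS input it returns null. That is consistent with the stack trace above, where the failure comes from the XML parser underneath ROME rather than from ROME's feed handling.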
Hi Antonio,
Thank you again for checking. I was just curious whether there is a common practice in ROME for checking if a URL is a valid RSS endpoint?
Thanks.
@luith2000 I don't think that is the main purpose of the ROME library. It is mainly for creating RSS/Atom feeds and for parsing and validating RSS/Atom feeds from a valid source. You can validate an RSS/Atom feed by catching the exception, but that is rarely needed, because normally you already know the feed in advance. Regards, Antonio.
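For anyone who still wants a quick pre-check before handing a URL to ROME, one possible approach is to look only at the document's root element. This is a sketch under assumptions, not a ROME feature: the class `FeedProbe` and the method `looksLikeFeed` are hypothetical names, and the check merely tests whether the root element is `rss`, `feed`, or `RDF` using the JDK's SAX parser. It does not validate the feed's contents.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class FeedProbe {

    /** Thrown on purpose to stop parsing as soon as the root element is seen. */
    private static final class RootSeen extends SAXException {
        private static final long serialVersionUID = 1L;
        final String localName;

        RootSeen(String localName) {
            super("root element: " + localName);
            this.localName = localName;
        }
    }

    /** Returns true if the stream starts like an RSS, Atom, or RDF (RSS 1.0) document. */
    public static boolean looksLikeFeed(InputStream in) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            factory.newSAXParser().parse(in, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts)
                        throws SAXException {
                    // The first element reported is the root; nothing more needs to be read.
                    throw new RootSeen(localName);
                }
            });
            return false; // no element was reported at all
        } catch (RootSeen root) {
            return root.localName.equals("rss")      // RSS 0.9x / 2.0
                    || root.localName.equals("feed") // Atom
                    || root.localName.equals("RDF"); // RSS 1.0 (rdf:RDF)
        } catch (Exception e) {
            // Malformed XML, I/O problems, or parser configuration errors: treat as "not a feed".
            return false;
        }
    }

    /** Convenience overload for in-memory text. */
    public static boolean looksLikeFeed(String text) {
        return looksLikeFeed(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
    }
}
```

A caller could pass `url.openStream()` and, when the result is true, fetch the URL again for `SyndFeedInput.build`. The ROME call should still be wrapped in a try/catch for `ParsingFeedException`, since a document with the right root element can be malformed further down.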
Caused by: com.rometools.rome.io.ParsingFeedException: Invalid XML: Error on line 11: Attribute name "defer" associated with an element type "script" must be followed by the ' = ' character.
    at com.rometools.rome.io.WireFeedInput.build(WireFeedInput.java:236) ~[rome-1.12.0.jar:1.12.0]