Open ambarishK opened 5 years ago
It fails at:
/** convenience method to extract list of HtmlP in element
 *
 * @param htmlElement
 * @return
 */
public static List<HtmlP> extractSelfAndDescendantIs(HtmlElement htmlElement) {
    return HtmlP.extractPs(HtmlUtil.getQueryHtmlElements(htmlElement, ALL_P_XPATH));
}
called from DefaultArgProcessor:
public List<? extends Element> extractPSectionElements(CTree cTree) {
    List<? extends Element> elements = null;
    if (cTree != null) {
        cTree.ensureScholarlyHtmlElement();
        elements = HtmlP.extractSelfAndDescendantIs(cTree.getHtmlElement());
    }
    return elements;
}
My guess is that it is a huge document, and that we need to get the element count first and then limit the number of extracted elements. Will look in the morning.
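Both stack traces in this thread end inside nu.xom.Builder.build, reached from cTree.ensureScholarlyHtmlElement(), so the heap appears to run out while the document is being parsed, before any HtmlP elements are extracted. If that reading is right, the count would have to be obtained without building the full tree. A minimal sketch of that idea, using the JDK's streaming StAX parser, follows. ParagraphCounter, countParagraphs and the limit parameter are hypothetical names for illustration and are not part of ami or cproject, and the sketch assumes the input is well-formed XHTML.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper, not part of ami/cproject. */
public class ParagraphCounter {

    /**
     * Counts <p> start tags while streaming through the document,
     * so no in-memory tree is built. Stops as soon as limit is reached.
     */
    public static int countParagraphs(Reader xhtml, int limit) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xhtml);
        int count = 0;
        try {
            while (count < limit && reader.hasNext()) {
                // next() advances to the following parse event
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "p".equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        String doc = "<html><body><p>a</p><div><p>b</p></div><p>c</p></body></html>";
        System.out.println(countParagraphs(new StringReader(doc), Integer.MAX_VALUE)); // prints 3
        System.out.println(countParagraphs(new StringReader(doc), 2));                 // prints 2
    }
}
```

A caller could compare the count against a threshold and skip or truncate oversized documents before handing them to the tree-building parser. Whether that fits the existing CTree code is not something this thread establishes.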
On Tue, Feb 26, 2019 at 11:14 AM Ambarish Kumar notifications@github.com wrote:
Run command -
ami-search-new -p /media/ambarish123/AMBARISH/Ocimumproject18feb --dictionary /media/ambarish123/AMBARISH/dictionary/monoterpenes.xml /media/ambarish123/AMBARISH/dictionary/diterpene.xml /media/ambarish123/AMBARISH/dictionary/triterpene.xml /media/ambarish123/AMBARISH/dictionary/insecticides.xml /media/ambarish123/AMBARISH/dictionary/invasivespecies.xml /media/ambarish123/AMBARISH/dictionary/Ocimumspecies.xml /media/ambarish123/AMBARISH/dictionary/phytochemicals.xml country gene drugs plantparts
Error snippet
specific and generic values.
Generic values (AMISearchTool)
basename null cproject /media/ambarish123/AMBARISH/Ocimumproject18feb ctree cTreeList 2117 trees [/media/ambarish123/AMBARISH/Ocimumproject18feb/30 dryrun false excludeBase null excludeTrees null file types [] forceMake false includeBase null includeTrees null log4j logfile null verbose 0
Specific values (AMISearchTool)
dictionaryList [/media/ambarish123/AMBARISH/dictionary/monoterpenes.xml, /media/ambarish123/AMBARISH/dictionary/diterpene.xml, /media/ambarish123/AMBARISH/dictionary/triterpene.xml, /media/ambarish123/AMBARISH/dictionary/insecticides.xml, /media/ambarish123/AMBARISH/dictionary/invasivespecies.xml, /media/ambarish123/AMBARISH/dictionary/Ocimumspecies.xml, /media/ambarish123/AMBARISH/dictionary/phytochemicals.xml, country, gene, drugs, plantparts] dictionaryTop null dictionarySuffix [xml] ignorePlugins []
Runtime error related to Java heap memory.
................................................................................
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at sun.nio.cs.UTF_8.newEncoder(UTF_8.java:72)
at java.lang.StringCoding$StringEncoder.<init>(StringCoding.java:282)
at java.lang.StringCoding$StringEncoder.<init>(StringCoding.java:273)
at java.lang.StringCoding.encode(StringCoding.java:338)
at java.lang.String.getBytes(String.java:918)
at nu.xom.Text.build(Unknown Source)
at nu.xom.NonVerifyingHandler.flushText(Unknown Source)
at nu.xom.NonVerifyingHandler.startElement(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
at org.apache.xerces.impl.XMLNamespaceBinder.handleStartElement(Unknown Source)
at org.apache.xerces.impl.XMLNamespaceBinder.startElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at nu.xom.Builder.build(Unknown Source)
at nu.xom.Builder.build(Unknown Source)
at nu.xom.Builder.build(Unknown Source)
at org.contentmine.eucl.xml.XMLUtil.parseQuietlyToDocument(XMLUtil.java:1197)
at org.contentmine.cproject.args.DefaultArgProcessor.getScholarlyHtmlElement(DefaultArgProcessor.java:1381)
at org.contentmine.cproject.files.CTree.ensureScholarlyHtmlElement(CTree.java:1239)
at org.contentmine.cproject.args.DefaultArgProcessor.extractPSectionElements(DefaultArgProcessor.java:1365)
at org.contentmine.ami.plugins.AMIArgProcessor.ensureSectionElements(AMIArgProcessor.java:257)
at org.contentmine.ami.plugins.AMIArgProcessor.runRunMethodsOnChosenArgOptions(AMIArgProcessor.java:228)
at org.contentmine.cproject.args.DefaultArgProcessor.runAndOutput(DefaultArgProcessor.java:1296)
at org.contentmine.ami.plugins.word.WordPluginOption.run(WordPluginOption.java:36)
at org.contentmine.ami.plugins.CommandProcessor.runLegacyPluginOptions(CommandProcessor.java:301)
at org.contentmine.ami.tools.AMISearchTool.runLegacyCommandProcessor(AMISearchTool.java:128)
at org.contentmine.ami.tools.AMISearchTool.runSearch(AMISearchTool.java:112)
issues -
I ran the project from the hard drive, so there should have been no shortage of space, yet it still raises this error. Please resolve.
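The "Java heap space" in the error refers to the JVM's in-memory heap, which has an upper bound fixed when the JVM starts (the standard -Xmx option, or a default), and not to free space on the drive holding the project. A small diagnostic sketch that prints both figures side by side may make the difference visible. HeapVersusDisk and toMiB are hypothetical names, and how a larger -Xmx would be passed to the ami-search-new launcher is not shown anywhere in this thread.

```java
import java.io.File;

/** Hypothetical diagnostic, not part of ami. */
public class HeapVersusDisk {

    /** Converts a byte count to whole mebibytes. */
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Largest heap this JVM will try to use (from -Xmx or a JVM default).
        System.out.println("max heap MiB:    " + toMiB(rt.maxMemory()));
        // Heap currently occupied: allocated heap minus the free part of it.
        System.out.println("used heap MiB:   " + toMiB(rt.totalMemory() - rt.freeMemory()));
        // Free space on the drive holding the given directory; independent of the heap.
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println("usable disk MiB: " + toMiB(dir.getUsableSpace()));
    }
}
```

Running it with the project directory as its argument would normally show a usable disk figure far larger than the maximum heap, which is consistent with an OutOfMemoryError occurring on a drive with plenty of free space.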
View it on GitHub: https://github.com/petermr/tigr2ess/issues/83
-- Peter Murray-Rust Reader Emeritus in Molecular Informatics Unilever Centre, Dept. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069
I can reproduce the problem:
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.lang.StringCoding.safeTrim(StringCoding.java:79)
at java.lang.StringCoding.access$300(StringCoding.java:50)
at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:305)
at java.lang.StringCoding.encode(StringCoding.java:344)
at java.lang.String.getBytes(String.java:918)
at nu.xom.Text.build(Unknown Source)
at nu.xom.NonVerifyingHandler.flushText(Unknown Source)
at nu.xom.NonVerifyingHandler.startElement(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
at org.apache.xerces.impl.XMLNamespaceBinder.handleStartElement(Unknown Source)
at org.apache.xerces.impl.XMLNamespaceBinder.startElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at nu.xom.Builder.build(Unknown Source)
at nu.xom.Builder.build(Unknown Source)
at org.contentmine.eucl.xml.XMLUtil.parseXML(XMLUtil.java:392)
at org.contentmine.graphics.html.HtmlFactory.parseToXHTML(HtmlFactory.java:702)
at org.contentmine.graphics.html.HtmlFactory.parse(HtmlFactory.java:647)
at org.contentmine.graphics.html.HtmlFactory.parse(HtmlFactory.java:622)
at org.contentmine.cproject.args.DefaultArgProcessor.getScholarlyHtmlElement(DefaultArgProcessor.java:1384)
at org.contentmine.cproject.files.CTree.ensureScholarlyHtmlElement(CTree.java:1239)
at org.contentmine.cproject.args.DefaultArgProcessor.extractPSectionElements(DefaultArgProcessor.java:1367)
at org.contentmine.ami.plugins.AMIArgProcessor.ensureSectionElements(AMIArgProcessor.java:257)
at org.contentmine.ami.plugins.AMIArgProcessor.runRunMethodsOnChosenArgOptions(AMIArgProcessor.java:228)
at org.contentmine.cproject.args.DefaultArgProcessor.runAndOutput(DefaultArgProcessor.java:1298)
at org.contentmine.ami.plugins.word.WordPluginOption.run(WordPluginOption.java:39)
This needs a rewrite of the search options.