myint / language-check

Python wrapper for LanguageTool grammar checker
https://pypi.python.org/pypi/language-check
GNU Lesser General Public License v3.0
327 stars, 101 forks

Running multiple instances appears to cause JRE to run out of memory. #66

Open ThomasWarn opened 5 years ago

ThomasWarn commented 5 years ago

I've tried making the tool global, creating it within each instance, and everything in between. The logs created after crashing are similar to the following.

There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (malloc) failed to allocate 32744 bytes for ChunkPool::allocate
Possible reasons:
  The system is out of physical RAM or swap space
  The process is running with CompressedOops enabled, and the Java Heap may be blocking the growth of the native heap
Possible solutions:
  Reduce memory load on the system
  Increase physical memory or swap space
  Check if swap backing store is full
  Decrease Java heap size (-Xmx/-Xms)
  Decrease number of Java threads
  Decrease Java thread stack sizes (-Xss)
  Set larger code cache with -XX:ReservedCodeCacheSize=
JVM is running with Zero Based Compressed Oops mode in which the Java heap is placed in the first 32GB address space. The Java Heap base address is the maximum limit for the native heap growth. Please use -XX:HeapBaseMinAddress to set the Java Heap base and to place the Java Heap above 32GB virtual address.
This output file may be truncated or incomplete.

Out of Memory Error (allocation.cpp:273), pid=38452, tid=0x0000000000003880

JRE version: Java(TM) SE Runtime Environment (8.0_201-b09) (build 1.8.0_201-b09)
Java VM: Java HotSpot(TM) 64-Bit Server VM (25.201-b09 mixed mode windows-amd64 compressed oops)
Failed to write core dump. Minidumps are not enabled by default on client versions of Windows

--------------- T H R E A D ---------------

Current thread (0x0000000022b39000): JavaThread "C1 CompilerThread8" daemon [_thread_in_native, id=14464, stack(0x0000000024f00000,0x0000000025000000)]

Stack: [0x0000000024f00000,0x0000000025000000]
[error occurred during error reporting (printing stack bounds), id 0xc0000005]

Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)

Current CompileTask: C1: 2818 17 3 java.io.WinNTFileSystem::isSlash (18 bytes)

and

JvmtiExport can_access_local_variables 0
JvmtiExport can_hotswap_or_post_breakpoint 0
JvmtiExport can_post_on_exceptions 0
# 101 ciObject found
ciMethod java/lang/Object <init> ()V 4097 1 1539 0 -1
instanceKlass java/io/FileOutputStream$1
instanceKlass java/lang/StringCoding$StringEncoder
instanceKlass java/lang/ThreadLocal$ThreadLocalMap
instanceKlass java/lang/StringCoding
instanceKlass sun/usagetracker/UsageTrackerClient$3
instanceKlass sun/usagetracker/UsageTrackerClient$2
instanceKlass sun/usagetracker/UsageTrackerClient$4
instanceKlass sun/usagetracker/UsageTrackerClient$1
instanceKlass java/util/concurrent/atomic/AtomicBoolean
...

The code I am running is below. It requires a folder named "Files" in the same directory as the script, containing a few text files.

import time
import os
import multiprocessing as mp
import language_check
directory = "Files"
totalwords = 0

def openfile(filename,output,tool):
    time.sleep(1)
    print("STARTED")
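    lenwords = 0
    errors = 0
    # The with-block closes the file even if tool.check() raises.
    with open(filename,"r",encoding="utf-8") as file:
        for line in file:
            if len(line)>10:
                words = line.split(" ")
                lenwords += len(words)
                #print(line)
                errors+=len(tool.check(str(line)))

    # Avoid a ZeroDivisionError for files with no line longer than 10 characters.
    rate = errors/lenwords if lenwords else 0.0
    output.put([rate,filename])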

def main():
    filenamelist = []
    for root,dirs,files in os.walk(directory):
        for file in files:
            if file.endswith(".txt"):
                #print(file)
                filenamelist.append(os.path.join(root,file))

    numfiles = len(filenamelist)
    starttime = time.time()
    output = mp.Queue()
    tool = language_check.LanguageTool('en-US')
    processes = [mp.Process(target=openfile,args=(filename,output,tool)) for filename in filenamelist]

    for p in processes:
        p.start()
    print("DONE COMPUTING")
    results = [output.get() for p in processes]
    print("Never reaches here")
    for p in processes:
        p.join()
    datafile = open("data.txt","a")
    for item in results:  # iterate over the [rate, filename] pairs, not their indices
        datafile.write(str(round(item[0],4))+","+item[1]+"\n")
    datafile.close()

if __name__ =="__main__":
    main()

Any help would be appreciated. Thank you for your time.

jxmorris12 commented 4 years ago

Hi! Since this project has been abandoned, I started a new fork over at https://github.com/jxmorris12/language_tool_python.

My version supports new versions of Java and LanguageTool. language-check is stuck on Java 8 and LanguageTool 3.2; latest versions are Java 14 and LanguageTool 4.9!

I'm happy to help you with your issue if you raise it over at my repository! Thanks!
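
For the script in the original report, one thing that might be worth trying is capping how many worker processes run at once, instead of starting one process per file. The sketch below does that with a fixed-size `multiprocessing.Pool` whose initializer builds one checker per worker. It rests on an assumption the thread does not confirm: that the memory pressure comes from many processes (and possibly many LanguageTool servers) being alive at the same time. The names `init_worker`, `error_rate`, `find_text_files`, `check_files`, and `make_tool` are hypothetical helpers, not part of language-check.

```python
import multiprocessing as mp
import os

# One checker per worker process, set by init_worker (hypothetical helper).
_tool = None


def init_worker(make_tool):
    """Pool initializer: build the checker once per worker, not once per file."""
    global _tool
    _tool = make_tool()


def error_rate(filename):
    """Return [errors per word, filename] using this worker's checker."""
    words = 0
    errors = 0
    with open(filename, "r", encoding="utf-8") as handle:
        for line in handle:
            if len(line) > 10:  # same short-line filter as the original script
                words += len(line.split(" "))
                errors += len(_tool.check(line))
    return [errors / words if words else 0.0, filename]


def find_text_files(directory):
    """Collect every .txt file under directory."""
    return [
        os.path.join(root, name)
        for root, _dirs, names in os.walk(directory)
        for name in names
        if name.endswith(".txt")
    ]


def check_files(filenames, make_tool, workers=2):
    """Check all files with at most `workers` processes alive at a time.

    make_tool must be a picklable, module-level callable (assumption: it
    returns an object with a check(text) method, such as a LanguageTool).
    """
    with mp.Pool(processes=workers, initializer=init_worker,
                 initargs=(make_tool,)) as pool:
        return pool.map(error_rate, filenames)
```

To use it, `make_tool` would be a module-level function that returns `language_check.LanguageTool('en-US')`, and `check_files(find_text_files("Files"), make_tool, workers=2)` would be called under the `if __name__ == "__main__":` guard, which Windows requires for multiprocessing. The value of `workers` is a guess to tune against available memory.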