tsegall / fta

Metadata/data identification Java library. Identifies Semantic Type information (e.g. Gender, Age, Color, Country,...). Extensive country/language support. Extensible via user-defined plugins. Comprehensive Profiling support.
Apache License 2.0

OutOfMemoryError: Requested array size exceeds VM limit exception #70

Closed by andreainfogix 6 months ago

andreainfogix commented 6 months ago

After upgrading from 8.0.22 to 14.6.1, we get an OutOfMemoryError ("Requested array size exceeds VM limit"). My guess is that the analyzer fails when the input consists only of empty strings and the default semantic types are disabled. The code below reproduces it:

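import com.cobber.fta.TextAnalysisResult;
import com.cobber.fta.TextAnalyzer;
import com.cobber.fta.core.FTAPluginException;
import com.cobber.fta.core.FTAUnsupportedLocaleException;
import com.cobber.fta.dates.DateTimeParser.DateResolutionMode;

public class Issue70Repro { // class name is arbitrary
    public static void main(String[] args) throws FTAPluginException, FTAUnsupportedLocaleException {
        TextAnalyzer analyzer = new TextAnalyzer("foo", DateResolutionMode.Auto);

        // Disable the built-in semantic types, then train on empty strings only.
        analyzer.configure(TextAnalyzer.Feature.DEFAULT_SEMANTIC_TYPES, false);
        analyzer.train("");
        analyzer.train("");
        System.out.println("Training complete");

        // Fetch and print the analysis result.
        TextAnalysisResult r = analyzer.getResult();
        System.out.println("  ftaType " + r.toString());
    }
}
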
tsegall commented 6 months ago

Addressed in 15.5.0.
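
For anyone who cannot move to 15.5.0 right away, one possible stopgap is to check whether a sample contains any non-blank value before handing it to the analyzer. The sketch below uses only the JDK. BlankSampleGuard and hasNonBlank are hypothetical names, not part of FTA, and the approach rests on the unconfirmed guess above that all-empty input is what triggers the error.

```java
import java.util.List;

// Hypothetical helper, not part of FTA. Assumes the trigger is
// an input made up entirely of empty strings, as guessed in the report.
public class BlankSampleGuard {

    // Returns true if at least one value is non-null and contains
    // a non-whitespace character.
    public static boolean hasNonBlank(List<String> samples) {
        for (String s : samples) {
            if (s != null && !s.trim().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Same input as the reproduction: two empty strings.
        System.out.println(hasNonBlank(List.of("", "")));     // prints "false"
        System.out.println(hasNonBlank(List.of("", "male"))); // prints "true"
    }
}
```

A caller on an affected version could skip the train/getResult calls when hasNonBlank returns false. Upgrading to 15.5.0 remains the fix stated in this thread.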