(Open) vinay-cldscle opened this issue 2 weeks ago
Hi @vinay-cldscle, have you looked into the `BatchAnalyzerEngine` option?
Hi @omri374, yes, I tried using the `BatchAnalyzerEngine` for .txt files, but it is not working:
`analyzer_engine = AnalyzerEngine()`
`analyzer = BatchAnalyzerEngine(analyzer_engine=analyzer_engine)`
Error: `results = analyzer.analyze(texts=text_chunks, language="en", return_decision_process=True)` raises `AttributeError: 'BatchAnalyzerEngine' object has no attribute 'analyze'`
Does the batch analyzer only work with lists and dicts?
Please see the Python API reference here: https://microsoft.github.io/presidio/api/analyzer_python/#presidio_analyzer.BatchAnalyzerEngine.analyze_iterator
Your `text_chunks` should be an iterable (such as a `List[str]`), and then you can call `batch_analyzer.analyze_iterator(text_chunks, ...)`.
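A minimal sketch of that suggestion, assuming a UTF-8 .txt file whose lines are grouped into fixed-size chunks. The helper names `read_line_chunks`, `analyze_text_file` and `build_batch_analyzer` are hypothetical; only `AnalyzerEngine`, `BatchAnalyzerEngine` and `analyze_iterator` come from Presidio. The file is turned into a `List[str]` and passed to `analyze_iterator` instead of the non-existent `analyze` method.

```python
from typing import Any, List


def read_line_chunks(path: str, lines_per_chunk: int = 1000) -> List[str]:
    """Group the lines of a text file into chunks of `lines_per_chunk` lines.

    Each chunk is a single string, so the result is a List[str] that
    BatchAnalyzerEngine.analyze_iterator can consume.
    """
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be >= 1")
    chunks: List[str] = []
    current: List[str] = []
    with open(path, encoding="utf-8") as f:
        for line in f:  # lines keep their trailing "\n"
            current.append(line)
            if len(current) == lines_per_chunk:
                chunks.append("".join(current))
                current = []
    if current:  # keep the trailing partial chunk
        chunks.append("".join(current))
    return chunks


def analyze_text_file(batch_analyzer: Any, path: str,
                      lines_per_chunk: int = 1000, language: str = "en"):
    """Run a Presidio BatchAnalyzerEngine over a .txt file, chunk by chunk."""
    chunks = read_line_chunks(path, lines_per_chunk)
    # analyze_iterator (not analyze) is the batch entry point for iterables of
    # strings; it returns one list of RecognizerResult objects per chunk.
    return batch_analyzer.analyze_iterator(texts=chunks, language=language)


def build_batch_analyzer():
    # Imported lazily so the helpers above can be used without Presidio installed.
    from presidio_analyzer import AnalyzerEngine, BatchAnalyzerEngine

    return BatchAnalyzerEngine(analyzer_engine=AnalyzerEngine())
```

For example, `results = analyze_text_file(build_batch_analyzer(), "input.txt", lines_per_chunk=1000)`, where the file name and chunk size are placeholders. Each inner list in `results` holds the findings for one chunk, and their `start`/`end` offsets are relative to that chunk, not to the whole file.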
Hey team, I tried to scan a 7 MB file with more than 700,000 lines, passing the data in chunks (chunk size of 100,000). It takes about 7 to 10 minutes to complete. Is this normal? Can we reduce the execution time? Does batch analysis support .txt files? I would like the scan to finish within 1 minute. Is that possible?
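On the timing question: 7 to 10 minutes is 420 to 600 seconds, so a 60-second target needs roughly a 7x to 10x speedup. The sketch below times one batch run and reports throughput so different settings can be compared. It assumes a recent presidio-analyzer release whose `analyze_iterator` accepts `batch_size` and `n_process` (older releases may only take `texts` and `language`, so check the installed signature). `timed_batch_analysis` and `required_speedup` are hypothetical helpers; `entities` is forwarded to `AnalyzerEngine.analyze`, which then only runs recognizers for those entity types. Whether 1 minute is reachable depends on hardware and the NLP model, so this measures rather than promises.

```python
import time
from typing import Any, Dict, List, Optional


def timed_batch_analysis(
    batch_analyzer: Any,
    chunks: List[str],
    language: str = "en",
    batch_size: int = 32,
    n_process: int = 1,
    entities: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Run analyze_iterator once and report wall time and characters per second."""
    kwargs: Dict[str, Any] = {}
    if entities is not None:
        # Restricting entities means fewer recognizers run on each chunk.
        kwargs["entities"] = entities
    start = time.perf_counter()
    results = batch_analyzer.analyze_iterator(
        texts=chunks,
        language=language,
        batch_size=batch_size,  # assumed parameter, see lead-in
        n_process=n_process,    # assumed parameter, see lead-in
        **kwargs,
    )
    elapsed = time.perf_counter() - start
    total_chars = sum(len(c) for c in chunks)
    return {
        "results": results,
        "seconds": elapsed,
        "chars_per_second": total_chars / elapsed if elapsed > 0 else float("inf"),
    }


def required_speedup(current_seconds: float, target_seconds: float = 60.0) -> float:
    """How many times faster a run must become to meet the target time."""
    return current_seconds / target_seconds
```

With 7 MB spread over 700,000+ lines (about 10 bytes per line or less), a 100,000-line chunk is on the order of 1 MB of text, close to spaCy's default `nlp.max_length` of 1,000,000 characters, so smaller chunks combined with a larger `batch_size` and `n_process` may be worth measuring with this helper.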