Status: Open. mpsampat opened this issue 10 months ago.
This issue also affects other pages, such as the "Creating ground truth files" page:
https://microsoft.github.io/presidio/image-redactor/evaluating_dicom_redaction/#creating-ground-truth-files
The following lines of code generate the error shown below:
# Format results for more direct comparison
ocr_results_formatted = dicom_engine.bbox_processor.get_bboxes_from_ocr_results(ocr_results)
analyzer_results_formatted = dicom_engine.bbox_processor.get_bboxes_from_analyzer_results(analyzer_results)
Error observed:
TypeError                                 Traceback (most recent call last)
Cell In[19], line 1
----> 1 ocr_results_formatted = dicom_engine.bbox_processor.get_bboxes_from_ocr_results(ocr_results)
      2 analyzer_results_formatted = dicom_engine.bbox_processor.get_bboxes_from_analyzer_results(analyzer_results)

File /opt/conda/lib/python3.10/site-packages/presidio_image_redactor/bbox.py:18, in BboxProcessor.get_bboxes_from_ocr_results(ocr_results)
     12 """Get bounding boxes on padded image for all detected words from ocr_results.
     13
     14 :param ocr_results: Raw results from OCR.
     15 :return: Bounding box information per word.
     16 """
     17 bboxes = []
---> 18 print(ocr_results["text"])
     19 for i in range(len(ocr_results["text"])):
     20     detected_text = ocr_results["text"][i]
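The TypeError is consistent with the formatter receiving results that are already a list of bounding boxes instead of the raw, dict-shaped OCR output it indexes with "text". The sketch below reproduces that failure mode in plain Python, without Presidio. The Tesseract-style dict layout and the format_ocr helper are illustrative assumptions, not Presidio's actual code.

```python
# Reproduces the failure mode without Presidio: a formatter written for raw,
# dict-shaped OCR output is handed a list that has already been formatted.
# Assumption: raw output is Tesseract-style (a dict of parallel lists).

raw_ocr = {
    "text": ["", "John", "Smith"],
    "left": [0, 10, 60],
    "top": [0, 5, 5],
    "width": [100, 40, 45],
    "height": [50, 12, 12],
    "conf": ["-1", "96.5", "95.1"],
}


def format_ocr(ocr_results):
    """Hypothetical stand-in for a raw-dict -> list-of-bboxes formatter."""
    bboxes = []
    for i in range(len(ocr_results["text"])):
        text = ocr_results["text"][i]
        if text:  # skip empty detections
            bboxes.append({
                "left": ocr_results["left"][i],
                "top": ocr_results["top"][i],
                "width": ocr_results["width"][i],
                "height": ocr_results["height"][i],
                "conf": float(ocr_results["conf"][i]),
                "label": text,
            })
    return bboxes


formatted = format_ocr(raw_ocr)  # first pass: dict in, list out

try:
    format_ocr(formatted)  # second pass: a list indexed with "text"
except TypeError as err:
    error_message = str(err)

print(error_message)  # list indices must be integers or slices, not str
```

The second call fails on its first lookup, ocr_results["text"], which matches where the traceback stops.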
Thank you @mpsampat, and apologies for the delayed response. We'll look into this.
The method verify_dicom_instance already returns formatted ocr_results. The variable is called ocr_bboxes in the codebase.
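Given that explanation, notebook code like the formatting snippet above can skip the second conversion when it already holds a list. This is a minimal sketch assuming formatted results are lists of per-word dicts and raw OCR output is a Tesseract-style dict of parallel lists. as_bbox_list is a hypothetical helper, not part of Presidio.

```python
from typing import Any, Dict, List, Union

Bbox = Dict[str, Any]


def as_bbox_list(ocr_results: Union[Dict[str, list], List[Bbox]]) -> List[Bbox]:
    """Hypothetical helper: return per-word bboxes, formatting only if needed."""
    if isinstance(ocr_results, list):
        # Assumed to be already formatted (e.g. the ocr_bboxes value that
        # verify_dicom_instance returns), so hand it back unchanged.
        return ocr_results
    # Otherwise assume raw dict-of-lists OCR output and convert it.
    return [
        {
            "left": ocr_results["left"][i],
            "top": ocr_results["top"][i],
            "width": ocr_results["width"][i],
            "height": ocr_results["height"][i],
            "conf": float(ocr_results["conf"][i]),
            "label": text,
        }
        for i, text in enumerate(ocr_results["text"])
        if text  # drop empty detections
    ]


raw = {"text": ["", "Jane"], "left": [0, 8], "top": [0, 4],
       "width": [90, 30], "height": [40, 10], "conf": ["-1", "91.0"]}
bboxes = as_bbox_list(raw)
print(bboxes[0]["label"])  # Jane
assert as_bbox_list(bboxes) is bboxes  # no-op on already-formatted input
```

In the notebook, ocr_results_formatted = as_bbox_list(ocr_results) would stand in for the get_bboxes_from_ocr_results call. This only covers notebook-level code. The same double conversion inside eval_dicom_instance happens in library code.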
Describe the bug
The notebook https://github.com/microsoft/presidio/blob/main/docs/samples/python/example_dicom_redactor_evaluation.ipynb generates an error and does not produce the evaluation results. The error is shown below.
To Reproduce
Steps to reproduce the behavior:
_, eval_results = dicom_engine.eval_dicom_instance(instance, gt_file_of_interest)
File /opt/conda/lib/python3.10/site-packages/presidio_image_redactor/dicom_image_pii_verify_engine.py:175, in DicomImagePiiVerifyEngine.eval_dicom_instance(self, instance, ground_truth, padding_width, tolerance, display_image, use_metadata, ocr_kwargs, ad_hoc_recognizers, text_analyzer_kwargs)
    165 # Verify detected PHI
    166 verify_image, ocr_results, analyzer_results = self.verify_dicom_instance(
    167     instance,
    168     padding_width,
   (...)
    173     text_analyzer_kwargs,
    174 )
--> 175 formatted_ocr_results = self.bbox_processor.get_bboxes_from_ocr_results(
    176     ocr_results
    177 )
    178 detected_phi = self.bbox_processor.get_bboxes_from_analyzer_results(
    179     analyzer_results
    180 )
    182 # Remove duplicate entities in results

File /opt/conda/lib/python3.10/site-packages/presidio_image_redactor/bbox.py:18, in BboxProcessor.get_bboxes_from_ocr_results(ocr_results)
     12 """Get bounding boxes on padded image for all detected words from ocr_results.
     13
     14 :param ocr_results: Raw results from OCR.
     15 :return: Bounding box information per word.
     16 """
     17 bboxes = []
---> 18 print(ocr_results["text"])
     19 for i in range(len(ocr_results["text"])):
     20     detected_text = ocr_results["text"][i]

TypeError: list indices must be integers or slices, not str

Expected behavior
Additional context
Could you please help provide a workaround for this issue? Should I use an older tag of Presidio?
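For the call inside eval_dicom_instance, which the notebook cannot edit directly, one unofficial stopgap is to wrap the engine's formatter so that input that is already a list passes through unchanged. This relies only on the traceback showing that eval_dicom_instance calls self.bbox_processor.get_bboxes_from_ocr_results on the output of verify_dicom_instance. pass_through_formatted is a hypothetical helper, and whether analyzer results need the same treatment is an untested assumption.

```python
import functools
from typing import Any, Callable


def pass_through_formatted(fn: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Hypothetical wrapper: skip fn when its input is already formatted.

    Assumption: formatted results are lists of per-word dicts, while raw OCR
    output is a dict; anything that is not a list is forwarded to fn.
    """

    @functools.wraps(fn)
    def wrapper(results):
        if isinstance(results, list):
            return results  # already formatted: avoid a second conversion
        return fn(results)

    return wrapper


# Stand-in formatter used only to show the wrapper's behavior.
def fake_formatter(raw):
    return [{"label": t} for t in raw["text"] if t]


patched = pass_through_formatted(fake_formatter)
print(patched({"text": ["", "Jane"]}))  # [{'label': 'Jane'}]
print(patched([{"label": "Jane"}]))     # [{'label': 'Jane'}]
```

Applied to the engine before the evaluation call, for example dicom_engine.bbox_processor.get_bboxes_from_ocr_results = pass_through_formatted(dicom_engine.bbox_processor.get_bboxes_from_ocr_results), the wrapper should shadow the original formatter for that engine. It is a local patch rather than a fix in Presidio itself.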