Closed kartikbhtt7 closed 4 months ago
> [!WARNING]
> Review failed
>
> The pull request is closed.
This update modifies the OCR system, focusing on distance calculations and keyword detection. Key changes include changing the default `distance_cutoff` value in multiple components, improving keyword detection to handle multi-word terms and return detailed results, and making data extraction conditional in OCR operations.
| File | Change Summary |
|---|---|
| OCR/florence/api.py | Changed distance_cutoff default from 1 to 0 in embed method |
| OCR/florence/model.py | Enhanced check_keywords for multi-word keywords, changed return to dictionary, and modified keyword threshold handling in run_ocr |
| OCR/florence/request.py | Changed distance_cutoff default from 1 to 0 in ModelRequest class initializer |
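
To make the `check_keywords` change more concrete, here is a minimal sketch of multi-word keyword matching that returns a dictionary of detailed results. It is not the code from this pull request: the sliding-window strategy, the use of Levenshtein distance, and the result keys (`found`, `match`, `distance`) are all assumptions made for illustration, and only the function name `check_keywords` and the parameter name `lev_distance_threshold` come from the review.

```python
from typing import Dict, List, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def check_keywords(text: str, keywords: List[str],
                   lev_distance_threshold: int = 0) -> Dict[str, dict]:
    """Hypothetical sketch: match each keyword against word windows of the
    same length, so multi-word keywords are compared as a whole phrase."""
    words = text.lower().split()
    results: Dict[str, dict] = {}
    for keyword in keywords:
        parts = keyword.lower().split()
        best_distance: Optional[int] = None
        best_window: Optional[str] = None
        if parts:
            target = " ".join(parts)
            # Slide a window of len(parts) words across the OCR text.
            for start in range(len(words) - len(parts) + 1):
                window = " ".join(words[start:start + len(parts)])
                distance = levenshtein(window, target)
                if best_distance is None or distance < best_distance:
                    best_distance, best_window = distance, window
        found = best_distance is not None and best_distance <= lev_distance_threshold
        # Result keys are illustrative, not the PR's actual schema.
        results[keyword] = {
            "found": found,
            "match": best_window if found else None,
            "distance": best_distance if found else None,
        }
    return results


results = check_keywords("invoice total amuont due 42",
                         ["amount due", "invoice"],
                         lev_distance_threshold=2)
```

With a threshold of 0, only exact phrase matches would count under this sketch, so the misspelled "amuont due" would be rejected, while a threshold of 2 accepts it.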
Below is a sequence diagram illustrating the change in control flow for the `run_ocr` method in the `Model` class:
sequenceDiagram
participant Client
participant OCR Model
participant Florence2 Model
participant Data Processor
Client->>OCR Model: Call run_ocr(image_path, florence2_model, keywords, lev_distance_threshold)
OCR Model->>Florence2 Model: Process image
Florence2 Model-->>OCR Model: Extracted text
OCR Model->>Data Processor: Check keywords with new threshold logic
Data Processor-->>OCR Model: Keyword results
OCR Model-->>Client: Returns extracted data if flag is set
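
The flow in the diagram can be sketched roughly as follows. This is an illustration under stated assumptions rather than the method from the pull request: the `include_text` flag, the injected `check_keywords` callable, the `exact_checker` default, and the `fake_florence2` stub are all hypothetical names introduced here. Only `run_ocr` and its parameters `image_path`, `florence2_model`, `keywords`, and `lev_distance_threshold` appear in the review.

```python
from typing import Callable, Dict, List


def exact_checker(text: str, keywords: List[str], threshold: int) -> Dict[str, dict]:
    """Hypothetical stand-in for the data processor: case-insensitive
    substring matching that ignores the threshold."""
    lowered = text.lower()
    return {kw: {"found": kw.lower() in lowered} for kw in keywords}


def run_ocr(image_path: str,
            florence2_model: Callable[[str], str],
            keywords: List[str],
            lev_distance_threshold: int = 0,
            check_keywords: Callable[[str, List[str], int], Dict[str, dict]] = exact_checker,
            include_text: bool = False) -> dict:
    # 1. The Florence-2 model processes the image and returns extracted text.
    text = florence2_model(image_path)
    # 2. The data processor checks keywords using the threshold.
    keyword_results = check_keywords(text, keywords, lev_distance_threshold)
    # 3. Extracted data is returned only when the (assumed) flag is set.
    response = {"keywords": keyword_results}
    if include_text:
        response["text"] = text
    return response


def fake_florence2(image_path: str) -> str:
    """Stub model so the sketch runs without any OCR dependency."""
    return "Invoice total amount due 42"


without_text = run_ocr("scan.png", fake_florence2, ["amount due"])
with_text = run_ocr("scan.png", fake_florence2, ["amount due"], include_text=True)
```

Passing the model and the keyword checker in as callables mirrors the separate participants in the diagram and keeps the sketch testable without loading a real model.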
🐇 In fields of code where pixels lie,
Distance metrics no longer high,
Keywords found in multi-word song,
Florence’s vision sharp and strong.
OCR dances, values float,
In zero’s realm, new paths we wrote. 🌾
Summary by CodeRabbit
New Features
Improvements
Adjusted the default `distance_cutoff` value from 1 to 0, potentially improving distance-related calculations in OCR processes.