mittagessen / kraken

OCR engine for all the languages
http://kraken.re
Apache License 2.0

[Not an issue] How to get "regions" from recognition data with the python API #657

Open johnlockejrr opened 1 day ago

johnlockejrr commented 1 day ago

I was testing a simple Streamlit inference app for kraken (segmentation -> recognition) to quickly check some models visually. I get the segmentation "regions" from baseline_seg = blla.segment(image, model=seg_model) (the segmentation output). Is there a way to access them from pred_it = rpred.rpred(network=rec_model, im=image, bounds=baseline_seg) (the recognition output)? Thank you!

https://huggingface.co/spaces/johnlockejrr/kraken_ocr

Snip of my code:

    from PIL import ImageDraw
    from kraken import blla, rpred

    # Segment image using Kraken segmentation model
    baseline_seg = blla.segment(image, model=seg_model)

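    # Pass segmentation result to recognition model; rpred yields records lazily,
    # so collect them all before anything is drawn onto the same image
    pred_it = list(rpred.rpred(network=rec_model, im=image, bounds=baseline_seg))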

    # Prepare to draw boundaries and display info
    boundaries_info = []
    draw = ImageDraw.Draw(image)

    # Process recognition predictions for lines and draw on image
    for idx, pred in enumerate(pred_it):
        prediction = pred.prediction
        line_boundary = [(int(x), int(y)) for x, y in pred.boundary]
        line_baseline = [(int(x), int(y)) for x, y in pred.baseline] if pred.baseline else None
        line_type = (pred.tags or {}).get("type", "undefined")  # Get line type if tags are present

        # Add boundary, baseline (if selected), and prediction to display info in the new order
        boundaries_info.append(f"**Line {idx + 1}** (type: {line_type}):\n  - Boundary: {line_boundary}")

        # Draw boundary in green
        draw.polygon(line_boundary, outline="green")

        # Draw baseline if the option is selected and add it to display info
        if draw_baselines and line_baseline:
            boundaries_info.append(f"  - Baseline: {line_baseline}")
            draw.line(line_baseline, fill="red", width=2)  # Draw baseline in red

        # Add prediction last
        boundaries_info.append(f"  - Prediction: {prediction}")

    # Process and draw region boundaries from baseline_seg
    for region_type, region_list in baseline_seg.regions.items():
        for idx, region_data in enumerate(region_list):
            if hasattr(region_data, "boundary"):
                region_boundary = [(int(x), int(y)) for x, y in region_data.boundary]
                region_type_name = (region_data.tags or {}).get("type", region_type)  # Fall back to the dict key if untagged
                boundaries_info.append(f"**Region {idx + 1}** (type: {region_type_name}):\n  - Boundary: {region_boundary}")
                draw.polygon(region_boundary, outline="blue")  # Draw region boundary in blue
mittagessen commented 1 day ago

On 24/11/06 12:37AM, johnlockejrr wrote:

I was testing a streamlit simple inference for kraken, segmentation -> recognition, for fast visual checking some models. I get the segmentation "regions" from baseline_seg = blla.segment(image, model=seg_model) (the segmentation output). Any idea how can I or if I can access them from pred_it = rpred.rpred(network=rec_model, im=image, bounds=baseline_seg) (recognition output)? Thank you!

There's no integrated way; you need to keep track of the segmentation objects and the lines yourself. The regions have stable identifiers throughout the pipeline, so you can flatten the region dict and then look them up using the ID(s) contained in the ocr_record objects returned by rpred.rpred().
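
A minimal sketch of that lookup, under some assumptions: it assumes each region object exposes an `id` attribute and each record exposes a `regions` attribute holding a list of region IDs (possibly empty or None). Those attribute names, and the helper names `flatten_regions` and `regions_for_record`, are illustrative and should be checked against the kraken version in use. Stand-in objects replace the real segmentation and recognition output so the snippet runs on its own.

```python
from types import SimpleNamespace


def flatten_regions(regions_by_type):
    """Build a lookup table: region ID -> (region type, region object)."""
    index = {}
    for region_type, region_list in regions_by_type.items():
        for region in region_list:
            # Assumption: every region carries a stable `id` attribute.
            index[region.id] = (region_type, region)
    return index


def regions_for_record(record, region_index):
    """Return the (type, region) pairs a record refers to, skipping unknown IDs."""
    # Assumption: the record lists its region IDs in a `regions` attribute.
    region_ids = getattr(record, "regions", None) or []
    return [region_index[rid] for rid in region_ids if rid in region_index]


# Stand-ins for baseline_seg and one record from rpred.rpred() (hypothetical data).
text_region = SimpleNamespace(id="r1", boundary=[(0, 0), (100, 0), (100, 50), (0, 50)])
seg = SimpleNamespace(regions={"text": [text_region]})
record = SimpleNamespace(prediction="hello", regions=["r1"])

region_index = flatten_regions(seg.regions)
matches = regions_for_record(record, region_index)
for region_type, region in matches:
    print(f"{record.prediction!r} lies in region {region.id} (type: {region_type})")
```

With real data, `seg.regions` would be `baseline_seg.regions` and `record` would be each item yielded by the recognition step. Building the index once keeps the per-line lookup cheap.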

johnlockejrr commented 1 day ago

I think I'll stick with getting them from the segmentation output; that seems safe.