stefanklut / laypa

Layout analysis to find layout elements in documents (similar to P2PaLA)

Baseline model fine-tuning config #34

Closed fattynoparents closed 4 months ago

fattynoparents commented 5 months ago

Thought I'd better create a separate issue for this question. When fine-tuning a baseline model, is there any way to configure the config file so that the model puts a break in a line before a specified symbol? The thing is, the material we train the model on is very variable, and sometimes there should be a break in the line even though the space between two words is rather small.

A couple of examples to illustrate what I mean.

Here Laypa would separate the words correctly: (image)
And here it would draw a continuous line, though there should be a break: (image)

Edit: Or is the baseline model complex enough that, after it has encountered a certain symbol (in this case 8:0) in enough of the training material and seen a break in the line at that place, it will learn to put a break the next time it sees this symbol?

stefanklut commented 5 months ago

You pose a very good question.

Let me start off by saying that there is currently no way to manually add such constraints to a model. If you have any suggestion of what this might look like, I am curious to hear it. Adding a break after a specific symbol would mean you also have the transcription and not just the baselines?

The edit part of your question is very specific, so I have a hard time answering it. It touches on one of the main problems we are currently facing (baselines incorrectly connecting or not connecting); the other problem being rotated text. The model might be able to learn that there needs to be a gap, but this could be very specific to your data. If you are training mostly on tabular data it might be easier (not saying it is definitely possible), but if normal text data is mixed in as well I suspect it will be harder (also not saying impossible), since most regular text lines are connected. As a start, you could use the eval.py script to see what the ground truth currently looks like when converted to pixels. That can help give an idea of whether the baselines are even separated in the ground truth (which of course determines whether it is possible to learn that there should be a separation).

Let me know what you find :)

fattynoparents commented 5 months ago

Adding it after a specific symbol would mean you also have the transcription and not just baselines?

Yes, now that I think about my initial question it seems rather stupid :) I should have known better after using the Loghi system for a few months, but for some reason I thought the baseline model knows something about the contents. However, the transcription happens after the segmentation, so when the baseline model segments an image into parts it has no idea what text there is on the page. I therefore think the EDIT part of my question is irrelevant as well.

We have a project to transcribe a huge amount of material from a library catalog. In most cases there is an author name at the top, some additional info below that, then one or several titles, plus some info in the margins; here is one of the simplest examples: (image) So the data is mostly tabular in terms of page layout. The problem is that, as I wrote in the initial post, we need to distinguish certain data inside the title part; one example would be to put a break before 8:0. As far as I understand, if 8:0 always appeared in the same part of a page it would have been possible to train the base model to learn the pattern, but since it can appear in various parts of a page, it looks like this is impossible to do?

I will also check the eval.py script and see what I get with my data, thanks for the tips!

fattynoparents commented 5 months ago

By the way, I have noticed an inconsistency in the tutorial about the eval parameters: (screenshot)

And when I try to run it as follows:

docker run $DOCKERGPUPARAMS --rm -it -u $(id -u ${USER}):$(id -g ${USER}) \
-m 32000m --shm-size 10240m -v $LAYPADIR:$LAYPADIR \
-v $TRAINDIR:$TRAINDIR -v $WEIGHTSDIR:$WEIGHTSDIR $DOCKERLAYPA \
        python eval.py \
        -c $LAYPAMODEL \
        -i $TRAINDIR

I get the following error:

Traceback (most recent call last):
  File "/src/laypa/eval.py", line 298, in <module>
    main(args)
  File "/src/laypa/eval.py", line 104, in main
    image_paths = get_file_paths(args.input, supported_image_formats, cfg.PREPROCESS.DISABLE_CHECK)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/laypa/utils/input_utils.py", line 90, in get_file_paths
    raise TypeError("Cannot run when the input path is None")
TypeError: Cannot run when the input path is None

stefanklut commented 5 months ago

Oh you are right about the docs being outdated. However, the --input should be working. This indicates that args.input is not set, so maybe check if $TRAINDIR is correct?

fattynoparents commented 5 months ago

Yeah it's correct, the same directory works fine for the main.py script. The input should be a folder with images and the page folder with corresponding XML files, right?

stefanklut commented 5 months ago

Yes, that is the correct type of input. I'll try to reproduce the error when running via Docker somewhere this afternoon. I have not checked whether using Docker could break the eval code.

fattynoparents commented 5 months ago

Ok, I have investigated this further and it appears the error was due to comments in the script; when I removed them, the error was gone. However, I then got this:

Traceback (most recent call last):
  File "/src/laypa/eval.py", line 298, in <module>
    main(args)
  File "/src/laypa/eval.py", line 106, in main
    predictor = Predictor(cfg=cfg)
                ^^^^^^^^^^^^^^^^^^
  File "/src/laypa/run.py", line 91, in __init__
    raise FileNotFoundError("Cannot do inference without weights. Specify a checkpoint file to --opts TEST.WEIGHTS")
FileNotFoundError: Cannot do inference without weights. Specify a checkpoint file to --opts TEST.WEIGHTS

So it seems this parameter should also be mentioned as mandatory.
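For reference, --opts seems to be the way to pass the weights; something like this (the checkpoint filename below is just a placeholder for whatever the file in $WEIGHTSDIR is called):

docker run $DOCKERGPUPARAMS --rm -it -u $(id -u ${USER}):$(id -g ${USER}) \
-m 32000m --shm-size 10240m -v $LAYPADIR:$LAYPADIR \
-v $TRAINDIR:$TRAINDIR -v $WEIGHTSDIR:$WEIGHTSDIR $DOCKERLAYPA \
        python eval.py \
        -c $LAYPAMODEL \
        -i $TRAINDIR \
        --opts TEST.WEIGHTS $WEIGHTSDIR/model_checkpoint.pth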

Finally, when I add the path to the weights I get this error:

Traceback (most recent call last):
  File "/src/laypa/eval.py", line 298, in <module>
    main(args)
  File "/src/laypa/eval.py", line 211, in main
    fig_manager.window.showMaximized()
    ^^^^^^^^^^^^^^^^^^
AttributeError: 'FigureManagerBase' object has no attribute 'window'

stefanklut commented 5 months ago

Is this issue with comments something we should be looking into, or is it just on your machine?

I think your final issue is due to running inside Docker, where matplotlib cannot open a window to display the visualization. I've updated the README with instructions on how to use eval with the --save option. Could you please run:

docker run $DOCKERGPUPARAMS --rm -it -u $(id -u ${USER}):$(id -g ${USER}) \
-m 32000m --shm-size 10240m -v $LAYPADIR:$LAYPADIR \
-v $TRAINDIR:$TRAINDIR -v $WEIGHTSDIR:$WEIGHTSDIR -v $OUTPUTDIR:$OUTPUTDIR $DOCKERLAYPA \
        python eval.py \
        -c $LAYPAMODEL \
        -i $TRAINDIR \
        -o $OUTPUTDIR \
        --save gt

with $OUTPUTDIR specified.

This should output the gt visualization to the output directory.

fattynoparents commented 5 months ago

Is this issue with comments something we should be looking into, or is it just on your machine?

I'm not sure; I had the following command that didn't work:

docker run $DOCKERGPUPARAMS --rm -it -u $(id -u ${USER}):$(id -g ${USER}) \
-m 32000m --shm-size 10240m -v $LAYPADIR:$LAYPADIR \
-v $TRAINDIR:$TRAINDIR -v $WEIGHTSDIR:$WEIGHTSDIR $DOCKERLAYPA \
        python eval.py \
        -c $LAYPAMODEL \
        # first commented line
        # second commented line
        -i $TRAINDIR

When I removed the comments, the error was gone.
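If I understand bash correctly, that makes sense: the backslash at the end of -c $LAYPAMODEL joins it with the comment line, the # turns the rest of that joined line into a comment, and since the comment line has no trailing backslash the continuation ends there. -i $TRAINDIR is then run as a separate shell command, so eval.py never receives an input path, which would match the "input path is None" error. A minimal illustration (plain shell behaviour, nothing Laypa-specific):

echo foo \
# this comment line ends the continuation, so the command above is just "echo foo"
bar    # "bar" now runs as a separate command and fails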

The command with the --save option also worked, thanks.

fattynoparents commented 5 months ago

That can help give an idea of whether the baselines are even separated in the ground truth (which of course determines whether it is possible to learn that there should be a separation).

If, after running eval.py, I see baselines that I would like to train Laypa to draw, does that mean the model will be trained to recognize them? E.g. in the example below I mean the lines below the { sign. (image)

stefanklut commented 5 months ago

eval.py is purely for checking how the GT or prediction looks when drawing regions or baselines. The actual training is done using main.py. However, with eval.py you can check, for example, whether the baselines are separated enough in these images. If the baselines are not separate in the GT, then the model realistically has no chance of ever predicting them as separate. Depending on what the baselines look like, you may want to change the thickness of the baselines or use the square baseline option. (image)

fattynoparents commented 5 months ago

or use the square baseline option

Could you please tell me what this option is for and how it is configured? Thanks!

stefanklut commented 5 months ago

The setting is called PREPROCESS.BASELINE.SQUARE_LINES, and it cuts off the ends of baselines to make them end on a square line. It is mostly useful if your line width is very high and the drawn lines therefore extend past the actual baseline in the GT.

(image)
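
If you want to preview what the GT looks like with that setting turned on, you should be able to pass it via --opts to eval.py, along these lines (writing this from memory, so double-check the exact invocation against the README):

docker run $DOCKERGPUPARAMS --rm -it -u $(id -u ${USER}):$(id -g ${USER}) \
-m 32000m --shm-size 10240m -v $LAYPADIR:$LAYPADIR \
-v $TRAINDIR:$TRAINDIR -v $WEIGHTSDIR:$WEIGHTSDIR -v $OUTPUTDIR:$OUTPUTDIR $DOCKERLAYPA \
        python eval.py \
        -c $LAYPAMODEL \
        -i $TRAINDIR \
        -o $OUTPUTDIR \
        --save gt \
        --opts PREPROCESS.BASELINE.SQUARE_LINES True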

fattynoparents commented 5 months ago

whether the baselines are separated enough

How do I know if they are separated enough? All GT images have baselines that are separated in the places where they have to be separated, but sometimes the space between two lines is very small, and the prediction pictures of course show them as one continuous line, like here for example: GT: (image) Prediction: (image)

fattynoparents commented 5 months ago

One more thing (sorry for so many questions :) - why does the following error arise when launching the Laypa training process? It only appears with PREPROCESS.BASELINE.SQUARE_LINES set to true.

Traceback (most recent call last):
  File "/src/laypa/main.py", line 140, in <module>
    main(args)
  File "/src/laypa/main.py", line 128, in main
    launch(
  File "/opt/conda/envs/laypa/lib/python3.12/site-packages/detectron2/engine/launch.py", line 84, in launch
    main_func(*args)
  File "/src/laypa/main.py", line 107, in setup_training
    preprocess_datasets(cfg, args.train, args.val, tmp_dir)
  File "/src/laypa/core/preprocess.py", line 99, in preprocess_datasets
    process.run()
  File "/src/laypa/datasets/preprocess.py", line 525, in run
    results = list(
              ^^^^^
  File "/opt/conda/envs/laypa/lib/python3.12/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/opt/conda/envs/laypa/lib/python3.12/multiprocessing/pool.py", line 873, in next
    raise value
ValueError: /home/user/training-laypa/baseline/2024.04.15/val_input/page/k21.xml has no contours

If I set it to false, the script runs fine, but I get a lot more warnings saying *.xml contains overlapping baseline sem_seg

fattynoparents commented 5 months ago

Sorry, one more question - is there or will there be a possibility to set an early stopping patience value in the config?

stefanklut commented 5 months ago

"Separated enough" is hard to tell (I'm still trying to figure that out myself), but it seems that with non-square lines the lines would touch.

The error is due to the baselines being so small that there are no contours when drawing them in. I have attempted a fix for this problem in 2.0.2 (should be released, or at least soon). As a "fix" I just draw a circle when the baseline is too short, instead of the square sliver that would otherwise be drawn.

I don't think detectron2 has early stopping built in, and I have not added it. However, I would consider adding it, especially in the form of a PR ;)

fattynoparents commented 4 months ago

Using version 2.0.2 I now get the following error, which was supposed to have been fixed:

Preprocessing:   0%|                                                                            | 0/171 [00:00<?, ?it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/conda/envs/laypa/lib/python3.12/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/src/laypa/datasets/preprocess.py", line 557, in process_single_file
    image_shape = self.augmentations[0].get_output_shape(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/laypa/datasets/augmentations.py", line 261, in get_output_shape
    raise ValueError("Edge length is not set")
ValueError: Edge length is not set
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/src/laypa/main.py", line 140, in <module>
    main(args)
  File "/src/laypa/main.py", line 128, in main
    launch(
  File "/opt/conda/envs/laypa/lib/python3.12/site-packages/detectron2/engine/launch.py", line 84, in launch
    main_func(*args)
  File "/src/laypa/main.py", line 107, in setup_training
    preprocess_datasets(cfg, args.train, args.val, tmp_dir)
  File "/src/laypa/core/preprocess.py", line 51, in preprocess_datasets
    process.run()
  File "/src/laypa/datasets/preprocess.py", line 629, in run
    results = list(
              ^^^^^
  File "/opt/conda/envs/laypa/lib/python3.12/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/opt/conda/envs/laypa/lib/python3.12/multiprocessing/pool.py", line 873, in next
    raise value
ValueError: Edge length is not set

Could you please have a look at this? Has this maybe already been fixed in newer versions?

stefanklut commented 4 months ago

Thank you for reporting, I'll have a look to see if I can reproduce this error

stefanklut commented 4 months ago

Can confirm that is still an issue, I will roll out a fix today/tomorrow

fattynoparents commented 4 months ago

Can confirm that is still an issue, I will roll out a fix today/tomorrow

Hi, any news on the fix? :)

stefanklut commented 4 months ago

Hi, a little more patience please. I broke something else, so training still doesn't work :sweat_smile: Updated Docker build is running. I'll let you know when it is up

fattynoparents commented 4 months ago

Haha ok, thanks for letting me know anyway :)

rvankoert commented 4 months ago

and 2.0.4 is released with Stefan's fix. Hope it works for you

fattynoparents commented 4 months ago

and 2.0.4 is released with Stefan's fix. Hope it works for you

Thanks for the update! The training process seems to be working fine now. A small question - why would the following warning arise when processing certain pictures?

WARNING [04/30 06:18:34 laypa.page_xml.xml_converter]: File /home/user/training-laypa/baseline/2024.04.15/train_input/page/2049.xml contains overlapping baseline pano

Is it safe to ignore it if it does not arise too often?

stefanklut commented 4 months ago

If they aren't super frequent they can be ignored. They are there to show that the GT has overlapping lines, which are near impossible for the model to distinguish. But sometimes that just is the correct ground truth. If it is happening for almost every file, then you should have a look at the line thickness during preprocessing or at the quality of the GT.