Plotting – again

bertsky commented 4 months ago

After several attempts by @Shreeshrii to share her excellent plotting scripts, each of which was unfortunately thwarted by bad circumstances (other big changes occurring at the same time), here comes a plotting facility again.

I based this on the ocrddata branch of her fork, cherry-picking only the two relevant changesets, resolving conflicts and then refactoring to make this better fit our makefileization.

Usage is simply make plot, which will only work after make training. (I could also make this dependency explicit, but that would cause make plot to start the training if it did not happen already for that combination of variables.)

The output files will be created as $OUTPUT_DIR/$MODEL_NAME.plot_log.png, e.g.

[image: herrnhut-kurrent tess finetuned-htrbin plot_log]

and $OUTPUT_DIR/$MODEL_NAME.plot_cer.png, e.g.

[image: herrnhut-kurrent tess finetuned-htrbin plot_cer]

All intermediate files (except for the lstmeval log files generated under $OUTPUT_DIR/eval/*.log because they are valuable in their own right) are marked as such and therefore removed by make.

Perhaps we should discuss how both plots could be combined into a single one (which is probably what @Shreeshrii tried to do already). I can see that there is a problem with the granularity at which these data points are recorded (training iterations for validation during lstmtraining vs. learning iterations for validation afterwards via external lstmeval). But IIUC we have everything it takes to combine them (twin y plot with synced x axes)...
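To make the idea concrete, here is a minimal sketch of such a combined plot. It is not what plot_cer.py currently does. All names (`learning_to_training`, `plot_combined`) and the input layout (plain lists of `(iteration, value)` pairs plus a list of `(learning_iteration, training_iteration)` checkpoint pairs) are assumptions for illustration only. The sketch maps the lstmeval points onto the training-iteration axis by linear interpolation and then draws both series with `twinx()`, so they share one x axis while keeping separate y axes.

```python
from bisect import bisect_left


def learning_to_training(learning_iter, checkpoints):
    """Map a learning iteration to a training iteration by linear interpolation.

    checkpoints: list of (learning_iteration, training_iteration) pairs,
    sorted by strictly increasing learning iteration (hypothetical input).
    Values outside the known range are clamped to the nearest checkpoint.
    """
    xs = [c[0] for c in checkpoints]
    ys = [c[1] for c in checkpoints]
    if learning_iter <= xs[0]:
        return float(ys[0])
    if learning_iter >= xs[-1]:
        return float(ys[-1])
    i = bisect_left(xs, learning_iter)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (learning_iter - x0) / (x1 - x0)


def plot_combined(train_series, eval_series, checkpoints, out_path):
    """Draw both series over training iterations, each with its own y axis.

    train_series: [(training_iteration, value)] recorded during lstmtraining
    eval_series:  [(learning_iteration, value)] recorded by external lstmeval
    """
    # Imported lazily so the interpolation helper works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax_train = plt.subplots()
    ax_eval = ax_train.twinx()  # second y axis sharing the same x axis

    ax_train.plot([x for x, _ in train_series],
                  [y for _, y in train_series],
                  color="tab:blue", label="during lstmtraining")
    ax_eval.plot([learning_to_training(x, checkpoints) for x, _ in eval_series],
                 [y for _, y in eval_series],
                 color="tab:red", marker="o", label="lstmeval")

    ax_train.set_xlabel("training iterations")
    ax_train.set_ylabel("validation error during training")
    ax_eval.set_ylabel("validation error from lstmeval")
    fig.legend()
    fig.savefig(out_path)
    plt.close(fig)
```

Whether linear interpolation between checkpoints is accurate enough for this purpose is an open question. It is used here only because it is the simplest way to put both series on one axis.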

bertsky commented 4 months ago

With the last commit I reinstated @Shreeshrii's LOG_FILE variable.

The big advantage is that you can now opt in to plotting even older logs, e.g.

make plot LOG_FILE=nohup.out

zdenop commented 3 months ago

I just made a quick test on openSUSE (15.5) and here are a few suggestions:

It would be nice to have a short example of how to make an example plot on example data:

```
git clone https://github.com/tesseract-ocr/tesstrain
cd tesstrain
mkdir data
unzip ocrd-testset.zip -d data/ocrd-ground-truth
...
# install needed requirements
...
nohup make training MODEL_NAME=ocrd START_MODEL=frk TESSDATA=~/tessdata_best MAX_ITERATIONS=10000 > plot/TESSTRAIN.LOG &
make plot MODEL_NAME=ocrd
```

I removed python2 from openSUSE and I got this error:

```
python plot_cer.py data/ocrd ocrd data/ocrd/ocrd.iteration.tsv data/ocrd/ocrd.checkpoint.tsv data/ocrd/ocrd.eval.tsv data/ocrd/ocrd.sub.tsv data/ocrd/ocrd.lstmeval.tsv
/bin/bash: python: command not found
```

What about using PY_CMD (as in the rest of the Makefile)?

When I manually ran python3 plot_cer.py data/ocrd ocrd data/ocrd/ocrd.iteration.tsv data/ocrd/ocrd.checkpoint.tsv data/ocrd/ocrd.eval.tsv data/ocrd/ocrd.sub.tsv data/ocrd/ocrd.lstmeval.tsv I got this error:
```
Traceback (most recent call last):
  File "/home/podobny/Projekty/tesstrain/plot_cer.py", line 6, in <module>
    import matplotlib
ModuleNotFoundError: No module named 'matplotlib'
```
It would be good to mention that users should install matplotlib and pandas (pip3 install matplotlib pandas) before running make plot.

stweil commented 3 months ago

> It would be good to mention that user should install matplotlib and pandas (pip3 install matplotlib pandas) before running make plot.

Both are mentioned in requirements.txt, so running pip3 install -r requirements.txt is sufficient.
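For anyone who wants to check up front whether the two plotting dependencies are importable, a small pre-flight check along the following lines could help. This is only a sketch: the helper name `missing_modules` is hypothetical and not part of tesstrain, and it only tests whether each top-level module can be found, not which version is installed.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # matplotlib and pandas are the two modules named in this thread.
    missing = missing_modules(["matplotlib", "pandas"])
    if missing:
        print("Missing modules: " + ", ".join(missing))
        print("Try: pip3 install -r requirements.txt")
    else:
        print("All plotting dependencies found.")
```

Running this with the same interpreter that make plot uses should show whether the ModuleNotFoundError above is to be expected.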