tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

Same duration for training even if CORES count is changed #201

Closed: prasad01dalavi closed this issue 3 years ago

prasad01dalavi commented 3 years ago

The default CORES value is 4. I timed `time make training CORES=10 MODEL_NAME=eng_foo1_test` and `time make training CORES=2 MODEL_NAME=eng_foo1_test`, and both take the same amount of time.

I also tried `OMP_THREAD_LIMIT=1 time make training CORES=2 MODEL_NAME=eng_foo1_test`.

Still, there is no improvement in timing.

My system has 12 cores.

Shreeshrii commented 3 years ago

See https://github.com/tesseract-ocr/tesstrain/blob/master/Makefile#L40

```make
# No of cores to use for compiling leptonica/tesseract. Default: $(CORES)
CORES = 4
```

It is used only while building tesseract and leptonica.
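In other words, CORES only feeds the `-j` flag of the sub-make that compiles those libraries from source; it never reaches the training step. A minimal sketch of that pattern (the target name and build commands here are illustrative, not copied from the tesstrain Makefile):

```make
# No of cores to use for compiling leptonica/tesseract
CORES = 4

# Illustrative target: CORES parallelizes this one-time build step only.
# lstmtraining itself ignores CORES; its parallelism comes from OpenMP.
leptonica.built:
	cd leptonica && ./configure && $(MAKE) -j$(CORES)
	touch $@
```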

prasad01dalavi commented 3 years ago

Yes, I know that. I assumed it was for processing the training samples in parallel. Does that mean these cores will not be used in parallel during each training epoch? And where does "compiling leptonica/tesseract" happen, and what exactly does it mean?

Also, will more CPU cores increase the training speed? What do we need to do to make training faster?

wrznr commented 3 years ago

Unless you disable parallelization explicitly via OMP_THREAD_LIMIT=1, Tesseract will make use of all existing cores automatically (OpenMP is used). So there isn't much you can do to increase training speed. However, you can always use make's `-j` parameter to parallelize the preparation of the box and lstmf files.
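As a concrete illustration of the two knobs mentioned above (the model name is hypothetical; adjust it to your setup):

```shell
# 1. Default: OpenMP inside lstmtraining uses all available cores.
make training MODEL_NAME=eng_foo1_test

# 2. Force single-threaded LSTM training, e.g. for reproducible timing:
OMP_THREAD_LIMIT=1 make training MODEL_NAME=eng_foo1_test

# 3. make's -j parallelizes the data preparation (.box/.lstmf generation),
#    not the training iterations themselves:
make -j"$(nproc)" training MODEL_NAME=eng_foo1_test
```

Note that OMP_THREAD_LIMIT is a plain environment variable read by the OpenMP runtime at process start, which is why it is set on the command line rather than passed as a make variable.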

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.