aaronk6 closed this issue 5 years ago
Your GT set is not big enough. We are dividing the GT data into training (90 %) and evaluation data (10 %). Having just three lines of GT leaves the evaluation set empty.
Will be handled via https://github.com/OCR-D/ocrd-train/issues/42
Hi @wrznr, thanks for looking into this. I actually thought it would be smart to start with a small set to see if the process is working before feeding it with a bigger set, but apparently that wasn’t the case 🙂
Can someone help with this error?
find data/ground-truth -name '*.lstmf' | python3 shuffle.py 0 > "data/foo/all-lstmf"
mkdir -p data/foo
total=$(wc -l < data/foo/all-lstmf); \
train=$(echo "$total * 0.90 / 1" | bc); \
test "$train" = "0" && \
echo "Error: missing ground truth for training" && exit 1; \
eval=$(echo "$total - $train" | bc); \
test "$eval" = "0" && \
echo "Error: missing ground truth for evaluation" && exit 1; \
head -n "$train" data/foo/all-lstmf > "data/foo/list.train"; \
tail -n "$eval" data/foo/all-lstmf > "data/foo/list.eval"
Error: missing ground truth for training
Makefile:106: recipe for target 'data/foo/list.train' failed
make: *** [data/foo/list.train] Error 1
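As I read the recipe in the log above, the training count is 90% of the total number of .lstmf lines, truncated to a whole number, and the evaluation count is whatever remains. Here is a minimal sketch of that arithmetic. `split_counts` is a made-up helper name, and it uses plain shell integer arithmetic rather than bc, so it only illustrates the calculation and is not the project's code.

```shell
# Hypothetical helper (not part of ocrd-train): prints "<train> <eval>"
# for a given number of ground-truth lines.
# Assumption: a 90/10 split, truncated like the bc expression in the recipe.
split_counts() (
    total=$1
    train=$(( total * 90 / 100 ))   # integer division truncates
    held_out=$(( total - train ))   # the rest goes to evaluation
    echo "$train $held_out"
)

# Show the split for a few set sizes.
for n in 0 1 10 100; do
    echo "$n lines: $(split_counts "$n")"
done
```

With 0 or 1 lines the training count comes out as 0, which is the case the recipe reports as "missing ground truth for training".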
The Makefile cannot find any files for training. Can you check the directory data/ground-truth for files with the suffix .lstmf?
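One quick way to run that check is to count the .lstmf files below the directory. This is only a sketch: `count_lstmf` is a hypothetical helper name, and the directory you pass in should be whichever ground-truth directory your run actually uses.

```shell
# Hypothetical helper: counts regular files ending in .lstmf below a directory.
count_lstmf() (
    find "$1" -type f -name '*.lstmf' | wc -l | tr -d ' '
)

# Usage (the path is an assumption, adjust it to your setup):
#   count_lstmf data/ground-truth
```

If the count is 0, the list of training files is empty and the split has nothing to work with.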
@wrznr I have the same error as lokesh-stack. I used the image and ground-truth text from ocrd-testset.zip and put them in data/foo-ground-truth. Then I ran the command make training. After running that command, the error "Error: missing ground truth for training" is shown.
@wrznr @royudev Yes, even the sample dataset ocrd-testset.zip fails and shows this error:
/bin/bash: line 4: bc: command not found
+ head -n '' data/foo/all-lstmf
head: invalid number of lines: ''
+ tail -n '' data/foo/all-lstmf
tail: invalid number of lines: ''
Makefile:165: recipe for target 'data/foo/list.train' failed
make: *** [data/foo/list.train] Error 1
Could you please tell me how I can fix this? Thanks.
/bin/bash: line 4: bc: command not found
bc is needed for the line number calculations. Please install that.
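To confirm whether bc is available before running make again, something like the following should do. `have_cmd` is a made-up helper name, and the install hint assumes a Debian or Ubuntu system where the package is called bc.

```shell
# Hypothetical helper: succeeds if the named command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd bc; then
    echo "bc found"
else
    # Assumption: Debian/Ubuntu package manager.
    echo "bc missing; try: sudo apt-get install bc"
fi
```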
Hi guys,
I’m pretty new to this, so please forgive me if I’m missing something obvious.
I’ve posted to the mailing list because Tesseract sometimes confuses the digit 4 with a 9 in the material I’m currently processing. Someone over there pointed me to your project.
So what I would like to do now is to fine-tune the Latin script to fix the recognition errors I’m seeing. If I understand this correctly, I’ll need to go to data/ground-truth and create files there for Tesseract to learn from, e.g.:
_April2014.gt.txt
_April2014.tif
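A small sanity check for that layout, assuming each .tif image is meant to sit next to a .gt.txt file with the same base name, as in the pair above. `missing_transcriptions` is a hypothetical helper written for this post, not something the project provides.

```shell
# Hypothetical helper: lists .tif images in a directory that have no
# matching .gt.txt transcription next to them.
missing_transcriptions() (
    for img in "$1"/*.tif; do
        [ -e "$img" ] || continue        # the glob matched nothing
        txt="${img%.tif}.gt.txt"         # foo.tif -> foo.gt.txt
        [ -e "$txt" ] || echo "$img"
    done
)

# Usage (the path is an assumption):
#   missing_transcriptions data/ground-truth
```

No output means every image found has a transcription beside it.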
Also, I’ve cloned https://github.com/tesseract-ocr/tessdata.
What else do I need to do? The reason I’m asking is because I get an error when executing the following:
Looks like it wants a data/list.eval file, which isn’t there. Is this why it’s crashing? I’m running this on Ubuntu 16.04.
Thank you!