tesseract-ocr / test

Repository for tesseract testing
Apache License 2.0

Missing testdata files for unittests #13

Open Shreeshrii opened 5 years ago

Shreeshrii commented 5 years ago

testdata/lstm_training.txt is required for building training data for lstm_test

https://github.com/tesseract-ocr/tesseract/blob/master/unittest/lstm_test.cc#L6

// Generating the training data:
// If the format of the lstmf (ImageData) file changes, the training data will
// have to be regenerated as follows:

// ./tesseract/text2image --xsize=800 --font=Arial \
//   --text=tesseract/testdata/lstm_training.txt --leading=32 \
//   --outputbase=tesseract/testdata/lstm_training.arial
// ./tesseract tesseract/testdata/lstm_training.arial.tif \
//   tesseract/testdata/lstm_training.arial lstm.train \
//   --pageseg_mode=6
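For anyone scripting the regeneration once lstm_training.txt is available, the two commands quoted from lstm_test.cc can be assembled as argument lists. This is only a sketch: the function name `regeneration_commands` is hypothetical, the paths and flags are copied verbatim from the comment above, and nothing is executed because the binary locations in that comment may not match a local checkout.

```python
import shlex
from pathlib import PurePosixPath

# Paths copied from the lstm_test.cc comment; adjust them for a local checkout.
TESTDATA = PurePosixPath("tesseract/testdata")


def regeneration_commands():
    """Return the text2image and tesseract command lines as argv lists."""
    base = TESTDATA / "lstm_training.arial"
    text2image = [
        "./tesseract/text2image",
        "--xsize=800",
        "--font=Arial",
        f"--text={TESTDATA / 'lstm_training.txt'}",
        "--leading=32",
        f"--outputbase={base}",
    ]
    tesseract = [
        "./tesseract",
        f"{base}.tif",      # image produced by text2image
        str(base),          # output base for the training file
        "lstm.train",
        "--pageseg_mode=6",
    ]
    return text2image, tesseract


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for argv in regeneration_commands():
        print(shlex.join(argv))
```

Printing rather than running keeps the sketch safe to try; passing each list to `subprocess.run` would be the next step once the paths are confirmed.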

Shreeshrii commented 5 years ago

0146_281.3B.tif line6.tiff 5318c4b679264.jpg

stweil commented 5 years ago

Cc'ing @jbreiden.

Shreeshrii commented 4 years ago

@stweil Do we still need more images/testdata from Google?

stweil commented 4 years ago

I'm afraid that we have to find our own solutions without waiting for Google. They cannot provide all images and test data because some might be copyrighted. Therefore it is important to find free replacement images and data. We have nearly all images needed for the unit tests (equationdetect_test still needs an image).

AndersonMartins1 commented 3 months ago

If you are looking for solutions to find free replacement images and data for use in unit testing, there are several options you can consider:

Free Image Banks: There are several free image banks available on the internet, where you can find high-quality, public domain images to use in your tests. Some examples include Unsplash, Pixabay and Pexels.

Test Data Databases: In addition to images, you may need test data for your unit tests. There are databases of test data freely available on the web that can be used to create realistic test scenarios. Search for open datasets related to your application domain.

Creating Images and Test Data: If you are unable to find suitable images or test data, consider creating your own. You can create simple images using free image editing tools like GIMP or Paint.NET, and generate test data using random data generation libraries in Python like Faker.

Community Resources: Don't underestimate the power of community. Search forums, discussion groups, and online communities related to your application domain. Many times, other developers are willing to share images and test data that they have created or found.

Creative Commons Licenses: When searching for free replacement images and data, be sure to check usage licenses. Many free resources are available under Creative Commons licenses, which may have specific attribution requirements or commercial use restrictions.

stweil commented 3 months ago

Thanks, but this issue is not about finding any image. It is about finding very specific images for a very specific task which is part of the unittests.

AndersonMartins1 commented 3 months ago

To resolve this issue, you can follow these steps:

Clearly identify which specific images are required for the test cases in question.

Make sure these images are available somewhere accessible for testing. This could be in an internal image repository, a cloud storage server, or another accessible location.

If images are not available, you may need to create or purchase the necessary images and ensure they are stored in a suitable location.

After ensuring that the required images are available, you can update your unit tests to reference these specific images when running your tests.

Be sure to clearly document the image requirements for each test case so future developers know which images are needed and where to find them.

Rerun your unit tests to ensure that the images are being used correctly and that the tests are passing as expected.

By following these steps, you should be able to solve the problem of finding the specific images needed for the test cases in your unit tests.

stweil commented 3 months ago

I am sorry to say this, but your comments (and your pull requests) are not helpful. They sound like the output of an AI chatbot. If you want to help, you should read this issue carefully (it lists the missing images), look into the test code where these images are used, and try to activate that code with replacement images.
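As a possible starting point for that work, the sketch below reports which of the files named earlier in this thread are absent from a testdata directory. The helper name `missing_testdata` and the flat directory layout are assumptions for illustration; the real unit tests may look for these files in other locations.

```python
from pathlib import Path

# File names mentioned in this issue. That they all sit directly in one
# testdata directory is an assumption, not something the thread confirms.
REQUIRED = [
    "lstm_training.txt",
    "0146_281.3B.tif",
    "line6.tiff",
    "5318c4b679264.jpg",
]


def missing_testdata(testdata_dir, required=REQUIRED):
    """Return the names in `required` that are not regular files in testdata_dir."""
    root = Path(testdata_dir)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    for name in missing_testdata("testdata"):
        print(f"missing: {name}")
```

Running it before and after adding a replacement image shows whether the file landed where this check expects it.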

AndersonMartins1 commented 3 months ago

Ok