Max suggested that gathering some statistics on the image files in our QA data set would be a useful way to better curate the data set and to understand the bottlenecks in the QA pipeline.
The following metrics should be gathered over the given book directory by a new QA command, data_stats. This can be implemented in the base class QA_Module in qa_utilities.py: