Refactoring of `pdf_extract.py` script

Description: This PR refactors the pdf_extract.py script to improve the readability and maintainability of the code. To avoid affecting the existing code, a new script, app.py, and a new library, app_tools, have been added. app.py runs the same process as pdf_extract.py, and app_tools contains the refactored implementation of each step.

If you find it useful, app.py can take the place of pdf_extract.py.

Motivation: I love the project and would like to thank you for the great work. This refactoring is a first step toward building an API with FastAPI and Docker.

Main changes:

A new script, app.py, has been created with the same pipeline as pdf_extract.py.
A new library, app_tools, has been created. It contains the classes and methods that perform each step of the pipeline:
pdf.py: Provides a set of tools for working with PDF files.
layout_analysis.py: Detects the layout of each page image in a document.
formula_analysis.py: Handles formula detection and recognition in images.
ocr_analysis.py: OCR processor, responsible for text recognition.
table_analysis.py: Table processor, used for table recognition in documents.
visualize.py: Generates visualizations of the document layout.
config.py: Configures model parameters and logging.
utils.py: Saves results as JSON.
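
For reviewers who want a mental model of the structure, the following is a minimal sketch of how such a step-per-module pipeline could be composed. It is not the code in this PR: every function name below (load_pages, layout_step, formula_step, ocr_step, table_step, run_pipeline, save_json) is a hypothetical stand-in, the step order is assumed to follow the module list above, and the steps only attach empty placeholder results instead of running any model.

```python
import json
from typing import Callable, Dict, List

# A page is represented as a plain dict; the real app_tools types may differ.
Page = Dict[str, object]
Step = Callable[[Page], Page]


def load_pages(pdf_path: str, n_pages: int = 2) -> List[Page]:
    """Hypothetical stand-in for pdf.py: returns one dict per page, no real parsing."""
    return [{"page": i, "source": pdf_path} for i in range(n_pages)]


def make_step(key: str) -> Step:
    """Build a placeholder step that stores an empty result list under `key`."""
    def step(page: Page) -> Page:
        updated = dict(page)  # copy so earlier results are never mutated
        updated[key] = []     # a real step would put model output here
        return updated
    return step


# Hypothetical stand-ins for layout_analysis, formula_analysis, ocr_analysis, table_analysis.
layout_step = make_step("layout")
formula_step = make_step("formulas")
ocr_step = make_step("ocr")
table_step = make_step("tables")


def run_pipeline(pdf_path: str, steps: List[Step]) -> List[Page]:
    """Apply each step, in order, to every page of the document."""
    pages = load_pages(pdf_path)
    for step in steps:
        pages = [step(page) for page in pages]
    return pages


def save_json(pages: List[Page], out_path: str) -> None:
    """Hypothetical stand-in for utils.py: write the results as JSON."""
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(pages, fh, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    results = run_pipeline(
        "1706.03762.pdf", [layout_step, formula_step, ocr_step, table_step]
    )
    print(json.dumps(results, indent=2))
```

Keeping each step as an independent callable is one way to let a later API layer reuse individual stages without going through the command-line script.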

Functionality impact: No change to existing functionality is expected, as the refactoring does not introduce new features or modify existing ones.

Instructions for Reviewers:

Review app.py and the app_tools modules to ensure that the logic has been ported correctly.
Verify that there are no observable changes in the system's behavior when running the tests.
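
One possible way to check for behavior changes is to run both scripts on the same PDF and compare the JSON results they save. The helper below is a sketch under that assumption. The function name diff_results and the tol parameter are illustrative and not part of this PR, and the optional tolerance is there only because model outputs may differ slightly in floating-point values between runs.

```python
import json
import math
from pathlib import Path
from typing import Any, List


def load_json(path: str) -> Any:
    """Read a JSON file into Python objects."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def diff_results(old: Any, new: Any, path: str = "$", tol: float = 0.0) -> List[str]:
    """Return the paths at which two JSON-like values differ (empty list if none)."""
    if _is_number(old) and _is_number(new):
        same = math.isclose(old, new, rel_tol=tol, abs_tol=tol)
        return [] if same else [path]
    if type(old) is not type(new):
        return [path]
    if isinstance(old, dict):
        diffs: List[str] = []
        for key in sorted(set(old) | set(new)):
            if key not in old or key not in new:
                diffs.append(f"{path}.{key}")  # key present on one side only
            else:
                diffs.extend(diff_results(old[key], new[key], f"{path}.{key}", tol))
        return diffs
    if isinstance(old, list):
        if len(old) != len(new):
            return [f"{path}[len]"]  # report a length mismatch once
        diffs = []
        for i, (a, b) in enumerate(zip(old, new)):
            diffs.extend(diff_results(a, b, f"{path}[{i}]", tol))
        return diffs
    return [] if old == new else [path]
```

With the two result files loaded through load_json (their locations depend on how each script is configured), an empty list from diff_results indicates that the outputs match.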

Example of Use:

python app.py --pdf 1706.03762.pdf

opendatalab / PDF-Extract-Kit

Refactoring of `pdf_extract.py` script #114