huridocs / pdf-document-layout-analysis

A Docker-powered service for PDF document layout analysis. It segments and classifies the different parts of PDF pages, identifying elements such as text, titles, pictures, and tables.
Apache License 2.0
115 stars 12 forks

Update model links in "download_models.py" #40

Closed Mengqi925 closed 3 weeks ago

Mengqi925 commented 2 months ago

I found that the download link for "_vgt_model" in "download_models.py" (https://github.com/AlibabaResearch/AdvancedLiterateMachinery/releases/download/v1.3.0-VGT-release) cannot be opened. I think this one is the correct link: https://github.com/AlibabaResearch/AdvancedLiterateMachinery/releases/tag/v1.3.0-VGT-release Please check and update it if necessary.

In addition, I noticed that "microsoft/layoutlmv-base-uncased" now has a newer version: https://huggingface.co/microsoft/layoutlmv2-base-uncased

Thank you!

gabriel-piles commented 3 weeks ago

Thank you for contacting us. We understand that some users are experiencing difficulties downloading our models. I've tested the download process myself today, and it appears to be working correctly on my end. Please try downloading the model again. If you continue to encounter issues, please let us know, and we'll be happy to provide updated download links.
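
For anyone hitting the same confusion, the two links in this thread probably serve different purposes. On GitHub, a `.../releases/download/<tag>` URL is a prefix that only resolves once an asset file name is appended, whereas `.../releases/tag/<tag>` is the browsable release page. The sketch below only illustrates that relationship. The helper names and the asset name `example_model.pth` are hypothetical and are not taken from `download_models.py`.

```python
from urllib.parse import urlparse

# Base URL quoted in the issue. It is a prefix for asset downloads,
# so opening it in a browser without a file name is expected to fail.
RELEASE_DOWNLOAD_BASE = (
    "https://github.com/AlibabaResearch/AdvancedLiterateMachinery"
    "/releases/download/v1.3.0-VGT-release"
)


def asset_url(download_base: str, asset_name: str) -> str:
    """Join a release download prefix with an asset file name (hypothetical helper)."""
    return f"{download_base.rstrip('/')}/{asset_name}"


def release_page_url(download_base: str) -> str:
    """Derive the browsable release page URL from a download prefix (hypothetical helper)."""
    parts = urlparse(download_base)
    path = parts.path.replace("/releases/download/", "/releases/tag/", 1)
    return parts._replace(path=path).geturl()


# "example_model.pth" is a made-up asset name used only for illustration.
print(asset_url(RELEASE_DOWNLOAD_BASE, "example_model.pth"))
print(release_page_url(RELEASE_DOWNLOAD_BASE))
```

If this reading is right, the base link failing to open in a browser does not by itself mean the scripted download is broken, which would be consistent with the download working when tested.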

Regarding LayoutLM, retraining the model is not a high priority for us at this time. As we are not the original authors of the VGT model, retraining would require significant effort and time.