Closed by LivingDeadCloud 3 years ago
OK, after some debugging I was able to run the code. If you want, I could post my code here, so that users who need to run it on Google Colab have it ready.
Let me know!
Cheers
@LivingDeadCloud, thanks! If you don't mind I would appreciate it!
Sorry, I did not have time to help you in the meantime... :/
Yeah no problem, I will post it next week!
Hey everyone
Sorry for the delay. I'm going to post the code now. Just note that I haven't used the code since my original post, so it may need some small adjustments. Here's my code to run TableNet in Google Colab:
# Mount drive
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
# Install all requirements
!pip install -r /content/drive/MyDrive/TableNet/requirements.txt # Change this to the path of your requirements.txt
# Install additional packages
!pip install tesseract
!pip install torchtext==0.8.0
!pip install torch==1.7.1
!pip install pytorch-lightning==1.2.2
!pip install torchmetrics
!pip install deprecate
!apt install tesseract-ocr
!apt install libtesseract-dev
# Run the code
# python predict.py --model_weights='<weights path>' --image_path='<image path>' # Default command line
result = !python /content/drive/MyDrive/TableNet/predict.py --model_weights='/content/drive/MyDrive/TableNet/best_model.ckpt' --image_path='/content/drive/MyDrive/TableNet/TablesImages/Your_image.png' # Change paths to "predict.py" and "Your_image.png" according to your drive
# Look at the result
result
Now, result is an IPython.utils.text.SList variable (a list of the output lines as strings), so you may need to make some adjustments to predict.py to get the output in a more convenient form. However, it should be pretty straightforward from here.
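If the SList type gets in the way, one option is to run the script through the standard subprocess module, which returns the output as a plain string. This is only a minimal sketch: run_and_capture is a hypothetical helper name, not something from the repo, and it assumes predict.py writes its result to stdout.

```python
import subprocess
import sys


def run_and_capture(args):
    """Run the current Python interpreter with `args` and return stdout as text."""
    completed = subprocess.run(
        [sys.executable, *args],
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


# Example call in Colab (paths as in the cell above, adjust to your drive):
# output = run_and_capture([
#     "/content/drive/MyDrive/TableNet/predict.py",
#     "--model_weights=/content/drive/MyDrive/TableNet/best_model.ckpt",
#     "--image_path=/content/drive/MyDrive/TableNet/TablesImages/Your_image.png",
# ])
```

With check=True a failing run raises an exception that carries the script's stderr, which makes errors easier to spot than scanning the captured lines.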
If someone is willing to post their code to get result as a more useful type of variable, for example a Pandas dataframe, that would be great!
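As a starting point, here is a rough sketch of turning the captured lines into a DataFrame. It rests on an assumption that may not hold for your version of predict.py: that the first non-empty output line is a header and that columns are separated by two or more spaces. parse_output_lines and to_dataframe are hypothetical helper names, not part of the repo.

```python
import re


def parse_output_lines(lines, sep=r"\s{2,}"):
    """Split captured output lines into a header list and a list of row dicts."""
    # Strip whitespace and drop empty lines.
    cleaned = [line.strip() for line in lines if line.strip()]
    if not cleaned:
        return [], []
    header = re.split(sep, cleaned[0])
    rows = []
    for line in cleaned[1:]:
        fields = re.split(sep, line)
        # Pad short rows with empty strings; zip() drops any extra fields.
        fields += [""] * (len(header) - len(fields))
        rows.append(dict(zip(header, fields)))
    return header, rows


def to_dataframe(lines, sep=r"\s{2,}"):
    """Build a pandas DataFrame from the parsed lines (all values stay strings)."""
    import pandas as pd  # imported lazily so the parser works without pandas

    header, rows = parse_output_lines(lines, sep)
    return pd.DataFrame(rows, columns=header)
```

Since SList is a list of strings, calling to_dataframe(result) in the Colab cell should work directly. If the script prints log messages before the table, those lines would need to be filtered out first.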
Cheers
@LivingDeadCloud Thanks! I will add this to README.md
Hi there @tomassosorio
Quick introduction: I need to extract data from PDFs/images containing tables. Unfortunately, I have several different formats, and traditional tools (PDFPlumber, Tabula, Camelot) do not seem to work for every possible format. So now I'm trying a DL approach, and while looking for a TableNet implementation I found this repo.
I'm trying to use your code on Google Colab, but unfortunately I was not able to make it work. Note that I have very little experience with DL libraries, so I apologise if my question is trivial.
Anyway, here's my code:
This is the error I get:
I have to admit I have no idea what is causing the error. Could you please help me?
Thanks a lot and great work!