kermitt2 / grobid_client_python

Python client for GROBID Web services
Apache License 2.0

Adding text into a table #35

Closed LaurentRisser closed 3 years ago

LaurentRisser commented 3 years ago

Hello there,

I am running Grobid locally and sent it the request below. The output I get follows, and I am also attaching the source file.

```python
import requests
from bs4 import BeautifulSoup as bs

GROBID_URL = 'http://localhost:8080'
url = f'{GROBID_URL}/api/processFulltextDocument'
pdf = 'pdf_test.pdf'

# Send the PDF to the local Grobid service and read back the TEI XML
with open(pdf, 'rb') as f:
    xml = requests.post(url, files={'input': f}).text

bs_content = bs(xml, 'lxml')
print(bs_content)
```

Console output:

    <figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0">
      <head></head>
      <label></label>
      <figDesc>This is the text from the first test section that appears before the first table</figDesc>
      <table>
        <row><cell>Cell 0,0</cell><cell>Cell 0,1</cell><cell>Cell 0,2</cell><cell>Cell 0,3</cell><cell>Cell 0,4</cell><cell>Cell 0,5</cell></row>
        <row><cell>Cell 1,0</cell><cell>Cell 1,1</cell><cell></cell><cell>Cell 1,3</cell><cell></cell><cell></cell></row>
        <row><cell>Cell 2,0</cell><cell></cell><cell>Cell 2,2</cell><cell></cell><cell></cell><cell></cell></row>
        <row><cell cols="3">The above test table is 3 row by 6 column table.</cell><cell></cell><cell></cell><cell></cell></row>
      </table>
    </figure>

The output should contain two tables, not one. Grobid also treats the sentence 'The above test table is 3 row by 6 column table.' as a row of the table, although it is body text that follows the table. The source file is attached for reference: pdf_test.pdf
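To make the problem easier to see, here is a minimal sketch that parses the `<figure>` element from the output and flags rows containing a cell that spans several columns, which is where the stray sentence ends up. It assumes the figure is in the standard TEI namespace `http://www.tei-c.org/ns/1.0`. The helper names `table_rows` and `spanning_rows` are made up for this illustration and are not part of Grobid or its Python client.

```python
import xml.etree.ElementTree as ET

# Assumption: Grobid's TEI output uses the standard TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# The <figure> element from the console output, reflowed over several lines.
SAMPLE = """<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0">
<head></head><label></label>
<figDesc>This is the text from the first test section that appears before the first table</figDesc>
<table>
<row><cell>Cell 0,0</cell><cell>Cell 0,1</cell><cell>Cell 0,2</cell><cell>Cell 0,3</cell><cell>Cell 0,4</cell><cell>Cell 0,5</cell></row>
<row><cell>Cell 1,0</cell><cell>Cell 1,1</cell><cell></cell><cell>Cell 1,3</cell><cell></cell><cell></cell></row>
<row><cell>Cell 2,0</cell><cell></cell><cell>Cell 2,2</cell><cell></cell><cell></cell><cell></cell></row>
<row><cell cols="3">The above test table is 3 row by 6 column table.</cell><cell></cell><cell></cell><cell></cell></row>
</table>
</figure>"""


def table_rows(figure_xml):
    """Return each table row as a list of (cell text, column span) tuples."""
    figure = ET.fromstring(figure_xml)
    rows = []
    for row in figure.findall(".//tei:table/tei:row", TEI_NS):
        cells = [
            (cell.text or "", int(cell.get("cols", "1")))
            for cell in row.findall("tei:cell", TEI_NS)
        ]
        rows.append(cells)
    return rows


def spanning_rows(rows):
    """Return the indices of rows that contain a multi-column cell."""
    return [i for i, cells in enumerate(rows) if any(span > 1 for _, span in cells)]


if __name__ == "__main__":
    rows = table_rows(SAMPLE)
    for index in spanning_rows(rows):
        print(index, rows[index][0][0])
```

This only inspects what Grobid returned. It does not change how Grobid segments tables, and a multi-column cell is a heuristic hint rather than proof that a row is misplaced body text.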

Is there a way for Grobid to recognize the two tables?

Screenshot from 2021-08-12 18-50-44

kermitt2 commented 3 years ago

Hello @walkyrie67!

Thank you for the error case.

However, this issue does not concern this client; you are not actually using it in your description. Could you open the same issue in the Grobid repo (https://github.com/kermitt2/grobid) and close it here? Thanks in advance!

LaurentRisser commented 3 years ago

Thanks, I have done it: https://github.com/kermitt2/grobid/issues/815