Closed LaurentRisser closed 3 years ago
Hello @walkyrie67 !
Thank you for the error case.
However, this issue does not concern this client; you are not actually using it in your description. Could you open the same issue in the Grobid repo (https://github.com/kermitt2/grobid) and close it here? Thanks in advance!
Thanks, I have done it: https://github.com/kermitt2/grobid/issues/815
Hello there,
As I was able to run Grobid locally, I sent a request. Below are the code I used and the output I am getting; the source file is attached.

    import requests
    from bs4 import BeautifulSoup as bs

    GROBID_URL = 'http://localhost:8080'
    url = f'{GROBID_URL}/api/processFulltextDocument'
    pdf = 'pdf_test.pdf'

    # Send the PDF to the local Grobid service and parse the returned TEI XML
    with open(pdf, 'rb') as f:
        xml = requests.post(url, files={'input': f}).text
    bs_content = bs(xml, 'lxml')
    print(bs_content)
Console output:

    <figure xmlns="1.0 Namespace " type="table" xml:id="tab_0">
      <head></head>
      <label></label>
      <figDesc>This is the text from the first test section that appears before the first table</figDesc>
      <table>
        <row><cell>Cell 0,0</cell><cell>Cell 0,1</cell><cell>Cell 0,2</cell><cell>Cell 0,3</cell><cell>Cell 0,4</cell><cell>Cell 0,5</cell></row>
        <row><cell>Cell 1,0</cell><cell>Cell 1,1</cell><cell></cell><cell>Cell 1,3</cell><cell></cell><cell></cell></row>
        <row><cell>Cell 2,0</cell><cell></cell><cell>Cell 2,2</cell><cell></cell><cell></cell><cell></cell></row>
        <row><cell cols="3">The above test table is 3 row by 6 column table.</cell><cell></cell><cell></cell><cell></cell></row>
      </table>
    </figure>
The output should contain two tables, not one. Grobid also treats the text 'The above test table is 3 row by 6 column table.' as part of the table. I have attached the source file for reference: pdf_test.pdf
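To make the mismatch easier to check, here is a minimal sketch that counts the `<figure type="table">` elements in a Grobid response and lists the cells of each row. It only inspects the output and does not change what Grobid detects. The helper names `local_name` and `summarize_tables` are hypothetical, the sample string is a shortened stand-in for the real response, and the standard-library `xml.etree.ElementTree` is used instead of BeautifulSoup so that the snippet has no extra dependencies.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Return the tag name without any '{namespace}' prefix."""
    return tag.rsplit('}', 1)[-1]


def summarize_tables(tei_xml):
    """Return one (figDesc, rows) tuple per <figure type="table"> element."""
    root = ET.fromstring(tei_xml)
    tables = []
    for elem in root.iter():
        if local_name(elem.tag) != 'figure' or elem.get('type') != 'table':
            continue
        desc = ''
        rows = []
        for child in elem.iter():
            name = local_name(child.tag)
            if name == 'figDesc':
                desc = (child.text or '').strip()
            elif name == 'row':
                # Collect the text of every direct <cell> child of this row
                rows.append([(cell.text or '').strip() for cell in child
                             if local_name(cell.tag) == 'cell'])
        tables.append((desc, rows))
    return tables


# Shortened, hypothetical sample shaped like the console output (assumption:
# the real response declares the TEI namespace shown here)
sample = (
    '<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0">'
    '<figDesc>Text before the first table</figDesc>'
    '<table>'
    '<row><cell>Cell 0,0</cell><cell>Cell 0,1</cell></row>'
    '<row><cell cols="2">The above test table is 3 row by 6 column table.</cell></row>'
    '</table></figure>'
)

tables = summarize_tables(sample)
print(f'{len(tables)} table(s) found')
for desc, rows in tables:
    print('figDesc:', desc)
    for row in rows:
        print('  ', row)
```

With the real response, passing `xml` from the request to `summarize_tables` should show at a glance whether one or two tables came back, and which row the trailing sentence ended up in.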
Is there a way for Grobid to recognize the two tables?