Open sandeepsamant1702 opened 1 year ago
Hello @sandeepsamant1702 !
Which version of Grobid are you using?
In 0.7.3, it is encoded like this in the result XML:
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0">
<head>Table 1</head>
<label>1</label>
<figDesc>Summary statistics of welfare aggregates monthly).</figDesc>
<table>
<row>
<cell>Variable</cell>
<cell>Mean (USD)</cell>
<cell>Mean (JD)</cell>
<cell>Std. Dev. (JD)</cell>
<cell>Min (JD)</cell>
<cell>Max (JD)</cell>
</row>
<row>
<cell>Income per capita</cell>
<cell>49.63</cell>
<cell>34.95</cell>
<cell>64.41</cell>
<cell>0</cell>
<cell>3000</cell>
</row>
...
</table>
</figure>
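If it helps, here is a minimal sketch of how the rows of such a table could be read back from the result XML with Python's standard library. The function name table_rows and the shortened SAMPLE string are only illustrative (they are not part of Grobid); the sketch assumes the TEI namespace shown above.

```python
import xml.etree.ElementTree as ET

# TEI namespace, as declared on the <figure> element in the result XML.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def table_rows(figure_xml):
    """Return the table content as a list of rows, each a list of cell strings."""
    root = ET.fromstring(figure_xml)
    rows = []
    for row in root.iterfind(".//tei:table/tei:row", TEI_NS):
        cells = row.iterfind("tei:cell", TEI_NS)
        # itertext() also collects text nested inside child elements of a cell.
        rows.append(["".join(cell.itertext()).strip() for cell in cells])
    return rows


# Shortened copy of the example above (illustrative input only).
SAMPLE = """<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0">
<head>Table 1</head>
<table>
<row><cell>Variable</cell><cell>Mean (USD)</cell><cell>Mean (JD)</cell></row>
<row><cell>Income per capita</cell><cell>49.63</cell><cell>34.95</cell></row>
</table>
</figure>"""

if __name__ == "__main__":
    for row in table_rows(SAMPLE):
        print(row)
```

The same function should also work on a full TEI document, since it searches for table elements at any depth.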
Only the latest version. I cloned it using 'git clone https://github.com/kermitt2/grobid.git'. I still don't understand your answer. My problem is that the PDF I am parsing contains tables: whenever a table comes up, Grobid skips the entire page and reads only the first line of the table.
Sorry, the master version is currently a work in progress with respect to tables and figures. You should use the latest stable version, 0.7.3, for example via the Docker image.
Could you share this PDF so that I can try to reproduce the error?
I am getting an issue with version 0.7.3 when running ./gradlew run. Java fails with "undefined symbol: __libc_pthread_init, version GLIBC_PRIVATE". I am using OpenJDK 11. Does it require some other Java version? I also tried JDK 17, and it gives the same issue.
The exact error is:
/usr/lib/jvm/java-11-openjdk-amd64/bin/java: symbol lookup error: grobid-0.7.3/grobid-home/lib/lin-64/libpthread.so.0: undefined symbol: __libc_pthread_init, version GLIBC_PRIVATE
Ah, this error comes from your glibc version, see https://github.com/kermitt2/grobid/issues/1019. The fix is to use the master version, where I rebuilt the native lib to avoid this error :D
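For anyone hitting the same symbol lookup error, a small sketch like the one below can report the glibc version and machine architecture to include in the report. It only uses Python's standard platform module; the helper name glibc_version is illustrative, and on systems that do not use glibc it returns None.

```python
import platform


def glibc_version():
    """Return the glibc version string, or None if glibc is not detected."""
    lib, version = platform.libc_ver()
    return version if lib == "glibc" else None


if __name__ == "__main__":
    print("glibc:", glibc_version() or "not detected")
    print("machine:", platform.machine())
```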
Anyway, I tried the PDF with 0.7.3 and master, and I get the same result:
The tables (<figure type="table">) are located at the end of the text, before the notes (<note>). This is a normalized form of the document (maybe that is why you have the impression that it "skips" the page?).
The tables are referenced from the text (<ref type="table" target="#tab_5">4</ref>).
The extracted tables (<figure type="table">) are not good for this document unfortunately, but the content appears, I think.
Does it help?
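To check that nothing was skipped, the in-text table references can be matched against the table figures at the end of the text. Below is a minimal sketch, assuming the table figures carry an xml:id matching the ref target (as with xml:id="tab_0" in the earlier example). The function name resolve_table_refs and the SAMPLE document are made up for illustration and are not taken from the PDF in question.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def resolve_table_refs(tei_xml):
    """Return (target id, table head or None) for every table reference."""
    root = ET.fromstring(tei_xml)
    # Index all table figures by their xml:id.
    tables = {
        fig.get(XML_ID): fig
        for fig in root.iter(TEI + "figure")
        if fig.get("type") == "table"
    }
    resolved = []
    for ref in root.iter(TEI + "ref"):
        if ref.get("type") != "table":
            continue
        target = (ref.get("target") or "").lstrip("#")
        fig = tables.get(target)
        head = fig.findtext(TEI + "head") if fig is not None else None
        resolved.append((target, head))
    return resolved


# Hypothetical minimal document, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<p>See Table <ref type="table" target="#tab_5">4</ref>.</p>
<figure type="table" xml:id="tab_5"><head>Table 4</head></figure>
</body></text></TEI>"""

if __name__ == "__main__":
    print(resolve_table_refs(SAMPLE))
```

A reference whose head comes back as None points at a table that is missing from the output, which would be worth reporting.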
What is your OS and architecture? Windows is not supported and Mac OS arm64 is not yet supported. For non-supported OS, you can use Docker (https://grobid.readthedocs.io/en/latest/Grobid-docker/)
Linux machine AWS (sagemaker)
What is your Java version (java --version)? JDK 11
In case of build or run errors, please submit the error while running gradlew with --stacktrace and --info for better log traces (e.g. ./gradlew run --stacktrace --info) or attach the log file logs/grobid-service.log.