nlmatics/nlm-ingestor
This repo provides the server-side code that the llmsherpa API connects to. It includes parsers for various file formats.
https://www.nlmatics.com
Apache License 2.0
1.11k stars, 160 forks
issues
#45 Issue with finding tables and sections (Aviral-tech, opened, 7 months ago, 0 comments)
#44 Error when parsing a PDF (kaulsh, opened, 7 months ago, 2 comments)
#43 Fix table left KeyError (jinhy-sequoiacap, closed, 7 months ago, 1 comment)
#42 Question: Is it possible to retrieve the pdf position (bbox) for table rows (janwbouma, opened, 8 months ago, 0 comments)
#41 box_style not being taken into account (mikecook69, opened, 8 months ago, 2 comments)
#40 made changes to integrate with indexer (ansukla, closed, 8 months ago, 0 comments)
#39 nlm-ingestor is SUPER SLOW (pashpashpash, opened, 8 months ago, 4 comments)
#38 Disable rules/paranthesized header (mikecook69, opened, 8 months ago, 0 comments)
#37 Suggestions for Fast Production Server (yashpatel21, opened, 8 months ago, 5 comments)
#36 [PDF Ingestor] make sure key idx within the range of sorted freq keys (baobo5625, closed, 8 months ago, 0 comments)
#35 Can you provide guidance on when page_idx wouldn't be available? (chrismaresca, opened, 8 months ago, 0 comments)
#34 Unable to finish setup of nlm-ingestor due to missing distutils module (lukenas, opened, 8 months ago, 0 comments)
#33 Encoding error with non-ASCII character. (jamesvillarrubia, opened, 8 months ago, 2 comments)
#32 PDF extraction (Amy-raj, opened, 8 months ago, 1 comment)
#31 Docker file available for hosting into lambda as container? (akayalEC, opened, 8 months ago, 0 comments)
#30 Not able to install nlm_ingestor (sli701, opened, 8 months ago, 3 comments)
#29 memory leaks (ZengJin123, closed, 6 months ago, 1 comment)
#28 .pages files are chunked correctly but page_idx is always 0 (pashpashpash, opened, 9 months ago, 0 comments)
#27 .pptx files are correctly chunked, but page_idx is always 0 (pashpashpash, opened, 9 months ago, 1 comment)
#26 .doc files are correctly chunked, but page_idx is always 0 (pashpashpash, opened, 9 months ago, 0 comments)
#25 HTML AND XML INGESTOR (drewskidang, opened, 9 months ago, 1 comment)
#24 KeyError: 'return_dict' (ZengJin123, closed, 9 months ago, 4 comments)
#23 Does the docker image come with the modified tika server? (samgriek, opened, 9 months ago, 0 comments)
#22 API url issues (drewskidang, opened, 9 months ago, 4 comments)
#21 TypeError: 'NoneType' object is not subscriptable (opiethehokie, opened, 9 months ago, 1 comment)
#20 ZeroDivisionError: float division by zero (opiethehokie, opened, 9 months ago, 1 comment)
#19 IndexError: list index out of range (opiethehokie, opened, 9 months ago, 2 comments)
#18 KeyError: 'left' (opiethehokie, opened, 9 months ago, 4 comments)
#17 Fails to deploy as a service on Google Cloud Run (pashpashpash, closed, 9 months ago, 0 comments)
#16 Added health check (pashpashpash, closed, 9 months ago, 0 comments)
#15 Local only use (maximedb, opened, 9 months ago, 2 comments)
#14 Deploy multi-platform docker image (ianschmitz, closed, 9 months ago, 0 comments)
#13 Missing arm64/v8 architecture (ianschmitz, closed, 9 months ago, 1 comment)
#12 How to use ingestors? (frankiedrake, opened, 9 months ago, 2 comments)
#11 How to handle PPT format? (mengmeng0320, opened, 9 months ago, 1 comment)
#10 Docker pull issue (noviljohnson, closed, 9 months ago, 0 comments)
#9 numpy error while parsing (gabfeudo, opened, 10 months ago, 6 comments)
#8 Fix readme typo (erjanmx, closed, 10 months ago, 1 comment)
#7 UnicodeEncodeError when trying to save as HTML (RadekOnCrypto, opened, 10 months ago, 0 comments)
#6 How to use HTML parser? (ghost, closed, 10 months ago, 3 comments)
#5 Connection being reset by peer (jpbalarini, opened, 10 months ago, 3 comments)
#4 Health endpoint (jpbalarini, closed, 9 months ago, 2 comments)
#3 Recommendation for production server (jpbalarini, opened, 10 months ago, 7 comments)
#2 Is it recommended to use the new indent parser? (jpbalarini, opened, 10 months ago, 2 comments)
#1 Query: How would it integrate with other LLM apis. (sandeep2244, opened, 10 months ago, 1 comment)