yobix-ai/extractous
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Apache License 2.0
411 stars
17 forks
issues
markdown support
#37
peterlyz
opened
4 hours ago
0
Fix anchor links in README.md
#36
yutannihilation
opened
1 day ago
0
Support for Extracting PDF Content as XML
#35
coroluca
opened
2 days ago
1
use it in multiple processes.
#34
ljhssga
opened
2 days ago
1
Failed Extraction - cmap font missing
#33
s4zuk3
opened
5 days ago
2
Failed extraction - Class CTTextCharacterProperties is missing.
#32
s4zuk3
opened
5 days ago
2
Installation not working - Windows 11/Python3.10
#31
IneffableBunch
opened
6 days ago
1
Change in PDF Extraction Results
#30
TheTechromancer
closed
6 days ago
3
change extractor api to return tuple of result and metadata
#29
nmammeri
closed
1 week ago
0
fix: update tika_native dir in build folder
#28
KapiWow
closed
1 day ago
0
25 make reflection data platform specific
#27
nmammeri
closed
1 week ago
0
Feature/3 return tika metadata
#26
s4zuk3
closed
1 week ago
3
make reflection data platform specific
#25
nmammeri
closed
1 week ago
0
Tika Metadata - HashMap Issue
#24
s4zuk3
closed
1 week ago
6
Stall when extracting using ocr on macos from pdf with embedded images
#23
nmammeri
opened
2 weeks ago
1
7 implement extracting from an array of bytes
#22
nmammeri
closed
1 week ago
0
Draft: Implement extracting from an array of bytes
#21
KapiWow
closed
2 weeks ago
1
Test Multiple Python Versions (+3.13 Support)
#20
TheTechromancer
opened
2 weeks ago
7
18 ocr examples and docs
#19
nmammeri
closed
3 weeks ago
0
ocr examples and docs
#18
nmammeri
closed
2 weeks ago
0
fix: fixed issue 16 and added test case
#17
nmammeri
closed
3 weeks ago
0
TypeError: ParseError("Parse error occurred : TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@281b1a01")
#16
NourEldin-Osama
closed
3 weeks ago
0
1 add microsoft windows support
#15
nmammeri
closed
3 weeks ago
0
Draft: windows support
#14
KapiWow
closed
3 weeks ago
0
failed to install in windows 11
#13
NourEldin-Osama
closed
3 weeks ago
9
build: don't rebuild graalvm libs if were built before
#12
nmammeri
closed
1 month ago
0
ISSUE#3: Implemented Tika Metadata
#11
s4zuk3
closed
2 weeks ago
5
PyPI package is huge
#10
chrisgoddard
closed
1 month ago
2
make the build script faster
#9
nmammeri
closed
1 month ago
0
tests: Tests with different file formats
#8
KapiWow
closed
1 month ago
1
Implement extracting from an array of bytes
#7
nmammeri
closed
1 week ago
0
Extracting text from a specific page of the document
#6
bm777
closed
1 month ago
4
Improve extract to stream performance
#5
nmammeri
opened
2 months ago
0
Add detect file type API
#4
nmammeri
opened
2 months ago
0
Return Metadata with extraction result
#3
nmammeri
closed
1 week ago
0
Add tests with different file formats
#2
nmammeri
closed
1 month ago
0
Add Microsoft Windows support
#1
nmammeri
closed
3 weeks ago
0