vectara/vectara-ingest
An open source framework to crawl data sources and ingest into Vectara
https://vectara.com
Apache License 2.0
147 stars
50 forks
issues
initial
#126
ofermend
closed
2 days ago
0
Update FMP/Edgar crawler
#125
ofermend
closed
1 week ago
0
Updates for folder crawler
#124
ofermend
closed
1 week ago
0
Fixes to CSV crawler
#123
ofermend
closed
2 weeks ago
0
Add twitter crawler
#122
ofermend
closed
1 month ago
0
Fix issues with CSV crawler
#121
ofermend
closed
1 month ago
0
Updated log msg
#120
ofermend
closed
1 month ago
0
Store docs locally
#119
ofermend
closed
2 months ago
0
fixed issue with HTML inside XML
#118
ofermend
closed
2 months ago
0
Parallel processing for CSV Crawler
#117
ofermend
closed
2 months ago
0
Fix hfdataset ray
#116
ofermend
closed
2 months ago
0
Only normalize metadata values that are strings
#115
nespera
closed
2 months ago
0
Update website_crawler.py to keep clean urls
#114
nespera
closed
2 months ago
0
Updates to PMC crawler
#113
ofermend
closed
2 months ago
0
Notion crawler not finding any pages
#112
swarajban
closed
2 months ago
2
TomlDecodeError: Invalid date or number
#111
swarajban
closed
3 months ago
2
ModuleNotFoundError: no module named 'yaml' when running crawler
#110
swarajban
closed
3 months ago
1
bug fixes to GDrive crawler
#109
ofermend
closed
3 months ago
0
fixed OOM issue by resetting browser every 100 times
#108
ofermend
closed
3 months ago
0
updated notion crawler
#107
ofermend
closed
3 months ago
1
fixed bug in csv/DB crawlers
#106
ofermend
closed
3 months ago
0
Gdrive crawler 2
#105
AbhilashaLodha
closed
3 months ago
0
Hotfix
#104
ofermend
closed
4 months ago
0
Update crawler extraction
#103
ofermend
closed
4 months ago
0
HF crawler and other updates
#102
ofermend
closed
4 months ago
0
google drive crawler
#101
AbhilashaLodha
closed
4 months ago
0
Huggingface crawler
#100
ofermend
closed
4 months ago
0
FMP update
#99
ofermend
closed
4 months ago
0
Update SECURITY.md
#98
eskibars
closed
5 months ago
0
bugfix
#97
ofermend
closed
5 months ago
0
Update to FMP crawler
#96
ofermend
closed
5 months ago
0
Improve edgar crawler
#95
ofermend
closed
5 months ago
0
Youtube crawler
#94
ofermend
closed
5 months ago
0
Add XLSX to CSV crawler
#93
ofermend
closed
6 months ago
0
Reduce docker image size
#92
ofermend
closed
6 months ago
0
updated hackernews crawler with more metadata
#91
ofermend
closed
6 months ago
0
updates to make slack crawler better
#90
ofermend
closed
6 months ago
1
Minor fix to README
#89
ofermend
closed
6 months ago
0
Hotfix
#88
ofermend
closed
6 months ago
0
integrated ray with slack crawler
#87
adeelehsan
closed
6 months ago
0
Option to remove docs not crawled
#86
ofermend
closed
6 months ago
1
Update to crawler
#85
ofermend
closed
6 months ago
0
remove attrdict
#84
ofermend
closed
6 months ago
0
updated in playwright waituntil from networkidle to load
#83
ofermend
closed
7 months ago
0
Update to rate limiting functionality and a few more things
#82
ofermend
closed
7 months ago
0
Fixing issue of Ray with ARM...
#81
ofermend
closed
7 months ago
0
`run.sh` should check whether docker version is compatible with `buildx` command
#80
mig281
closed
7 months ago
1
updated documentation
#79
ofermend
closed
7 months ago
0
updated logic to better identify relative URLs
#78
ofermend
closed
7 months ago
0
`pos_regex` does not behave as expected for indexing Salesforce knowledge base pages
#77
mig281
closed
7 months ago
15