mediacloud/story-indexer
The core pipeline used to ingest online news stories in the Media Cloud archive.
https://mediacloud.org
Apache License 2.0
2 stars, 5 forks
issues
automate backing up of "warm" ES indexes created by ILM
#254
rahulbot
closed
7 months ago
0
manually back up first "warm" index after first ILM rollover
#253
rahulbot
closed
8 months ago
5
Fix/update es mapping
#252
thepsalmist
closed
9 months ago
2
assess redirect volume in historical data
#251
rahulbot
closed
8 months ago
3
Fix/update ilm policy
#250
thepsalmist
closed
8 months ago
0
Queuer class logging/stats update etc
#249
philbudne
closed
9 months ago
0
indexer/workers/hist-queuer.py: only queue a particular URL once per CSV file
#248
philbudne
closed
9 months ago
0
what timezone is `indexed_date` returned/interpreted as?
#247
rahulbot
closed
9 months ago
3
Replace temporary hack for UDP socket sendto speedup
#246
philbudne
closed
9 months ago
0
Bump scrapy from 2.10.1 to 2.11.1
#245
dependabot[bot]
closed
9 months ago
1
update user-agent for consistency and slightly better compatibility
#244
rahulbot
closed
8 months ago
4
delete unnecessary ES index fields
#243
rahulbot
closed
9 months ago
4
will ILM rollover make unique-key check fail?
#242
rahulbot
closed
8 months ago
4
Ft/docker scripts prune
#241
thepsalmist
closed
9 months ago
3
Add docker/docker-init.sh script
#240
philbudne
closed
9 months ago
0
Updates for production historical file processing
#239
philbudne
closed
9 months ago
0
Possible enhancements for --from-quarantine
#238
philbudne
opened
9 months ago
0
Limit text_content to Lucene term byte length
#237
kilemensi
closed
9 months ago
6
Queue based fetcher
#236
philbudne
closed
9 months ago
0
evaluate new ES-ILM backup / redundancy strategy
#235
rahulbot
closed
9 months ago
3
Final(?) tweaks for processing legacy (CSV) stories for 2023
#234
philbudne
closed
9 months ago
3
Story sub-objects not being type-checked by mypy???
#233
philbudne
opened
9 months ago
11
Fix TEMPORARY HACK in indexer/app.py for resolving stats and logging host names
#232
philbudne
closed
9 months ago
0
Past and future ES index sizes
#231
philbudne
opened
10 months ago
0
set uniform index naming to mc_search
#230
thepsalmist
closed
10 months ago
0
Comments/documentation on metadata field data format, source, semantics, use?
#229
philbudne
closed
9 months ago
3
Sentry alert from indexer.pipeline process when shutting down stack
#228
philbudne
opened
10 months ago
0
result of falling into importer counter rabbit hole.
#227
philbudne
closed
10 months ago
1
update elastic-stats for ILM
#226
philbudne
closed
10 months ago
0
changes for testing with elasticsearch 8.12
#225
philbudne
closed
10 months ago
0
Elasticsearch INDEX_NAMES is depreciated on NewsSearchApi currently, might want to repreciate after the database update
#224
pgulley
closed
9 months ago
2
fix: generate ES id from mcmetadata unique_url_hash
#223
thepsalmist
closed
10 months ago
2
historical ingest: use language from CSV files?
#222
philbudne
closed
9 months ago
1
Add --from-archive command line argument to StoryWorker class? --to-archive to StoryApp???
#221
philbudne
closed
6 months ago
0
harden StoryArchiveReader???
#220
philbudne
opened
10 months ago
0
Add "parsed_date" field to content metadata
#219
philbudne
closed
10 months ago
1
enable flake8 unused imports (F401), and remove unused imports
#218
philbudne
closed
10 months ago
0
update to latest news-search-api
#217
rahulbot
closed
9 months ago
3
support story "auditing" trails?
#216
rahulbot
closed
9 months ago
3
approaches to future proofing / testing Story object?
#215
rahulbot
opened
10 months ago
0
Upgrade to latest mcmetadata
#214
philbudne
closed
10 months ago
0
ES indexed_date field when processing WARC files, and historical data
#213
philbudne
closed
10 months ago
6
update RSS date
#212
thepsalmist
closed
10 months ago
1
Ft/elasticsearch ilm
#211
thepsalmist
closed
10 months ago
6
update to latest mediacloud-metadata
#210
rahulbot
closed
10 months ago
2
archive pipeline fixes
#209
philbudne
closed
10 months ago
0
pub date falls back to rss_fetcher date when missing- skipping mypy p…
#208
pgulley
closed
10 months ago
2
Infrastructure for queuing/processing historical data and archives, and queue-based fetcher
#207
philbudne
closed
10 months ago
1
where to capture system performance and needs documentation?
#206
rahulbot
closed
7 months ago
2
rearchitect indexes to support archiving?
#205
rahulbot
closed
10 months ago
6