mediacloud/story-indexer
The core pipeline used to ingest online news stories in the Media Cloud archive.
https://mediacloud.org
Apache License 2.0
2 stars
5 forks
issues
indexer/worker.py: Have check_output_queues check "data" disk space.
#354
philbudne
closed
15 hours ago
2
Re-filling the feb-may 2022 "dip" using canonical URL extraction
#353
philbudne
opened
2 days ago
0
parser "hang" found in 2020 historical data
#352
philbudne
opened
6 days ago
3
Pass a client-unique "opaque_id" to Elasticsearch
#351
philbudne
closed
5 days ago
1
More ES stats, extract RSS file header date
#350
philbudne
closed
5 days ago
0
elastic-stats.py: add more gauges, incl gc, thread_pool rejects, circuit breaker trips
#349
philbudne
closed
1 week ago
0
Implement canonical domain update script
#348
m453h
closed
1 week ago
2
indexer/scripts/elastic-stats.py: report individual index stats again
#347
philbudne
closed
2 weeks ago
0
Set canonical_domain & Third try at using canonical URL from mcmetadata
#346
philbudne
closed
3 weeks ago
1
Process needed for directly editing story entries in the ES archive
#345
pgulley
opened
3 weeks ago
2
If we ever reindex...
#344
philbudne
opened
3 weeks ago
15
chore: ES reindex implementation
#343
thepsalmist
closed
2 weeks ago
3
indexer/workers/parser.py: use lxml iterparse w/ NEED_CANONICAL_URL (PROPOSAL FOR DISCUSSION)
#342
philbudne
closed
4 weeks ago
0
Allow use of canonical URL extraction in historical pipelines
#341
philbudne
closed
1 month ago
0
Use _reindex api to generate a small test version of our index
#340
pgulley
opened
1 month ago
0
Deployment warnings from docker
#339
philbudne
opened
1 month ago
1
Enable Historical Re-ingest by canonical-url to cover absent download IDs
#338
pgulley
opened
1 month ago
0
More mypy settings, typings
#337
philbudne
opened
1 month ago
0
Json-to-warc utility and WARC documentation
#336
pgulley
opened
1 month ago
0
Update to metadata-lib v1.1
#335
philbudne
closed
1 month ago
0
Elasticsearch query performance links/leads
#334
philbudne
opened
1 month ago
0
archiver: handle upload failures better, handle .tmp file removal
#333
philbudne
closed
2 months ago
0
indexer/workers/hist-fetcher.py: try to detect non-existent objects
#332
philbudne
closed
3 months ago
0
Silent failure on stack deployment for staging and production when volume directories do not exist
#331
philbudne
opened
3 months ago
0
A quick look at some quarantined (missing) stories from 2021 historical CSVs
#330
philbudne
opened
3 months ago
0
regenerate "historical" CSV files that don't have downloads_id
#329
philbudne
opened
3 months ago
4
hist-fetcher issues
#328
philbudne
opened
3 months ago
0
Switch to fetching RSS files from B2
#327
philbudne
closed
3 months ago
0
Change fetcher URLs from S3 to B2
#326
philbudne
closed
3 months ago
1
indexer/workers/fetcher/rss-queuer.py: add missing argument to unexpected tag warning
#325
philbudne
closed
3 months ago
0
Hist fixes for hist fixes.
#324
philbudne
closed
3 months ago
0
Generate & run RSS files for two July UMass power outages
#323
philbudne
closed
3 months ago
1
Test disabling S3 for archiver in staging
#322
philbudne
closed
3 months ago
1
Clean up Epoch D URLs submitted with Epoch B page contents
#321
philbudne
closed
3 months ago
6
hist-fetcher.py: fix date2epoch range check; disable quarantine for missing objects
#320
philbudne
closed
3 months ago
0
fetcher issues from a power outage
#319
philbudne
opened
3 months ago
0
hist-fetcher.py: fix VersionId caseo's
#318
philbudne
closed
3 months ago
1
Error while fetching server API version: Not supported URL scheme http+docker
#317
pgulley
closed
4 months ago
5
Close all write operations to S3
#316
pgulley
closed
3 months ago
3
Fix three different problems with programs losing RabbitMQ connection
#315
philbudne
closed
4 months ago
0
Chore/setup es blobstore credentials
#314
thepsalmist
opened
4 months ago
5
add script to set blobstore credentials
#313
thepsalmist
closed
4 months ago
0
Move other backups to B2
#312
pgulley
closed
4 months ago
0
Audit Sentry
#311
pgulley
opened
4 months ago
0
More backups to Backblaze
#310
pgulley
closed
4 months ago
0
Migrate ILM Backups to Backblaze
#309
pgulley
closed
3 months ago
3
Validate backup strategies
#308
pgulley
opened
4 months ago
9
2022 Historical Re-ingestion (NSF 3.1.1)
#307
pgulley
closed
4 months ago
1
2021 Historical Re-ingestion (NSF 3.1.1)
#306
pgulley
closed
4 months ago
0
Indexer stack prefixes can collide in stats
#305
pgulley
closed
5 months ago
3