mediacloud/metadata-lib
How Media Cloud approaches extracting metadata from online news stories
Apache License 2.0
12 stars, 3 forks
Issues
#87 MC metadata extraction investigation (pgulley, closed 1 month ago, 0 comments)
#86 Assess tweaks to content extraction to remove headlines at end of article (rahulbot, opened 4 months ago, 2 comments)
#85 Update htmldate requirement from ==1.7.* to >=1.7,<1.9 (dependabot[bot], closed 5 months ago, 1 comment)
#84 Update trafilatura requirement from <1.7,>=1.4 to >=1.4,<1.9 (dependabot[bot], closed 5 months ago, 1 comment)
#83 Further tweaking of User-Agent string? (philbudne, closed 6 months ago, 3 comments)
#82 central storage for User-Agent to use across MC projects (rahulbot, closed 6 months ago, 1 comment)
#81 store MC user-agent for use by our other libraries (rahulbot, closed 6 months ago, 0 comments)
#80 Not capturing full article text (jaypinho, closed 6 months ago, 1 comment)
#79 Update trafilatura requirement from <1.7,>=1.4 to >=1.4,<1.8 (dependabot[bot], closed 5 months ago, 1 comment)
#78 Get automated release working (rahulbot, closed 3 months ago, 2 comments)
#77 ignore ports & handle IP domains in `normalize_url` (rahulbot, closed 7 months ago, 0 comments)
#76 Update requirements (rahulbot, closed 7 months ago, 1 comment)
#75 Update htmldate requirement from ==1.6.* to >=1.6,<1.8 (dependabot[bot], closed 7 months ago, 2 comments)
#74 Fix title parsing failure (due to empty or whitespace title tag) (rahulbot, closed 7 months ago, 1 comment)
#73 mcmetadata.extract throwing AttributeErrors (philbudne, closed 7 months ago, 3 comments)
#72 possible url normalization issues (philbudne, opened 8 months ago, 1 comment)
#71 Update static test fixtures (rahulbot, closed 8 months ago, 0 comments)
#70 centralize url unique hash generation with helper method in this package (rahulbot, closed 8 months ago, 1 comment)
#69 improve CI test run reliability by using cached fixtures? (rahulbot, closed 8 months ago, 0 comments)
#68 allow capturing stats from individual extract calls (rahulbot, closed 9 months ago, 0 comments)
#67 May want to remove story source related query parameters! (philbudne, closed 9 months ago, 1 comment)
#66 update requirements file to latest (rahulbot, closed 9 months ago, 0 comments)
#65 Small tweaks to handle whitespace in URLs (rahulbot, closed 9 months ago, 0 comments)
#64 Support defaults and overrides in `extract` (rahulbot, closed 9 months ago, 0 comments)
#63 support passing in a fallback publication date (rahulbot, closed 9 months ago, 2 comments)
#62 Update htmldate requirement from ==1.5.* to >=1.5,<1.7 (dependabot[bot], closed 9 months ago, 2 comments)
#61 Discuss possible enhancements to mcmetadata.extract (philbudne, closed 9 months ago, 2 comments)
#60 Update dateparser requirement from ==1.1.* to >=1.1,<1.3 (dependabot[bot], closed 9 months ago, 2 comments)
#59 Update tldextract requirement from ==3.6.* to >=3.6,<5.2 (dependabot[bot], closed 9 months ago, 2 comments)
#58 Handling of URL parse failure (philbudne, closed 9 months ago, 0 comments)
#57 Update tldextract requirement from ==3.6.* to >=3.6,<5.1 (dependabot[bot], closed 10 months ago, 1 comment)
#56 Update tldextract requirement from ==3.4.* to >=3.4,<3.7 (dependabot[bot], closed 11 months ago, 1 comment)
#55 Update tldextract requirement from ==3.4.* to >=3.4,<3.6 (dependabot[bot], closed 11 months ago, 1 comment)
#54 Update htmldate requirement from ==1.4.* to >=1.4,<1.6 (dependabot[bot], closed 11 months ago, 1 comment)
#53 Switched from cchardet to faust-chardet, as the former is unmaintained… (pgulley, closed 1 year ago, 0 comments)
#52 mcmetadata not type checked by mypy (philbudne, closed 1 year ago, 2 comments)
#51 Update trafilatura requirement from ==1.4.* to >=1.4,<1.7 (dependabot[bot], closed 1 year ago, 0 comments)
#50 update to latest version of trafilatura (rahulbot, closed 9 months ago, 1 comment)
#49 Update trafilatura requirement from ==1.4.* to >=1.4,<1.6 (dependabot[bot], closed 1 year ago, 1 comment)
#48 Update beautifulsoup4 requirement from ==4.11.* to >=4.11,<4.13 (dependabot[bot], closed 1 year ago, 0 comments)
#47 fix bugs from PT integration (rahulbot, closed 1 year ago, 0 comments)
#46 addressing no nk error (pgulley, closed 9 months ago, 1 comment)
#45 Crash because uri.query.params['nk'] can be None (vbanos, closed 9 months ago, 2 comments)
#44 Feature feed normalization (rahulbot, closed 1 year ago, 0 comments)
#43 Add feed_url.py (philbudne, closed 1 year ago, 0 comments)
#42 handle IP addresses better (rahulbot, closed 1 year ago, 1 comment)
#41 Add a check to avoid TypeError (vbanos, closed 1 year ago, 1 comment)
#40 Update htmldate requirement from ==1.3.* to >=1.3,<1.5 (dependabot[bot], closed 1 year ago, 0 comments)
#39 Update trafilatura requirement from ==1.3.* to >=1.3,<1.5 (dependabot[bot], closed 1 year ago, 0 comments)
#38 Update tldextract requirement from ==3.3.* to >=3.3,<3.5 (dependabot[bot], closed 1 year ago, 0 comments)