mediacloud / metadata-lib

How Media Cloud approaches extracting metadata from online news stories

Apache License 2.0

12 stars 5 forks

issues

Update trafilatura requirement from ==1.4.* to >=1.4,<1.7

#51 dependabot[bot] closed 1 year ago
0
update to latest version of trafilatura

#50 rahulbot closed 1 year ago
1
Update trafilatura requirement from ==1.4.* to >=1.4,<1.6

#49 dependabot[bot] closed 1 year ago
1
Update beautifulsoup4 requirement from ==4.11.* to >=4.11,<4.13

#48 dependabot[bot] closed 1 year ago
0
fix bugs from PT integration

#47 rahulbot closed 1 year ago
0
addressing no nk error

#46 pgulley closed 1 year ago
1
Crash because uri.query.params['nk'] can be None

#45 vbanos closed 1 year ago
2
Feature feed normalization

#44 rahulbot closed 1 year ago
0
Add feed_url.py

#43 philbudne closed 1 year ago
0
handle IP addresses better

#42 rahulbot closed 1 year ago
1
Add a check to avoid TypeError

#41 vbanos closed 1 year ago
1
Update htmldate requirement from ==1.3.* to >=1.3,<1.5

#40 dependabot[bot] closed 2 years ago
0
Update trafilatura requirement from ==1.3.* to >=1.3,<1.5

#39 dependabot[bot] closed 2 years ago
0
Update tldextract requirement from ==3.3.* to >=3.3,<3.5

#38 dependabot[bot] closed 2 years ago
0
assess fasttext for language guessing speedup

#37 rahulbot closed 12 months ago
1
upgrade dependencies

#36 rahulbot closed 2 years ago
3
Fallback extractor

#35 pgulley closed 2 years ago
0
handle empty content with no-encoding from HTML

#34 rahulbot closed 2 years ago
0
Unexpected AttributeError on extract

#33 vbanos closed 2 years ago
1
Improvement regarding content decoding/encoding

#32 vbanos opened 2 years ago
1
Bug in extract method

#31 vbanos closed 2 years ago
1
Use latest htmldate and pass datetime max_date instead of string

#30 vbanos closed 2 years ago
0
add in top image and other metadata

#29 rahulbot closed 2 years ago
2
More efficient parameterized unit tests

#28 vbanos closed 2 years ago
1
optimization on tag removal in readability-lxml extraction fallback

#27 rahulbot closed 2 years ago
0
improve trafilatura defaults

#26 rahulbot closed 2 years ago
0
create larger test set to compare results to main system data

#25 rahulbot closed 2 years ago
1
don't lowercase YouTube URLs for uniqueness hashing

#24 rahulbot closed 2 years ago
0
limit dates in future?

#23 rahulbot closed 2 years ago
2
Masking very frequent date parsing exceptions

#22 vbanos closed 2 years ago
1
Unhandled exception we got in production

#21 vbanos closed 2 years ago
4
centralize dependencies in one place

#20 rahulbot closed 2 years ago
0
You could also compile these regex in this method.

#19 vbanos closed 2 years ago
0
Use set instead of list for improved performance

#18 vbanos closed 2 years ago
0
You could compile this regex for better performance

#17 vbanos closed 2 years ago
0
Use Beautifulsoup4 with lxml parser for faster performance

#16 vbanos closed 2 years ago
0
Add cchardet dependency to speedup BeautifulSoup4

#15 vbanos closed 2 years ago
0
investigate URLs failing extraction

#14 rahulbot closed 2 years ago
2
justify content extractor priorities with data and testing

#13 rahulbot closed 1 year ago
3
Feature quick improvements

#12 rahulbot closed 2 years ago
0
Stats for the success / failure of each extractor

#11 vbanos closed 2 years ago
0
Improve exception handling

#10 vbanos closed 2 years ago
1
Compile regular expressions to improve performance

#9 vbanos closed 2 years ago
0
rename core branch from master to main

#8 rahulbot closed 2 years ago
1
Prep for release to PyPI

#7 rahulbot closed 2 years ago
2
Extract authors information when possible

#6 ibnesayeed closed 1 year ago
3
Building and installing cld2-cffi is failing

#5 ibnesayeed closed 2 years ago
2
Extracting original domain from archived pages

#4 ibnesayeed closed 2 years ago
1
Exception on non-news article pages

#3 ibnesayeed closed 2 years ago
0
switch language detection for now

#2 rahulbot closed 2 years ago
0