news-scraper/news_scraper
Simple ETL news scraper in Ruby
MIT License
4 stars, 0 forks
issues
#38 Add more techniques for parsing (jules2689, closed 8 years ago, 0 comments)
#37 Add test coverage, make test coverage 100% (jules2689, closed 8 years ago, 0 comments)
#36 Configuration proc (jules2689, closed 8 years ago, 0 comments)
#35 Error user agent (jules2689, closed 8 years ago, 0 comments)
#34 Uri to url (jules2689, closed 8 years ago, 0 comments)
#33 Nokogiri functions (jules2689, closed 8 years ago, 0 comments)
#32 Split TrainerArticle up better (jules2689, closed 8 years ago, 0 comments)
#31 Yield transformer not defined error if block given (jules2689, closed 8 years ago, 0 comments)
#30 Configuration (jules2689, closed 8 years ago, 5 comments)
#29 Remove unused option in google news rss (jules2689, closed 8 years ago, 1 comment)
#28 Removed html file extension from fixtures (richardwu, closed 8 years ago, 0 comments)
#27 Touch up docs 2 (jules2689, closed 8 years ago, 0 comments)
#26 Touch up wording and formatting of docs (jules2689, closed 8 years ago, 1 comment)
#25 Fix file paths in constants (jules2689, closed 8 years ago, 2 comments)
#24 Separate Transformers::Article into Scraper and Trainer (richardwu, closed 8 years ago, 3 comments)
#23 Some docs (jules2689, closed 8 years ago, 2 comments)
#22 Add uri to transformed data (richardwu, closed 8 years ago, 1 comment)
#21 updated README with instructions for scraping and training (richardwu, closed 8 years ago, 3 comments)
#20 Rake task scraper:train for Trainer (richardwu, closed 8 years ago, 1 comment)
#19 Scraper class (richardwu, closed 8 years ago, 2 comments)
#18 Only require `uri` when absolutely necessary; otherwise accept url (uri is a subset of url) (richardwu, closed 8 years ago, 1 comment)
#17 File paths need to load from __FILE__ (jules2689, closed 8 years ago, 0 comments)
#16 test/unit/trainers -> test/unit/trainer to match module (richardwu, closed 8 years ago, 1 comment)
#15 Maintain YAML structure during training (jules2689, closed 8 years ago, 3 comments)
#14 Add tests for CSS and Xpaths to make sure they're valid (jules2689, closed 8 years ago, 1 comment)
#13 Author Scrapers (jules2689, closed 8 years ago, 0 comments)
#12 RSS feed items (jules2689, closed 8 years ago, 3 comments)
#11 Add more data patterns, don't overwrite existing domains, dont offer blank patterns (jules2689, closed 8 years ago, 1 comment)
#10 Use readability gem to extract body text (jules2689, closed 8 years ago, 0 comments)
#9 Add base rubocop yml (jules2689, closed 8 years ago, 2 comments)
#8 Convert everything to symbol keys (richardwu, closed 4 years ago, 7 comments)
#7 Trainer refactor into succinct, isolated flow, presets (jules2689, closed 8 years ago, 3 comments)
#6 Refactored URIParser and Extractors::GoogleNewsRss (richardwu, closed 8 years ago, 1 comment)
#5 Trainers (jules2689, closed 8 years ago, 7 comments)
#4 Keep scheme from Google News RSS (richardwu, closed 8 years ago, 0 comments)
#3 URIParser and removed `transformer.transform` from trainer (as well as refactors) (richardwu, closed 8 years ago, 2 comments)
#2 Scrape patterns caching (jules2689, closed 8 years ago, 1 comment)
#1 Modularize the trainer, cli prettiness (jules2689, closed 8 years ago, 2 comments)