rivernews/review-scraper-java-development-environment
An environment to develop review scraper
0 stars, 1 fork
issues
Bump gson from 2.8.6 to 2.8.9 in /shaungc-java-dev
#35
dependabot[bot]
opened
2 years ago
0
Bump junit from 4.11 to 4.13.1 in /shaungc-java-dev
#34
dependabot[bot]
opened
4 years ago
0
implement job splitting; add env var; add stop at page mechanism; ren…
#33
rivernews
closed
4 years ago
0
064 reduce s3 cost
#32
rivernews
closed
4 years ago
1
test running scraper in container
#31
rivernews
closed
4 years ago
0
Slk 053 k8 job memory pressure
#30
rivernews
closed
4 years ago
0
Report progress data even if scraper exception / halt exception occurred
#29
rivernews
opened
4 years ago
0
Add 2nd approach for capturing next page link
#28
rivernews
closed
4 years ago
0
026 dataintegrity review helpfulcount
#27
rivernews
closed
4 years ago
0
Data Integrity
#26
rivernews
closed
4 years ago
3
Data Pipeline: aggregation work as cronjob across all orgs
#25
rivernews
opened
4 years ago
0
Data pipeline: cronjob-kind-of mechanism to track each org over time
#24
rivernews
closed
4 years ago
2
Data pipeline: review data missing. scraper ended successfully, but processed review count << local review count
#23
rivernews
closed
4 years ago
2
Data Pipeline: cache webpage within certain time frame like 3 days
#22
rivernews
opened
4 years ago
0
Data Archive: DEFAULT s3 ACL to None
#21
rivernews
closed
4 years ago
1
Data Pipeline: add company as prefix in slack log message
#20
rivernews
closed
4 years ago
1
Data Pipeline: add more robust info in slack message
#19
rivernews
closed
4 years ago
1
Data Pipeline: Turn off DEBUG logging in production to improve performance
#18
rivernews
closed
4 years ago
1
Data Pipeline: Consider using filter by engineering
#16
rivernews
opened
4 years ago
1
Consider supporting cross-session mechanism, or move to our own cloud
#17
rivernews
closed
4 years ago
2
Data Pipeline: data integrity: potential duplicated review
#15
rivernews
closed
4 years ago
3
Data Pipeline: compile java and only use jar in travis to improve spin up time
#14
rivernews
closed
4 years ago
1
Data Pipeline: Middleware Server for better Slack Trigger
#13
rivernews
closed
4 years ago
2
#003 Scraper and Data Archive Mechanism
#12
rivernews
closed
4 years ago
0
NLP analysis: research theme exploring
#11
rivernews
opened
4 years ago
1
Archive: S3 cannot delete buckets / objects
#10
rivernews
closed
4 years ago
1
Scrape: escape HTML text
#9
rivernews
opened
4 years ago
0
Pipeline: Emit scraper special events & logs to some message broker
#8
rivernews
closed
4 years ago
1
Pipeline: Scale the scraper to accept multiple companies input
#7
rivernews
closed
4 years ago
1
Pipeline: Capture scraping duration
#6
rivernews
closed
4 years ago
1
Emit log info from scraper
#5
rivernews
closed
4 years ago
1
Pipeline: Estimate maximum capacity for a build in travisci
#4
rivernews
opened
4 years ago
2
Storage | Archive: Design data storage
#3
rivernews
closed
4 years ago
4
Add pagination scrape logic
#2
rivernews
closed
4 years ago
0
Deploy streamlined process in TravisCI
#1
rivernews
closed
4 years ago
1