mikkelkrogsholm / postlightmercury

R wrapper for the postlight mercury web parser

strange for-loop behavior #1

Open theiman112860 opened 6 years ago

theiman112860 commented 6 years ago

Hi, thank you for the awesome package!! I have a list of URLs with specific articles that I would like to use for a data mining project. I used the following code:

```r
urls <- read.csv("~/urls.txt", sep = "")

library(postlightmercury)

for (i in 1:nrow(urls)) {
  df <- web_parser(
    page_urls = operational$URL[5],
    api_key = "WaGZo87FNlGJEhJWv0f9fzAhwyoxqGjxuYSZyukT"
  )

  df$content <- remove_html(df$content)
  df <- null_to_na(df)

  df <- rbind(operational99[i, 2], df)

  Data <- cbind(Data, df)
}
```

urls.txt

The output I get is:

```
  title        author date_published dek lead_image_url content                                          next_page_url url   excerpt                                          Word count
1 News & media NA     NA             NA  NA             Contact us Site map Privacy policy Accessibility NA            http://www.btplc.com/news/index.htm#/pressreleases/bt-and-symantec-partner-to-provide-best-in-class-endpoint-security-protection-2326712?utm_source=rss&utm_medium=rss&utm_campaign=Subscription&utm_content=current_news Contact us Site map Privacy policy Accessibility 7
2 NA           NA     NA             NA  NA             NA                                               NA            NA    NA                                               638
3 NA           NA     NA             NA  NA             NA                                               NA            NA    NA                                               526
4 NA           NA     NA             NA  NA             NA                                               NA            NA    NA                                               238
5 NA           NA     NA             NA  NA             NA                                               NA            NA    NA                                               418
```

BT and Symantec partner to provide best-in-class endpoint security protection
Apple buys music-recognition app Shazam
M&A: Stefanini buys Gauge for user experience tech
Microsoft appoints new Country Manager for T&T

Any idea on what I am doing wrong? I also tried adding a `Sys.sleep()` to see if I was querying too fast. Thank you!! Sincerely, tom


theiman112860 commented 6 years ago

`page_urls = operational$URL[5]` should be

`page_urls = operational$URL[i]`

Underneath the table are the titles that I should be getting.

theiman112860 commented 6 years ago

Sorry, I found a stupid error in the code. The corrected code (it still produces weird results) is:

```r
urls <- read.csv("~/urls.csv", sep = "")

library(postlightmercury)

for (i in 1:nrow(urls)) {
  df <- web_parser(
    page_urls = urls$URL[i],
    api_key = "WaGZo87FNlGJEhJWv0f9fzAhwyoxqGjxuYSZyukT"
  )

  print(i)

  df$content <- remove_html(df$content)
  df <- null_to_na(df)
  Data <- rbind(Data, df)
}
```
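One remaining problem with the snippet above: `Data` is never initialized before the first `rbind()`, so the loop either errors out or silently appends to whatever `Data` is left over in the workspace, and a single failed parse aborts the whole run. A minimal sketch of a more defensive version, assuming the same `urls.csv` with a `URL` column and adding the `tryCatch()` and `Sys.sleep()` pause mentioned earlier:

```r
library(postlightmercury)

urls <- read.csv("~/urls.csv", sep = "")

results <- vector("list", nrow(urls))  # pre-allocate one slot per URL

for (i in seq_len(nrow(urls))) {
  df <- tryCatch(
    web_parser(
      page_urls = urls$URL[i],
      api_key = "WaGZo87FNlGJEhJWv0f9fzAhwyoxqGjxuYSZyukT"
    ),
    error = function(e) NULL  # keep going if one URL fails to parse
  )

  if (!is.null(df)) {
    df$content <- remove_html(df$content)
    df <- null_to_na(df)
    results[[i]] <- df
  }

  Sys.sleep(1)  # pause between requests so the API is not hit too fast
}

Data <- do.call(rbind, results)  # bind all successfully parsed rows at once
```

Binding once at the end with `do.call(rbind, results)` also avoids growing `Data` row by row inside the loop, and `rbind()` simply skips the `NULL` entries left by failed parses.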