hupili / python-for-data-and-media-communication-gitbook

An open source book on Python tailored for communication students with zero background

Prevent feedparser from overwriting same-named elements when scraping an RSS feed #154

Closed ivywze closed 5 years ago

ivywze commented 5 years ago

Troubleshooting

Describe your environment

Describe your question

When feedparser parses an entry that contains several elements with the same name, the later one seems to overwrite the earlier one:

[Screenshot: Screen Shot 2019-07-18 at 11 24 18 PM] [Screenshot: Screen Shot 2019-07-18 at 11 34 39 PM]

What is the closest answer you can find?

Accessing multiple links -- official doc

But in my case, the entry itself prints only one item and ignores the other.

hupili commented 5 years ago

@ivywze , can you share a partial notebook so we can follow up from where you left?

MindyZHAOMinzhu commented 5 years ago

I cannot find the website Ivy was using. Is there a problem with the original format of the website's code?

ConnorLi96 commented 5 years ago

Sorry for the late reply; my VPN failed, so I cannot access the website.

I guess you can try using CSS selectors and XPath to parse it layer by layer. The parser here is parsing the whole page and may be ignoring some of the tags. Showing more code would also help us sort out this problem sooner.
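A minimal sketch of the layer-by-layer idea, using only the standard library's `xml.etree.ElementTree` and its limited XPath support rather than feedparser. The feed text, the `<category>` tag, and the helper name `collect_repeated` are all hypothetical stand-ins, since the original feed was not shared in this thread; in practice the XML text would come from downloading the real feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS snippet: the first item carries two <category> elements.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <category>politics</category>
      <category>economy</category>
    </item>
    <item>
      <title>Second post</title>
      <category>sports</category>
    </item>
  </channel>
</rss>"""


def collect_repeated(xml_text, tag):
    """Return one dict per <item>, keeping every occurrence of `tag`."""
    root = ET.fromstring(xml_text)
    rows = []
    # Layer 1: walk down to each <item> with a simple path expression.
    for item in root.findall("./channel/item"):
        # Layer 2: findall returns all matching children, not only the last.
        values = [el.text for el in item.findall(tag)]
        rows.append({"title": item.findtext("title"), tag: values})
    return rows


rows = collect_repeated(SAMPLE_RSS, "category")
for row in rows:
    print(row)
```

Because each repeated element is collected into a list, nothing is overwritten; the trade-off is that you handle the feed structure yourself instead of relying on feedparser's normalization.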

ivywze commented 5 years ago

Thanks, all, for the effort. @hupili sorry, but that's all the code I have for this problem. @MindyZHAOMinzhu I think the RSS feed format is like this. @ConnorLi96 thanks, I was just trying that approach.

I will just use another approach, thanks again.

hupili commented 5 years ago

@ivywze how did you solve the original problem?

p.s. That may help future readers.

ivywze commented 5 years ago

> @ivywze how did you solve the original problem?
>
> p.s. That may help future readers.

Sorry, I didn't solve the problem. I just used other ways to scrape.