Closed xingos123 closed 2 years ago
my code is:
`
df = pd.read_csv("politifact_fake.csv")[['id', 'title']]
df = pd.DataFrame(df)
for index, dfrow in df.iterrows():
    id = dfrow['id']
    query = str(dfrow['title'])
    data = [['query', 'domain', 'URL', 'title', 'text']]
    path = id + '.csv'
    for j in eng.search(query, pages=3):
        row = [
            query, j['host'], j['link'], j['title'], j['text']
        ]
        row = [encoder(il) for il in row]
        data.append(row)
    output.write_file(data, path)
    time.sleep(random.randint(2, 7))
`
Yes, results are stored in a `SearchEngine.results` object, and every time you call `.search()` you append more items there. Can't you just create a new `eng` instance for every iteration of your outer `for` loop?
    for index, dfrow in df.iterrows():
        eng = Google()
        ...
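To illustrate why a fresh instance helps, here is a minimal sketch that runs without network access. `FakeEngine` is a hypothetical stand-in written for this example, not the library's real class; it only assumes the behavior described above, namely that `.search()` appends to a results list kept on the instance.

```python
class FakeEngine:
    """Hypothetical stand-in for an engine that keeps results between calls."""

    def __init__(self):
        self.results = []  # lives as long as the instance does

    def search(self, query, pages=1):
        # Pretend each page yields exactly one result for the query.
        for page in range(1, pages + 1):
            self.results.append({"title": f"{query} (page {page})"})
        return self.results


queries = ["first", "second"]

# Reusing one instance: the second call also returns the first query's results.
shared = FakeEngine()
reused_counts = [len(shared.search(q, pages=3)) for q in queries]

# A new instance per query keeps each result set separate.
fresh_counts = [len(FakeEngine().search(q, pages=3)) for q in queries]

print(reused_counts)  # [3, 6]
print(fresh_counts)   # [3, 3]
```

With the shared instance the counts grow from query to query, while the per-query instances each return only their own results.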
🆗, thanks for your answer, it helps a lot.
@tasos-py I also found that Bing couldn't extract the title correctly. After analyzing the page, I made the following change in engines/bing.py, line 16:
`'title': 'a'` -> `'title': 'h2'`
Hope this helps others.
Thanks, much appreciated! However, I don't have any issues getting the title from the `a` tag. In the HTML I see that the title text is inside the `a` tag, which is a child of `h2`. So, in this case, `a` and `h2` should have the same text.
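As a rough illustration of that point, the sketch below parses a small heading and compares the text of the `h2` with the text of its child `a`. The markup is a hypothetical example, not copied from a real Bing page, and the standard library's `xml.etree.ElementTree` is used here in place of BS4 so the snippet runs without extra dependencies.

```python
import xml.etree.ElementTree as ET

# Hypothetical result heading: the title text sits directly inside the <a> tag.
snippet = '<h2><a href="https://example.com">Example title</a></h2>'

h2 = ET.fromstring(snippet)
a = h2.find("a")

# itertext() walks the element and all of its descendants.
h2_text = "".join(h2.itertext()).strip()
a_text = "".join(a.itertext()).strip()

print(h2_text == a_text)  # True
```

When the parent contains nothing but the link, both selectors yield the same string, which is why an empty title from `a` is surprising.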
Maybe we're getting different HTML based on our location or maybe it's a BS4 version thing. Could you give me an example of the HTML you see and your BS4 version?
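One way to report the installed version is sketched below. It assumes Python 3.8 or newer and that BS4 was installed under the distribution name `beautifulsoup4`; `installed_version` is a helper name made up for this example.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("beautifulsoup4"))
```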
I updated `Bing` accordingly, because I see no harm in it, only benefits, but I'd like to know what's causing this issue.
@tasos-py BS4 version: 4.9.1, location: China. If I don't make the change, the result is ['', '', '', '', '', '', ''], as follows:
Strange. Our HTML is identical and I don't see any reason for `a` not to have text, since the text content is placed directly in the `a` tag. Maybe it's because we're using different BS4 versions - I'm using v4.8.1. Either way, I've implemented the changes you suggested. Thanks again!
When I run several queries in a loop, the results of each query also contain the results of the previous ones. How can I clear the previous results?