lschmelzeisen / nasty

NASTY Advanced Search Tweet Yielder
Apache License 2.0

Unknown entry type in entry-ID #7

Closed lexwinko closed 4 years ago

lexwinko commented 4 years ago

I'm trying to get tweets from the hashtags '#COVID2019' and '#CoronavirusFrance'; both return the following RuntimeError:

"Unknown entry type in entry-ID '{}'.".format(entry["entryId"])
RuntimeError: Unknown entry type in entry-ID 'novel_coronavirus_message'.

I'm using a simple Python request for these tweets:

nasty.Search(hashtag, lang="en").request()

Using the command-line version returns the same error:

nasty search --query "#COVID2019" --lang "en"

I assume it's the automated Twitter warning that shows up when you search for anything corona-related.

Is there a way to skip it?

lschmelzeisen commented 4 years ago

Thanks for reporting! I already implemented a workaround for this message in 5891e323690c05938ad390e5c5f624d1337d2923. I realize I should publish this as a new version on PyPI.

lschmelzeisen commented 4 years ago

Just published a new version v0.2.1. Make sure you upgrade to it via

pip install nasty==0.2.1

tilmanbeck commented 4 years ago

I got the same error. I guess it is due to a small change in the name of the entryId on Twitter's side:

"Unknown entry type in entry-ID '{}'.".format(entry["entryId"])
RuntimeError: Unknown entry type in entry-ID 'novel_coronavirus_msg'.
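
The two reported entry IDs ('novel_coronavirus_message' and 'novel_coronavirus_msg') share a prefix, so one way a parser could tolerate such renames is to skip banner entries by prefix rather than by exact name. The following is only a minimal sketch of that idea, not NASTY's actual implementation: the function name `iter_tweet_entries`, the constant names, and the `tweet-` prefix used for tweet entries are all hypothetical placeholders, and only the two banner IDs come from this thread.

```python
from typing import Any, Dict, Iterable, Iterator

# Assumption: banner entries share this prefix. Only the two IDs reported in
# this thread ('novel_coronavirus_message', 'novel_coronavirus_msg') are known.
IGNORED_ENTRY_PREFIXES = ("novel_coronavirus_",)

# Hypothetical placeholder prefix for tweet entries, used for illustration only.
TWEET_ENTRY_PREFIX = "tweet-"


def iter_tweet_entries(entries: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield tweet entries, skip known banners, and fail on anything else."""
    for entry in entries:
        entry_id = entry["entryId"]
        if entry_id.startswith(IGNORED_ENTRY_PREFIXES):
            # Informational banner, not a tweet: skip it instead of raising.
            continue
        if not entry_id.startswith(TWEET_ENTRY_PREFIX):
            # Keep failing loudly for entry types that are truly unknown.
            raise RuntimeError(
                "Unknown entry type in entry-ID '{}'.".format(entry_id)
            )
        yield entry


sample = [
    {"entryId": "novel_coronavirus_message"},
    {"entryId": "tweet-1"},
    {"entryId": "novel_coronavirus_msg"},
    {"entryId": "tweet-2"},
]
kept = [entry["entryId"] for entry in iter_tweet_entries(sample)]
print(kept)
```

Matching on a prefix is a trade-off: it survives small renames of the banner entry, but it could also hide a genuinely new entry type that happens to share the prefix.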

lschmelzeisen commented 4 years ago

@tilmanbeck To make it easy for me to fix this: would you have a short example call (via CLI or Python) that reproduces the error?

tilmanbeck commented 4 years ago

Sure, I ran the following command:

nasty batch --batch-file batch.jsonl --results-dir out/

with batch.jsonl as

{"id": "0ffcdc1d4860470dba69588ce2d24657", "request": {"type": "Search", "query": "corona", "since": "2019-12-01", "until": "2020-03-16", "filter": "LATEST", "lang": "en", "max_tweets": null}}

{"id": "1c5362c6e2a943dc83b029bb7916a118", "request": {"type": "Search", "query": "coronavirus", "since": "2019-12-01", "until": "2020-03-16", "filter": "LATEST", "lang": "en", "max_tweets": null}}

Be aware that this might crawl a huge number of tweets ;-)
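
For reproducing the report with other queries, a batch file in the same shape as the two lines above could be generated with a short script. This is a sketch under assumptions: the field names and values are copied from the example lines, the helper name `build_batch_lines` is hypothetical, and using `uuid.uuid4().hex` for the "id" field is a guess based on the IDs being 32 hexadecimal characters, not something confirmed in this thread.

```python
import json
import uuid
from pathlib import Path
from typing import List


def build_batch_lines(
    queries: List[str], since: str, until: str, lang: str = "en"
) -> List[str]:
    """Build one JSON line per query, mirroring the batch.jsonl lines above."""
    lines = []
    for query in queries:
        entry = {
            # Assumption: any unique 32-character hex string works as an ID.
            "id": uuid.uuid4().hex,
            "request": {
                "type": "Search",
                "query": query,
                "since": since,
                "until": until,
                "filter": "LATEST",
                "lang": lang,
                "max_tweets": None,  # serialized as JSON null, i.e. no limit
            },
        }
        lines.append(json.dumps(entry))
    return lines


lines = build_batch_lines(["corona", "coronavirus"], "2019-12-01", "2020-03-16")

# JSON Lines format: one JSON object per line.
Path("batch.jsonl").write_text("\n".join(lines) + "\n", encoding="utf-8")
```

The resulting file can then be passed to the nasty batch command shown above.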

lschmelzeisen commented 4 years ago

Just published v0.2.2 that should address this.

lschmelzeisen commented 4 years ago

@tilmanbeck and anyone else interested in retrieving Tweets about the coronavirus: I am currently assembling a dataset about that. If you are interested, please follow issue #8.