lschmelzeisen / nasty

NASTY Advanced Search Tweet Yielder
Apache License 2.0

Retrieve replies for several tweet-IDs #15

Closed ana-sofia93 closed 4 years ago

ana-sofia93 commented 4 years ago

For a sentiment analysis in my master's thesis, I use the really useful tool 'nasty' to crawl company tweets from a certain period of time (with the search command) and the users' replies to them (with the replies command).

The search command returned several tweets for each company, i.e. many Tweet-IDs, for which I now have to retrieve the respective replies. Is there a way to crawl the replies to multiple Tweet-IDs, or to a predefined list of Tweet-IDs, at once with the 'nasty replies' command? I guess a loop might solve my problem. However, since I am a marketer and not a computer scientist, I am hoping for a more convenient way to get the replies for more than one Tweet-ID.

Thanks in advance for any helpful suggestions.

lschmelzeisen commented 4 years ago

Yes, this is possible (sort of). The replies command only retrieves the replies to a single Tweet, but you can store multiple replies requests in a batch file and run them all at once. For example:

nasty replies --tweet-id 257552283850653696 --to-batch batch.jsonl
nasty replies --tweet-id 266259787405225984 --to-batch batch.jsonl
nasty replies --tweet-id 332308211321425920 --to-batch batch.jsonl

The above will create a file batch.jsonl that stores all requests for replies. Afterwards you can run them all at once with:

nasty batch --batch-file batch.jsonl --results-dir out/

This will retrieve all requested replies and put them into a folder out/.
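If you have many Tweet-IDs, typing one `nasty replies` line per ID gets tedious. The following is a minimal sketch of the loop mentioned in the question. It only builds and runs the same command lines as shown above, and it assumes the `nasty` executable is on your PATH. The helper names `build_reply_commands` and `queue_replies` are made up for this example and are not part of nasty.

```python
import subprocess
from typing import Iterable, List


def build_reply_commands(tweet_ids: Iterable[str], batch_file: str) -> List[List[str]]:
    """Return one `nasty replies ... --to-batch` argument list per Tweet-ID."""
    commands = []
    seen = set()
    for tweet_id in tweet_ids:
        tweet_id = tweet_id.strip()
        if not tweet_id or tweet_id in seen:
            continue  # Skip blank entries and duplicates.
        seen.add(tweet_id)
        commands.append(
            ["nasty", "replies", "--tweet-id", tweet_id, "--to-batch", batch_file]
        )
    return commands


def queue_replies(tweet_ids: Iterable[str], batch_file: str = "batch.jsonl") -> None:
    """Run each command in turn (assumes `nasty` is installed and on PATH)."""
    for command in build_reply_commands(tweet_ids, batch_file):
        subprocess.run(command, check=True)  # Stop at the first failing command.


if __name__ == "__main__":
    ids = ["257552283850653696", "266259787405225984", "332308211321425920"]
    # Only print the commands here; call queue_replies(ids) to actually run them.
    for command in build_reply_commands(ids, "batch.jsonl"):
        print(" ".join(command))
```

Once `queue_replies(ids)` has filled batch.jsonl, the `nasty batch` command above runs all queued requests as before.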

Of course this is also possible via the Python API.

1) To create the batch file:

```python
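from pathlib import Path
import nasty

# A list (rather than a set) keeps the requests in the order given.
tweet_ids = ["257552283850653696", "266259787405225984", "332308211321425920"]
batch = nasty.Batch()
for tweet_id in tweet_ids:
    batch.append(nasty.Replies(tweet_id))  # One replies request per Tweet-ID.
batch.dump(Path("batch.jsonl"))  # Write all queued requests to the batch file.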
```

2) To run all requests in the batch file:

```python
from pathlib import Path
import nasty

batch = nasty.Batch()
batch.load(Path("batch.jsonl"))
results = batch.execute(Path("out/"))
```

3) To access the results:

```python
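from pathlib import Path
import nasty

results = nasty.BatchResults(Path("out/"))
for entry in results:
    # Replies requests are identified by tweet_id (only Search requests have a query).
    print("Replies to Tweet '{}' (completed at {}):"
          .format(entry.request.tweet_id, entry.completed_at))
    for tweet in results.tweets(entry):
        print("-", tweet)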
```

Naturally, you will have to supply the Tweet-IDs yourself. Currently, there is no way to automatically take all Tweet-IDs from somewhere else, e.g., from the results of a previously completed search request.
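One way to supply the IDs is to collect them in a plain text file, one Tweet-ID per line, and build the batch from that. This is only a sketch under assumptions: the file layout, the `#` comment convention, and the helper names `read_tweet_ids` and `write_replies_batch` are invented for illustration. The nasty calls are the same ones used in step 1 above.

```python
from pathlib import Path
from typing import List


def read_tweet_ids(id_file: Path) -> List[str]:
    """Read one Tweet-ID per line, skipping blanks, '#' comments, and duplicates."""
    tweet_ids: List[str] = []
    for line in id_file.read_text(encoding="utf-8").splitlines():
        tweet_id = line.strip()
        if not tweet_id or tweet_id.startswith("#"):
            continue
        if not tweet_id.isdigit():
            raise ValueError(f"Not a numeric Tweet-ID: {tweet_id!r}")
        if tweet_id not in tweet_ids:
            tweet_ids.append(tweet_id)  # Keep the first occurrence, in file order.
    return tweet_ids


def write_replies_batch(id_file: Path, batch_file: Path) -> int:
    """Queue one replies request per Tweet-ID and return how many were queued."""
    import nasty  # Imported here so read_tweet_ids also works without nasty installed.

    tweet_ids = read_tweet_ids(id_file)
    batch = nasty.Batch()
    for tweet_id in tweet_ids:
        batch.append(nasty.Replies(tweet_id))
    batch.dump(batch_file)
    return len(tweet_ids)
```

For example, `write_replies_batch(Path("tweet_ids.txt"), Path("batch.jsonl"))` would produce a batch file that can then be run as in step 2. The file name tweet_ids.txt is just a placeholder.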

Does this answer your question?