taspinar / twitterscraper

Scrape Twitter for Tweets
MIT License

Perhaps #112

Open hrbrmstr opened 6 years ago

hrbrmstr commented 6 years ago

At least — somewhere prominent — inform potential users that this is a violation of Twitter's ToS and also robots.txt (which has been held as a valid technical control in U.S. Federal circuit court).

I'd also be careful since development of this and encouragement of its use could be construed as a CFAA violation.

But putting folks into harm's way without telling them is not cool.

ashgreat commented 6 years ago

If a significant number of this tool's users are from the US, then it makes sense to make the legal implications explicit.

hrbrmstr commented 6 years ago

That's pretty myopic and incorrect. The US is far from the only enforcer of these statutes.

I'm also saddened by your lack of acknowledgment of the ethical issues associated with this package, but I'm unfortunately getting all too used to the data science community putting personal, perceived utility over ethics these days.

ashgreat commented 6 years ago

You don't explain how what I wrote is myopic and incorrect. Here is a Wikipedia page (https://en.wikipedia.org/wiki/Web_scraping#United_States) which shows that web scraping laws in the US are still evolving. Furthermore, this code scrapes Twitter without logging in, which, as the current court ruling stands, does not break any laws (https://regmedia.co.uk/2017/08/14/hiqlinkedintro.pdf). Yet, as US laws are evolving, I think that it's better to warn users about this issue. If you can show me unequivocal legal frameworks in other countries that prohibit scraping websites without logging in, I will support giving a warning to people from those countries too.

From the second part of your comment it seems that you believe that your ethical code should be the universally binding set of ethics for data scientists and computer scientists (btw, IEEE strongly opposed LinkedIn's position in the above lawsuit). It's possible for people to have differing sets of ethics, and not everyone likes what others believe in. I have been following you on Twitter for quite some time and I greatly appreciate your work in the R community. But imposing your views on others aggressively and berating them in public is not what I was expecting from you.

taspinar commented 6 years ago

Hi Bob, thank you for raising these concerns. This has been discussed before in issue https://github.com/taspinar/twitterscraper/issues/60, but because the legality of scraping websites in general, and twitter.com in particular, is not that clear-cut and depends strongly on the country the users live in, I have not done anything about it. Furthermore, as far as I know, most users are scraping Twitter for academic and/or research purposes, without any commercial incentive.

As was also pointed out in issue #60, although this package violates Twitter's TOS, scraping a public website does not require people to accept the TOS. Furthermore, robots.txt allows scraping of /search?q=%23, which is what twitterscraper does. The only clear violation I have seen concerns the required time delay of 1 sec between successive requests, which twitterscraper does not currently observe. When I have time, I'll try to implement an option, which people can turn on with a command-line argument, that adds a delay of 1 sec between successive requests. Users for whom scraping time is not important could turn that option on.
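As a rough illustration of what honoring such a delay could look like, here is a minimal sketch using only the Python standard library. It is not twitterscraper's actual code: `SAMPLE_ROBOTS_TXT`, `crawl_delay_from` and `Throttle` are hypothetical names made up for this example, and the robots.txt text is a stand-in rather than a copy of Twitter's real file.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this sketch,
# not a copy of any real site's file).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 1
"""


def crawl_delay_from(robots_txt, user_agent="*", fallback=1.0):
    """Return the Crawl-delay (in seconds) that applies to user_agent.

    Falls back to `fallback` when the file declares no delay.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else fallback


class Throttle:
    """Keeps at least `delay` seconds between successive wait() calls."""

    def __init__(self, delay, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock    # injectable so the logic can be tested
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A scraper loop would then create `throttle = Throttle(crawl_delay_from(robots_txt))` once and call `throttle.wait()` immediately before each HTTP request, so the first request goes out at once and later ones are spaced out.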

In the meantime, feel free to send in a PR with an added section to the README file explaining this issue.

maelle commented 6 years ago

> Users for whom scraping time is not important could turn that option on.

Just wondering why respecting the delay would not be the default? 🤔
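For illustration only, a default-on delay with an explicit opt-out might be sketched like this with `argparse`. The `--delay` flag and the helper names are assumptions made up for this example; they are not options that twitterscraper is known to provide.

```python
import argparse


def build_parser():
    # Hypothetical CLI, for illustration; not twitterscraper's real interface.
    parser = argparse.ArgumentParser(description="Scraper CLI sketch")
    parser.add_argument(
        "--delay",
        type=float,
        default=1.0,  # polite by default: 1 sec between requests
        help="seconds to wait between successive requests (0 disables)",
    )
    return parser


def effective_delay(argv):
    """Parse argv and return the delay to use, never below zero."""
    args = build_parser().parse_args(argv)
    return max(args.delay, 0.0)
```

With this shape, users who do nothing get the 1 sec delay, and anyone who wants faster scraping has to ask for it explicitly, for example with `--delay 0`.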