osome-iu / botometer-python

A Python API for Botometer by OSoMe
https://botometer.osome.iu.edu
MIT License

No barebones/generic Python wrapper? + API limits + exclude calculations #32

keikoro closed 4 years ago

keikoro commented 5 years ago

Hi, my research into bot detection led me here – well, actually, your website; it unfortunately took me forever to actually find your GitHub (your usage of "repository" on the overview site combined with not actually linking to here from the "Tools" page is confusing).

Having had a look at your API, and having skimmed the code in here, I'm a bit confused about... well, the structure of the API itself as well as your usage of "Python API" (or "our official Python client library" on the website) for what's presented here. From what I can see, what you are mostly doing here is using Tweepy to get info from Twitter, which you eventually send off to the Botometer API. That is nice and convenient for people not used to working with Twitter, but not as helpful when you are already using (and intending to stick to) a different Python library to connect to the Twitter API. So basically I'm wondering if you are going to offer a simple, library-agnostic wrapper as well?

Relatedly – because all the data is collected beforehand anyway – I was wondering what the API's actual limits are, or what JSON structure the code on your end expects. How many tweets can I feed into your API? Which of the key/value pairs of tweet objects need to be present for your code to be able to interpret them as such? I imagine you don't make use of all the (meta) data points in full tweet objects, so it might make sense to limit queries to what's absolutely necessary (... particularly if that then means I can send more tweet objects over to be examined).

This also ties into my next question: what about an option to exclude individual calculations/scores that are irrelevant, to reduce the number of calculations needed on your side, which I assume would in turn speed things up on the client side? Are there any plans for that, or any plans to make it possible to only query some scores? This is actually more what I'd expect of an API: to also offer finer-grained queries/splitting of requests. I'd find this particularly relevant for the analysis of non-English tweets – the sentiment analysis seems to play an important role in your calculations, but is of course completely redundant for non-English tweets. It would generally be nice if it were possible to exclude calculations because I find the (continued) significance of some of them unclear or questionable.

With regard to that last bit: are there any plans to also document the calculations you are doing in any way outside of the research papers you reference? I'm asking because not all these resources are freely available, and to have to read through them all and then try to piece together/make an educated guess about what is currently/still being used as the basis for the calculations – never mind how they are actually done – is... well, not ideal. As said, I'd have to ditch some calculations and redo them incorporating missing bits.

I'd generally be interested to know if the project is still being worked on, or being developed further, though discussion of that is probably better saved for another channel.

clayadavis commented 4 years ago

You're welcome to use the RapidAPI endpoint without this library. Many people do. The documentation there describes most of the requirements and limits you're asking about.
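For anyone taking the library-free route, here is a minimal sketch of assembling one request by hand, using only the standard library. The endpoint URL is a placeholder, and the header name and the `user` / `timeline` / `mentions` payload keys are assumptions for illustration; the RapidAPI documentation is the authority on the real values. `build_request` is a hypothetical helper, not part of this library.

```python
import json

# Placeholder URL: look up the real endpoint in the RapidAPI documentation.
API_URL = "https://example.invalid/check_account"


def build_request(user, timeline, mentions, api_key):
    """Assemble URL, headers, and JSON body for one account check.

    The payload keys and the header name are assumptions, not confirmed API details.
    """
    payload = {"user": user, "timeline": timeline, "mentions": mentions}
    headers = {
        "Content-Type": "application/json",
        "X-RapidAPI-Key": api_key,  # assumed header name
    }
    return API_URL, headers, json.dumps(payload)


# The tweet and user objects can come from any Twitter client library.
url, headers, body = build_request(
    user={"id_str": "123", "screen_name": "example"},
    timeline=[{"id_str": "1", "text": "hello"}],
    mentions=[],
    api_key="YOUR_KEY",
)
```

The resulting `url`, `headers`, and `body` can then be handed to whichever HTTP client you already use.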

If you only want a subset of features, just use the subscores for the features you want. There aren't plans to make this API more fine-grained because right now we don't have the person-power. Do note that the "content" and "sentiment" features are not used for calculating the "universal" score; this is specifically designed for non-English content.
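Picking out subscores on the client side could look roughly like the sketch below. The response shape (a `categories` dict next to a `scores` dict) and all the numbers are made up for illustration, and `pick_scores` is a hypothetical helper; check an actual API response for the real key names.

```python
def pick_scores(result, wanted):
    """Return only the requested subscores from a result dict."""
    categories = result.get("categories", {})
    return {name: categories[name] for name in wanted if name in categories}


# Made-up numbers in an assumed response shape, for illustration only.
sample = {
    "scores": {"english": 0.4, "universal": 0.3},
    "categories": {
        "content": 0.5,
        "friend": 0.2,
        "network": 0.3,
        "sentiment": 0.6,
        "temporal": 0.1,
        "user": 0.2,
    },
}

# Skip the language-dependent "content" and "sentiment" subscores.
language_independent = pick_scores(sample, ["friend", "network", "temporal", "user"])
```

Note that this only filters what comes back; the server still computes every score, so it does not reduce request time.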

We believe our publications describe these algorithms in enough detail and there aren't plans at this point to further expand on those. Again, person-power. Sometimes free resources are assembly-required.

I'm still maintaining this library but graduated from IU and left the research project. I don't particularly know their plans going forward, but feel free to contact the team if you have further questions.