tapilab / protest

analyze brazilian protests on Twitter

Get more users #16

Open aronwc opened 9 years ago

aronwc commented 9 years ago

For each of the 291 Streaming Users, collect all other mentioned users.

If this set is not too large, collect all the tweets of these new users. With this new data, we will

ElaineResende commented 8 years ago

We have 85964 unique mentions in total... I am collecting just a sample of them; I was thinking 600. What do you think? So far I have collected 400.

aronwc commented 8 years ago

I think a bit more would be needed. We also want to make sure we have some for each of the original 291 Users. I propose the following sampling method: for each of the original 291 Users, randomly sample 10 users they mention and collect their tweets. Note that this may yield fewer than (291x10) users, since many users will be mentioned by multiple users.
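A minimal sketch of this sampling scheme, under stated assumptions: the input format (`mentions_by_user`, a dict from each original user to the usernames they mention) and the function name are hypothetical, and dropping self-mentions is an extra assumption not discussed in this thread.

```python
import random


def sample_mentioned_users(mentions_by_user, k=10, seed=0):
    """For each original user, sample up to k of the users they mention.

    mentions_by_user: dict mapping an original user to an iterable of
    mentioned usernames (hypothetical input format).
    Returns the set of all sampled users. Its size can be smaller than
    len(mentions_by_user) * k because the per-user samples may overlap.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sampled = set()
    for user, mentioned in mentions_by_user.items():
        # Deduplicate and drop self-mentions (assumption); sort for determinism.
        candidates = sorted(set(mentioned) - {user})
        sampled.update(rng.sample(candidates, min(k, len(candidates))))
    return sampled
```

Users who mention fewer than `k` others simply contribute all of their mentions, which matches the "at most 10" behaviour one would expect here.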

ElaineResende commented 8 years ago

Ok. :)

ElaineResende commented 8 years ago

We have 321 users who mentioned one of the keywords (cometothestreet, dilmaout, ptout). For each user, at most 10 random mentions were collected, along with the timelines of the mentioned users. In the end we have 2794 timelines of mentioned users.

The symmetric mentions and the graph are in progress.

aronwc commented 8 years ago

Great. Since this will take a while to collect, let's randomize the list of 2794 and collect in that order; that way we'll get at least some neighbors for each user.

ElaineResende commented 8 years ago

What do you mean by collect? Get the timeline? If so, I already have the timeline for each of the 2794 users. Or did you mean the graph?

aronwc commented 8 years ago

I meant timeline -- great, you can ignore my comment!


ElaineResende commented 8 years ago

Ok. I am concerned about the running time for creating the graph and for creating the feature vector too. We can talk about it in our meeting.

ElaineResende commented 8 years ago

Hi Aron,

I zipped all the files and the archive is much smaller than 1 GB. It is on Dropbox.

https://www.dropbox.com/s/rqln786wvoprstw/all%20timeline.rar?dl=0

Let me know if you can't download it.

Best, Elaine.

aronwc commented 8 years ago

Got it, thanks!


ElaineResende commented 8 years ago

Initially I have the graph below, generated from all symmetric mentions.

[Image: all_mentions_graph]

For the next one I added a check: if a tweet contained one of the keywords, I looked at whether that tweet also mentioned other users.

[Image]

Do you have any other ideas for what we can do after that?

aronwc commented 8 years ago

We would like to add a feature to our classification task that indicates whether a user's neighbor has used one of the hashtags recently.

This gets a little tricky, because we need to restrict a neighbor's tweets to those posted prior to the user's next tweet. We can then add a feature such as "the percentage of a user's neighbors who have used one of the hashtags prior to time T".
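A rough sketch of such a feature, assuming two hypothetical inputs that are not defined in this thread: `neighbors` (user to set of neighbor usernames) and `keyword_times` (user to the timestamps at which they used one of the hashtags). It returns a fraction in [0, 1] rather than a percentage; multiply by 100 if a percentage is preferred.

```python
from datetime import datetime


def neighbor_keyword_fraction(user, t, neighbors, keyword_times):
    """Fraction of `user`'s neighbors who used a keyword strictly before t.

    neighbors: dict user -> set of neighbor usernames (hypothetical format).
    keyword_times: dict user -> list of datetimes of keyword tweets
    (hypothetical format).
    """
    nbrs = neighbors.get(user, set())
    if not nbrs:
        return 0.0  # no neighbors: define the feature as 0 (assumption)
    # Count neighbors with at least one keyword tweet posted before t.
    active = sum(
        1 for n in nbrs
        if any(ts < t for ts in keyword_times.get(n, []))
    )
    return active / len(nbrs)


# Toy data: only "a" used a keyword before the cutoff.
neighbors = {"u": {"a", "b", "c", "d"}}
keyword_times = {
    "a": [datetime(2015, 3, 1)],
    "b": [datetime(2015, 3, 20)],
    "c": [],
}
cutoff = datetime(2015, 3, 15)
print(neighbor_keyword_fraction("u", cutoff, neighbors, keyword_times))  # 0.25
```

The strict `ts < t` comparison is what restricts each neighbor's tweets to those posted before time T; how T is chosen per user (for example, the time of the user's next tweet) is left to the caller.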

ElaineResende commented 8 years ago

All right. Just to clarify: neighbors are users who have a symmetric relationship between them, right?

aronwc commented 8 years ago

Yes
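To make the definition concrete, here is a small sketch that derives neighbors from directed mentions, keeping only pairs where each user mentions the other. The input format and function name are hypothetical, and ignoring self-mentions is an assumption. Plain dicts are used to keep the example dependency-free; the resulting pairs could equally be loaded into an undirected networkx graph.

```python
def symmetric_neighbors(mentions):
    """Return user -> set of neighbors, where u and v are neighbors
    only if u mentions v AND v mentions u.

    mentions: dict user -> set of users they mention (hypothetical format).
    """
    neighbors = {}
    for u, targets in mentions.items():
        for v in targets:
            # Skip self-mentions (assumption) and one-directional mentions.
            if u != v and u in mentions.get(v, ()):
                neighbors.setdefault(u, set()).add(v)
                neighbors.setdefault(v, set()).add(u)
    return neighbors


# "a" and "b" mention each other; "a" -> "c" is one-directional and is dropped.
example = {"a": {"b", "c"}, "b": {"a"}, "c": set()}
print(symmetric_neighbors(example))
```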


ElaineResende commented 8 years ago

If I understood correctly, I now have a dictionary mapping user1 to the percentage of neighbors that used any of the keywords before time T (the first use of any keyword by user1).

In case you want to check the code running, the needed files are:

  • all_mentions_graph.pkl is on the server -> home/elaine/Protest/DATA/Timeline
  • all timeline.rar is in the same path as above
  • mentions is in home/elaine/Protest/DATA

Could you install unrar on the server so that I can extract the files from the .rar? (I hope I can work there now :) )

Thank you.

aronwc commented 8 years ago

Done.

unrar e all\ timeline.rar

ElaineResende commented 8 years ago

Thank you.

There will probably be some new libraries to install. Sorry for bothering you. Could you please install networkx?

aronwc commented 8 years ago

OK, done:


[culotta@tapi ~]$ python3
Python 3.4.3 (default, Feb 26 2015, 22:10:05)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-11)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import networkx
>>> networkx.__version__
'1.10'
>>>
