hds-lab / textvisdrg

Prototype for exploratory visual data analysis of large social message datasets
MIT License

Cleaning the research questions. #67

Open michaelbrooks opened 9 years ago

michaelbrooks commented 9 years ago

Add problems you notice with the research questions here.

michaelbrooks commented 9 years ago

In "For the major accounts, how many tweets, retweets and @rep were received?" what is @rep?

In "Are the more active users also the emotionally persistent ones?" why is emotionally tagged with sender_mention_count?

nanchenchen commented 9 years ago

The second one is a bug that has been fixed in #68

michaelbrooks commented 9 years ago

Here are two more. This is how they were rendered:

  1. To what extent do social, economic, and cultural factors mediateimpede 4informal communication of 567users from different nations?
  2. To what extent does distance determine the informal communication of 456users from different nations?

Kaminari84 commented 9 years ago

Hey I selected the following dimensions for X(red): Time, Y(blue): Topics and got the following RQs:

  1. "What is the process of learning like in the informal learning space of twitter, and what can be learned about it?"
  2. "How much any of these themes (identified by the authors) managed to capture the #ausvotes community's attention during each day?"
  3. "Were there prevalent key election themes in the Twitter discussion during election period?"
  4. How many retweets and replies were there during the course of the campaign?

Questions 1 and 4 look weird, so I looked them up in the Excel spreadsheet:

  1. What is the 12345\process of learning like\ in the informal learning space of twitter, and what can be learned about it? (Topic; Time; Msg_Replies; Msg_Retweets; Msg_Mentions) OK, now I see it, it is both "Topic" and "Time", that's why it is purple. It does look weird I admit. My fault.
  2. How many 1\retweets\ and 2\replies\ were there 3\during the course of the campaign\? (Msg_Retweets; Msg_Replies; Time) - Basically the last phrase should be red and other 2 should not be colored at all.
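For reference, the coding shown above can be read mechanically. This is only a minimal sketch, assuming the markup is always one or more digits followed by a phrase wrapped in backslashes, that each digit is a 1-based index into the dimension list in parentheses, and that there are never more than nine dimensions. The function name `parse_coded_question` is hypothetical and not part of the repo.

```python
import re

# Assumed markup: digits, then a phrase wrapped in backslashes,
# e.g. 3\during the course of the campaign\
TAG = re.compile(r"(\d+)\\([^\\]+)\\")


def parse_coded_question(coded):
    """Return (plain_text, [(phrase, [dimension, ...]), ...])."""
    # Split off the trailing "(Dim1; Dim2; ...)" list.
    match = re.search(r"\(([^()]*)\)\s*$", coded)
    if match is None:
        raise ValueError("no dimension list found")
    dimensions = [d.strip() for d in match.group(1).split(";")]
    body = coded[:match.start()].strip()

    tagged = []
    for digits, phrase in TAG.findall(body):
        # Each digit is treated as a separate 1-based index (assumption).
        dims = [dimensions[int(ch) - 1] for ch in digits]
        tagged.append((phrase, dims))

    # Drop the markup to recover the question as it should be displayed.
    plain = TAG.sub(lambda m: m.group(2), body)
    return plain, tagged


coded = (r"How many 1\retweets\ and 2\replies\ were there "
         r"3\during the course of the campaign\? "
         r"(Msg_Retweets; Msg_Replies; Time)")
plain, tagged = parse_coded_question(coded)
print(plain)
for phrase, dims in tagged:
    print(phrase, "->", dims)
```

With Time selected, only the phrase tagged with Time ("during the course of the campaign") would be expected to pick up a color under this reading.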
Kaminari84 commented 9 years ago

For "For the major accounts, how many tweets, retweets and @rep were received?" it should indeed be replies: "For the major accounts, how many tweets, retweets and @replies were received?"

Kaminari84 commented 9 years ago

The two you mentioned, Michael, are an example of the problems/choices in coding:

  1. To what extent do 123\social, economic, and cultural factors\ mediate/impede 4\informal communication\ of 567\users from different nations\? (Age; Gender; Language; Topic; Timezone; Location; Language) - Language is used twice!
  2. To what extent does 12\distance\ determine the 3\informal communication\ of 456\users from different nations\? (Location; Timezone; Topic; Location; Timezone; Language) - Location and Timezone are used twice!
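Repeats like these can be flagged automatically. A rough sketch, assuming the dimension list is the semicolon-separated text between the parentheses; `duplicate_dimensions` is a made-up helper name, and whether a repeat is an error or a deliberate coding choice still has to be decided by hand.

```python
from collections import Counter


def duplicate_dimensions(dimension_list):
    """Return the dimension names that occur more than once, in first-seen order."""
    names = [d.strip() for d in dimension_list.split(";")]
    counts = Counter(names)
    return [name for name in counts if counts[name] > 1]


print(duplicate_dimensions(
    "Age; Gender; Language; Topic; Timezone; Location; Language"))
print(duplicate_dimensions(
    "Location; Timezone; Topic; Location; Timezone; Language"))
```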
Kaminari84 commented 9 years ago

I found my typo for a question, it was: "What was the percentage of 1\retweets\ and 2\replies\ during the 345Australian general election\?" It should be: "What was the percentage of 1\retweets\ and 2\replies\ during the 345\Australian general election\?"

I fixed it in the spreadsheet.
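Typos of this kind (a missing backslash) could probably be caught with a quick check. This is a sketch under the same assumed markup (digits, then a backslash-wrapped phrase); `find_markup_problems` is a hypothetical name. It strips every well-formed tag and reports whatever markup-like debris is left, so a question that legitimately contains something like "2nd" would be a false positive.

```python
import re

# A well-formed tag under the assumed markup: digits, backslash, phrase, backslash.
WELL_FORMED = re.compile(r"\d+\\[^\\]+\\")
# Leftovers that suggest a broken tag: digits glued to a letter, or a stray backslash.
SUSPECT = re.compile(r"\d+(?=[A-Za-z])|\\")


def find_markup_problems(body):
    """Return suspicious fragments that remain after removing well-formed tags."""
    leftover = WELL_FORMED.sub("", body)
    return SUSPECT.findall(leftover)


broken = (r"What was the percentage of 1\retweets\ and 2\replies\ "
          r"during the 345Australian general election\?")
fixed = (r"What was the percentage of 1\retweets\ and 2\replies\ "
         r"during the 345\Australian general election\?")
print(find_markup_problems(broken))  # non-empty: the tag is missing its opening backslash
print(find_markup_problems(fixed))   # empty list
```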

Kaminari84 commented 9 years ago

When I selected X(red): time, Y(blue): Urls, I got: "Is tweet content correlated with an emergency?" and the spreadsheet codes it as: "Is 12345\tweet content\ correlated with an 67\emergency\? (Sentiment; Urls; Media; Topic; Keywords; Hashtags; Time)" So if I see it correctly, it should be coloured the other way around.

michaelbrooks commented 9 years ago

If you're making corrections can you make them in the json file instead?

That's this file: https://github.com/hds-lab/textvisdrg/blob/master/setup/research_questions.json

That can easily be imported into the database, but the spreadsheet requires additional manual processing and a Perl script, I think.

Kaminari84 commented 9 years ago

OK

michaelbrooks commented 9 years ago

Do certain categories or topics attract more 4567opinions in Twitter?

nanchenchen commented 9 years ago

To what extent do social, economic, and cultural factors mediateimpede 4informal communication of 567users from different nations? (Garcia-Gavilanes et al. 2014)

michaelbrooks commented 9 years ago

What are the topic frequencies in this Twitter dataset? (didn't have highlighting even though it showed up in the results for hashtags)