Parse tweets and store all edges into an edge table

osome-iu / hoaxy-backend

Backend component for Hoaxy, a tool to visualize the spread of claims and fact checking

http://hoaxy.iuni.iu.edu/

GNU General Public License v3.0


Parse tweets and store all edges into an edge table #4

Closed: glciampaglia closed this issue 6 years ago

glciampaglia commented 6 years ago

To speed up network generation queries, we will create an edge table and will add rows to it by parsing each tweet and extracting all replies, mentions, retweets, and quotes.
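A minimal sketch of that parsing step, assuming tweets arrive as Twitter API v1.1 JSON objects (`user`, `entities.urls`, `entities.user_mentions`, `in_reply_to_user_id`, `retweeted_status`, `quoted_status`). The `Edge` tuple and `extract_edges` helper are hypothetical names, not the Hoaxy implementation, and the edge directions follow the convention used later in this thread (original author -> retweeter or quoter, replier -> replied-to user).

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    from_user_id: int
    to_user_id: int
    tweet_type: str  # "reply", "mention", "retweet", or "quote"
    url: str


def _urls(tweet: dict) -> List[str]:
    """Expanded URLs attached to a tweet (empty list if there are none)."""
    return [u["expanded_url"] for u in tweet.get("entities", {}).get("urls", [])]


def extract_edges(tweet: dict) -> List[Edge]:
    """Hypothetical helper: one edge per (relationship, URL) pair in a tweet."""
    author = tweet["user"]["id"]
    edges: List[Edge] = []

    retweeted = tweet.get("retweeted_status")
    if retweeted is not None:
        # Retweet: the URL travels from the original author to the retweeter.
        # Compound cases (e.g. a retweet of a quote) are not handled here.
        for url in _urls(retweeted):
            edges.append(Edge(retweeted["user"]["id"], author, "retweet", url))
        return edges

    quoted = tweet.get("quoted_status")
    if quoted is not None:
        # Quote: the quoted tweet's URLs travel from its author to the quoter.
        for url in _urls(quoted):
            edges.append(Edge(quoted["user"]["id"], author, "quote", url))

    urls = _urls(tweet)
    reply_to = tweet.get("in_reply_to_user_id")
    if reply_to is not None:
        for url in urls:
            edges.append(Edge(author, reply_to, "reply", url))

    for mention in tweet.get("entities", {}).get("user_mentions", []):
        if mention["id"] == reply_to:
            continue  # already covered by the reply edge
        for url in urls:
            edges.append(Edge(author, mention["id"], "mention", url))
    return edges
```

Each returned `Edge` would correspond to one row of the edge table, so network queries no longer need to re-parse the raw tweets.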

shaochengcheng commented 6 years ago

The code is written and is now being tested on my local computer. However, we need to discuss some of the details.

screen_name. Since a screen name is not unique to a user, I have set up a super table to store the user records.
Edges for two quoting cases.
- Retweet of a quote. Suppose that A posts a tweet T1 with URL1, B comments on it (quoting it) in T2 with URL2, and C retweets B's comment as T3. How should the edges be built for T3? Currently, we treat all information in T3 as flowing from B to C, so the edges are B->C with URL1 and B->C with URL2.
  - FIL SAYS: A->B with URL1, B->C with URL2 (IFF it can be done efficiently)
- Reply with a quote. Suppose that A posts a tweet T1, B posts a tweet T2 with URL2, and C replies to A in T3, which contains URL3 as well as a status link to T2. How should the edges be built for T3? Currently, we treat all information in T3 as flowing from C to A, so the edges are C->A with URL2 and C->A with URL3.
  - FIL SAYS: A reply in my mind would have edges in both directions, A<->C. But in this case T1 does not have a URL, so we ignore edge A->C. The reply has URL3, so we have C->A with URL3. In addition, T3 quotes T2, so I would have an edge B->C with URL2. Finally, A will also see the quoted tweet T2, so I would also add edge C->A with URL2, as you say. (IFF it can be done efficiently)
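The edges proposed in the FIL SAYS notes for the two cases can be written out as a small sketch. The function names and the `(from_user, to_user, url)` tuple layout are hypothetical and only serve to make the rules concrete; nothing here reflects how the backend actually stores edges.

```python
from typing import List, Optional, Tuple

Edge = Tuple[str, str, str]  # (from_user, to_user, url)


def retweet_of_quote_edges(a: str, url1: str, b: str, url2: str, c: str) -> List[Edge]:
    """A posts T1 with URL1, B quotes it in T2 with URL2, C retweets T2."""
    return [
        (a, b, url1),  # URL1 reached B through the quote
        (b, c, url2),  # URL2 reached C through the retweet
    ]


def reply_with_quote_edges(
    a: str, t1_url: Optional[str], b: str, url2: str, c: str, url3: str
) -> List[Edge]:
    """A posts T1, B posts T2 with URL2, C replies to A with URL3 and quotes T2."""
    edges: List[Edge] = []
    if t1_url is not None:
        edges.append((a, c, t1_url))  # only present when T1 itself carries a URL
    edges.append((c, a, url3))  # the reply carries URL3 to A
    edges.append((b, c, url2))  # C picked up URL2 by quoting T2
    edges.append((c, a, url2))  # A also sees the quoted tweet in the reply
    return edges
```

For the reply case described above, where T1 has no URL, `reply_with_quote_edges("A", None, "B", "URL2", "C", "URL3")` yields the three edges C->A with URL3, B->C with URL2, and C->A with URL2.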

Thanks Chengcheng

filmenczer commented 6 years ago

I added my interpretations. But if they cannot be implemented easily and efficiently, I am happy to give up on some edges!

glciampaglia commented 6 years ago

It looks like is_mention and tweet_type have been swapped, and this breaks Hoaxy (both the demo site and probably the old Hoaxy). @shaochengcheng can you please take a look at it?

This is the output of "Test Endpoint" from mashape:

      "canonical_url": "http://www.snopes.com/clinton-secret-earpiece-debate/",
      "date_published": "2016-09-27T22:37:20.351Z",
      "domain": "snopes.com",
      "from_user_id": 743076627605684200,
      "from_user_screen_name": "caliwaterman",
      "id": 68363,
      "is_mention": "retweet",
      "site_type": "fact_checking",
      "title": "FALSE: Hillary Clinton Wore Secret Earpiece During First Presidential Debate",
      "to_user_id": 724756382663082000,
      "to_user_screen_name": "LesFoster6",
      "tweet_created_at": "2016-09-28T03:52:41.000Z",
      "tweet_id": "780978510076620804",
      "tweet_type": false,
      "url_id": 1418261

And this is the screenshot from the old Hoaxy. As you can see, the console is giving an error and the modal dialog is empty, whereas it should have entries:

shaochengcheng commented 6 years ago

Sorry that I made such a silly mistake! I have fixed this bug.

Thanks Chengcheng
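In the sample record reported earlier, the swap shows up as a string in is_mention and a boolean in tweet_type. A check along the following lines could catch that kind of regression in API output. It assumes, based only on the field names and the sample values, that is_mention is meant to be a boolean and tweet_type a string; both helper names are hypothetical.

```python
def fields_swapped(record: dict) -> bool:
    """True when is_mention holds a string and tweet_type holds a boolean."""
    return isinstance(record.get("is_mention"), str) and isinstance(
        record.get("tweet_type"), bool
    )


def unswap_fields(record: dict) -> dict:
    """Return a copy of the record with the two fields exchanged if they look swapped."""
    if not fields_swapped(record):
        return record
    fixed = dict(record)
    fixed["is_mention"] = record["tweet_type"]
    fixed["tweet_type"] = record["is_mention"]
    return fixed
```

Applied to the record above, `unswap_fields` would give `"is_mention": false` and `"tweet_type": "retweet"`.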

filmenczer commented 6 years ago

Does the new network API mean that this issue is closed, or is this still work in progress? I would expect a large speedup from indexing edges. In the OSoMe network tool the speedup was amazing, at least 10x.

filmenczer commented 6 years ago

@shaochengcheng -- Giovanni and I looked at your new API function db_query_network, which uses the new network table twitter_network_edge, and noticed that it was rather slow because of two sequential scans. We added an index on field group_id of table article and an index on field article_id of table url. After this, the query uses index scans only, and is super fast!

Please be sure to update the code that creates these tables to include these two indexes. Then you can close this issue.
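The effect of those two indexes can be illustrated with a self-contained sketch. It uses Python's built-in sqlite3 as a stand-in database, so the planner output differs from what the production database reports. The table definitions are reduced to the two columns mentioned above, and the index names and the `query_plan` helper are hypothetical.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> list:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
# Reduced stand-ins for the article and url tables (real schemas have more columns).
conn.executescript(
    """
    CREATE TABLE article (id INTEGER PRIMARY KEY, group_id INTEGER);
    CREATE TABLE url (id INTEGER PRIMARY KEY, article_id INTEGER);
    """
)

ARTICLES_IN_GROUP = "SELECT id FROM article WHERE group_id = ?"
URLS_OF_ARTICLE = "SELECT id FROM url WHERE article_id = ?"

# Without indexes, both lookups have to scan the whole table.
plan_before = query_plan(conn, ARTICLES_IN_GROUP, (1,)) + query_plan(
    conn, URLS_OF_ARTICLE, (1,)
)

# The two indexes described in the comment above (index names are assumptions).
conn.executescript(
    """
    CREATE INDEX ix_article_group_id ON article (group_id);
    CREATE INDEX ix_url_article_id ON url (article_id);
    """
)

plan_after = query_plan(conn, ARTICLES_IN_GROUP, (1,)) + query_plan(
    conn, URLS_OF_ARTICLE, (1,)
)

for before, after in zip(plan_before, plan_after):
    print(before, "=>", after)
```

Putting the index definitions in the same code that creates the tables means a freshly created database gets the fast plan without any manual step.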

shaochengcheng commented 6 years ago

Great! Will update the code!

Thanks Chengcheng