sosuke-k / twitter-responding-machine

This is a short text conversation project using an RNN Encoder-Decoder with tweets.
MIT License

Duplicate check #6

Closed (sosuke-k closed this issue 8 years ago)

sosuke-k commented 8 years ago

index of lines is 50861
^Cexit status 2

sosuke-k commented 8 years ago

mysql> select item_id, count(*) from tweets group by item_id having count(*) > 1;
+---------+----------+
| item_id | count(*) |
+---------+----------+
|         |     5264 |
+---------+----------+
1 row in set (0.11 sec)

mysql> select success, count(*) from tweets group by success;
+---------+----------+
| success | count(*) |
+---------+----------+
|       0 |     5264 |
|       1 |   222430 |
+---------+----------+
2 rows in set (0.11 sec)
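
The two queries above can be tried on a toy table. This is a minimal sketch, not the project's code: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the table contents are made up for illustration. It reproduces the pattern in the output above, where the only duplicated `item_id` is the empty one and its count matches the number of `success = 0` rows.

```python
import sqlite3

# Toy stand-in for the `tweets` table; SQLite replaces MySQL here and
# the rows are invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (item_id TEXT, success INTEGER)")
conn.executemany(
    "INSERT INTO tweets (item_id, success) VALUES (?, ?)",
    [("101", 1), ("102", 1), ("", 0), ("", 0), ("", 0)],
)

# Same duplicate check as in the comment above.
duplicates = conn.execute(
    "SELECT item_id, COUNT(*) FROM tweets "
    "GROUP BY item_id HAVING COUNT(*) > 1"
).fetchall()

# Same breakdown by the success flag.
by_success = conn.execute(
    "SELECT success, COUNT(*) FROM tweets "
    "GROUP BY success ORDER BY success"
).fetchall()

print(duplicates)
print(by_success)
```

If the real table behaves like this toy one, the 5264 "duplicates" would simply be the failed fetches stored with an empty `item_id`, not repeated tweets.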

sosuke-k commented 8 years ago

load data local infile '/Users/katososuke/go/src/github.com/sosuke-k/twitter-responding-machine/data/twitter_id_str_data.txt' into table stc_tweet_ids;

Log in to mysql with the --local-infile=1 option. If you get an error, see here for reference.

sosuke-k commented 8 years ago
select ids.post_id into outfile 'failed_ids.tsv' from stc_tweet_ids as ids where ids.post_id not in ( select t.item_id from tweets as t );
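
One thing worth checking with a `NOT IN` subquery is whether `item_id` can be NULL: in standard SQL a single NULL in the subquery result makes `NOT IN` match no rows. The sketch below illustrates this on toy data, again using Python's `sqlite3` as a stand-in for MySQL. The rows are invented, and whether `tweets.item_id` is nullable in the real schema is an assumption, not something the thread states. A `NOT EXISTS` form is shown as a NULL-safe alternative.

```python
import sqlite3

# Toy tables; SQLite stands in for MySQL and the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stc_tweet_ids (post_id TEXT)")
conn.execute("CREATE TABLE tweets (item_id TEXT)")
conn.executemany(
    "INSERT INTO stc_tweet_ids VALUES (?)",
    [("101",), ("102",), ("103",)],
)
# One fetched tweet, one empty item_id, and one NULL (assumed possible).
conn.executemany(
    "INSERT INTO tweets VALUES (?)",
    [("101",), ("",), (None,)],
)

# Same shape as the query above: returns nothing once a NULL is present.
not_in = conn.execute(
    "SELECT ids.post_id FROM stc_tweet_ids AS ids "
    "WHERE ids.post_id NOT IN (SELECT t.item_id FROM tweets AS t) "
    "ORDER BY ids.post_id"
).fetchall()

# NULL-safe variant: keep ids that have no matching row in tweets.
not_exists = conn.execute(
    "SELECT ids.post_id FROM stc_tweet_ids AS ids "
    "WHERE NOT EXISTS ("
    "  SELECT 1 FROM tweets AS t WHERE t.item_id = ids.post_id"
    ") ORDER BY ids.post_id"
).fetchall()

print(not_in)
print(not_exists)
```

If `item_id` only ever holds empty strings for failed rows, as the blank value in the earlier output suggests, the original `NOT IN` query is unaffected.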