ookamiinc / twitter-analytics-acquirer


Deal with multiple accounts #5

Closed yusuko closed 6 years ago

yusuko commented 6 years ago

Change script to deal with multiple accounts

yusuko commented 6 years ago

Solve the problem that a login notification is sent every time the scraper logs in to Twitter.

yusuko commented 6 years ago

Problem

Solution

yusuko commented 6 years ago

Information loaded at login

2018-09-12 11 29 05

yusuko commented 6 years ago

2018-09-12 11 50 52

yusuko commented 6 years ago

Yesterday: 1 item. Today: 53 items.

At ookami, traffic is not routed through a proxy server.

yusuko commented 6 years ago

Scraping test, 1st run, 12:32 2018-09-12 12 32 04

yusuko commented 6 years ago

2018-09-12 12 52 47

Got the following error:

 connection refused: localhost:8000 (Net::HTTP::Persistent::Error)

yusuko commented 6 years ago

Could this work if we use the following to route traffic through one specific machine? https://ja.softether.org/

yusuko commented 6 years ago

2018-09-12 15 56 25

Setting the User_agent is certainly a good idea, but it does not solve the problem on its own.
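For reference, a minimal sketch of attaching a browser-like User-Agent to a request using only Ruby's standard library. The constant `BROWSER_USER_AGENT`, its value, and the helper `build_request` are hypothetical names introduced for illustration; the scraper in this PR uses Mechanize, which exposes a `user_agent=` setter for the same purpose.

```ruby
require 'net/http'
require 'uri'

# Hypothetical browser-like User-Agent string; substitute the browser you want to mimic.
BROWSER_USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) ' \
                     'AppleWebKit/537.36 (KHTML, like Gecko) ' \
                     'Chrome/69.0.3497.100 Safari/537.36'

# Build a GET request that carries the given User-Agent header.
# Nothing is sent here; pass the request to Net::HTTP#request to send it.
def build_request(url, user_agent: BROWSER_USER_AGENT)
  request = Net::HTTP::Get.new(URI(url))
  request['User-Agent'] = user_agent
  request
end

request = build_request('https://twitter.com/')
puts request['User-Agent']
```

As noted above, this header alone did not stop the login notifications, so it is best treated as one ingredient rather than the fix.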

yusuko commented 6 years ago

Hypothesis on the cause of the login notification

Hypothesized solution

To do

Concrete actions

yusuko commented 6 years ago

Save the cookies and send them with every request when fetching data.

yusuko commented 6 years ago

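It is unclear whether cookies will work. Exchanging cookies is not that hard, so the only thing left to figure out is how to store the string.

That said, below is the result of passing them manually in yml format (string). Of course, this still needs verification.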

2018-09-12 19 59 27

yusuko commented 6 years ago

Reference URL (the clearest one on cookies) https://qiita.com/riocampos/items/ae550ccfa1f9e0bf214d

yusuko commented 6 years ago

SQL: creating a DB from Ruby. Probably the clearest one: http://www.gesource.jp/programming/ruby/database/sqlite.html

https://sites.google.com/site/shimesabanote2/linux/ubuntu/sqlite3

yusuko commented 6 years ago

Current status: scraping with cookies does not work.

The request gets redirected to the login page.

yusuko commented 6 years ago

Apparently spoofing the request headers is also an option??

yusuko commented 6 years ago

Things to check

[#<HTTP::Cookie:name="_twitter_sess", value="BAh7CiIKZmxhc2hJQzonQWN0aW9uQ29udHJvbGxlcjo6Rmxhc2g6OkZsYXNo%250ASGFzaHsABjoKQHVzZWR7ADoPY3JlYXRlZF9hdGwrCByZo%252BZlAToMY3NyZl9p%250AZCIlZTdmNzcxMzBmNjMxOTVjNzE3YTFhNzQ0NjMzYTBmZGQ6B2lkIiU3Yzcx%250AZDY5NmUwMDJiMmIzZTNjMWVkZDk1ODdlMTI3NToJdXNlcmwrCQaw1UPwCWgO--a396b9dd3810ccee15e3938ba962f9eaa614bda6", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=true, expires=nil, max_age=nil, created_at=2018-09-17 17:26:47 +0900, accessed_at=2018-09-17 17:26:47 +0900 origin=https://twitter.com/>, #<HTTP::Cookie:name="personalization_id", value="v1_a++PxKo5XbIH0r228Bxobw==", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2020-09-16 08:26:45 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/>, #<HTTP::Cookie:name="guest_id", value="v1%3A153717280591387503", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2020-09-16 08:26:45 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/>, #<HTTP::Cookie:name="ct0", value="95fafa78ca50709e4f904db7d02b4974", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=false, expires=2018-09-17 14:26:45 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/>, #<HTTP::Cookie:name="dnt", value="1", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2028-09-14 08:26:46 UTC, max_age=nil, created_at=2018-09-17 17:26:47 +0900, accessed_at=2018-09-17 17:26:47 +0900 origin=https://twitter.com/>, #<HTTP::Cookie:name="ads_prefs", value="HBISAAA=", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2028-09-14 08:26:46 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 
origin=https://twitter.com/sessions>, #<HTTP::Cookie:name="kdt", value="IEGKyc3cuFXZq4FH7BxXG81c63SColjPymmCRjZH", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=true, expires=2020-03-17 08:26:46 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/sessions>, #<HTTP::Cookie:name="remember_checked_on", value="0", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2028-09-14 08:26:46 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/sessions>, #<HTTP::Cookie:name="twid", value="u=1038090641643778054", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=false, expires=nil, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/sessions>, #<HTTP::Cookie:name="auth_token", value="5803f3becdebbcd76ffbfa57fd9fa3d2e9469c7b", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=true, expires=nil, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/sessions>, #<HTTP::Cookie:name="lang", value="ja", domain="twitter.com", for_domain=false, path="/", secure=false, httponly=false, expires=nil, max_age=nil, created_at=2018-09-17 17:26:47 +0900, accessed_at=2018-09-17 17:26:47 +0900 origin=https://twitter.com/>]
#<Mechanize::CookieJar:0x00007fa57f3f5800 @store=#<HTTP::CookieJar::HashStore:0x00007fa57f40c5f0 @mon_owner=nil, @mon_count=0, @mon_mutex=#<Thread::Mutex:0x00007fa57f40c550>, @logger=nil, @gc_threshold=150, @jar={"twitter.com"=>{"/"=>{"_twitter_sess"=>#<HTTP::Cookie:name="_twitter_sess", value="BAh7CiIKZmxhc2hJQzonQWN0aW9uQ29udHJvbGxlcjo6Rmxhc2g6OkZsYXNo%250ASGFzaHsABjoKQHVzZWR7ADoPY3JlYXRlZF9hdGwrCByZo%252BZlAToMY3NyZl9p%250AZCIlZTdmNzcxMzBmNjMxOTVjNzE3YTFhNzQ0NjMzYTBmZGQ6B2lkIiU3Yzcx%250AZDY5NmUwMDJiMmIzZTNjMWVkZDk1ODdlMTI3NToJdXNlcmwrCQaw1UPwCWgO--a396b9dd3810ccee15e3938ba962f9eaa614bda6", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=true, expires=nil, max_age=nil, created_at=2018-09-17 17:26:47 +0900, accessed_at=2018-09-17 17:26:47 +0900 origin=https://twitter.com/>, "personalization_id"=>#<HTTP::Cookie:name="personalization_id", value="v1_a++PxKo5XbIH0r228Bxobw==", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2020-09-16 08:26:45 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/>, "guest_id"=>#<HTTP::Cookie:name="guest_id", value="v1%3A153717280591387503", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2020-09-16 08:26:45 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/>, "ct0"=>#<HTTP::Cookie:name="ct0", value="95fafa78ca50709e4f904db7d02b4974", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=false, expires=2018-09-17 14:26:45 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/>, "dnt"=>#<HTTP::Cookie:name="dnt", value="1", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2028-09-14 08:26:46 UTC, max_age=nil, created_at=2018-09-17 17:26:47 +0900, accessed_at=2018-09-17 17:26:47 
+0900 origin=https://twitter.com/>, "ads_prefs"=>#<HTTP::Cookie:name="ads_prefs", value="HBISAAA=", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2028-09-14 08:26:46 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/sessions>, "kdt"=>#<HTTP::Cookie:name="kdt", value="IEGKyc3cuFXZq4FH7BxXG81c63SColjPymmCRjZH", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=true, expires=2020-03-17 08:26:46 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/sessions>, "remember_checked_on"=>#<HTTP::Cookie:name="remember_checked_on", value="0", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2028-09-14 08:26:46 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/sessions>, "twid"=>#<HTTP::Cookie:name="twid", value="u=1038090641643778054", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=false, expires=nil, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/sessions>, "auth_token"=>#<HTTP::Cookie:name="auth_token", value="5803f3becdebbcd76ffbfa57fd9fa3d2e9469c7b", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=true, expires=nil, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/sessions>, "lang"=>#<HTTP::Cookie:name="lang", value="ja", domain="twitter.com", for_domain=false, path="/", secure=false, httponly=false, expires=nil, max_age=nil, created_at=2018-09-17 17:26:47 +0900, accessed_at=2018-09-17 17:26:47 +0900 origin=https://twitter.com/>}}}, @gc_index=17>>

Difference between after login and after fetching the body 2018-09-17 17 17 55

After @agent.cookie_jar.load(cookies_io_read):

(byebug) @agent.cookies
[#<HTTP::Cookie:name="personalization_id", value="v1_a++PxKo5XbIH0r228Bxobw==", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2020-09-16 08:26:45 +0000, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=nil>, #<HTTP::Cookie:name="guest_id", value="v1%3A153717280591387503", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2020-09-16 08:26:45 +0000, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=nil>, #<HTTP::Cookie:name="ct0", value="95fafa78ca50709e4f904db7d02b4974", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=false, expires=2018-09-17 14:26:45 +0000, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=nil>, #<HTTP::Cookie:name="dnt", value="1", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2028-09-14 08:26:46 +0000, max_age=nil, created_at=2018-09-17 17:26:47 +0900, accessed_at=2018-09-17 17:26:47 +0900 origin=nil>, #<HTTP::Cookie:name="ads_prefs", value="HBISAAA=", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2028-09-14 08:26:46 +0000, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=nil>, #<HTTP::Cookie:name="kdt", value="IEGKyc3cuFXZq4FH7BxXG81c63SColjPymmCRjZH", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=true, expires=2020-03-17 08:26:46 +0000, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=nil>, #<HTTP::Cookie:name="remember_checked_on", value="0", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2028-09-14 08:26:46 +0000, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=nil>]


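 - Check the cookies

After that, res.body did not contain the expected content.

 Are the cookies really identical at this point?

Also, which part of the cookie information actually needs to be passed in the first place?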
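
One way to check whether the cookies are the same is to compare the cookie names in the two dumps. The sketch below copies the names and expiry values by hand from the dump taken after login and from the dump taken after `@agent.cookie_jar.load`; the constant names are hypothetical. A plausible reading of the result is that the cookies lost on the way are the session cookies, meaning those with `expires=nil`.

```ruby
# Cookie names and expiry values copied from the dump taken right after login.
# nil means the cookie has no expiry, i.e. it is a session cookie.
AFTER_LOGIN = {
  '_twitter_sess'       => nil,
  'personalization_id'  => '2020-09-16 08:26:45 UTC',
  'guest_id'            => '2020-09-16 08:26:45 UTC',
  'ct0'                 => '2018-09-17 14:26:45 UTC',
  'dnt'                 => '2028-09-14 08:26:46 UTC',
  'ads_prefs'           => '2028-09-14 08:26:46 UTC',
  'kdt'                 => '2020-03-17 08:26:46 UTC',
  'remember_checked_on' => '2028-09-14 08:26:46 UTC',
  'twid'                => nil,
  'auth_token'          => nil,
  'lang'                => nil
}.freeze

# Cookie names present after @agent.cookie_jar.load, copied from the second dump.
AFTER_LOAD = %w[personalization_id guest_id ct0 dnt ads_prefs kdt remember_checked_on].freeze

missing = AFTER_LOGIN.keys - AFTER_LOAD
session_cookies = AFTER_LOGIN.select { |_name, expires| expires.nil? }.keys

puts "Missing after load: #{missing.join(', ')}"
puts "Session cookies:    #{session_cookies.join(', ')}"
puts "Same set? #{missing.sort == session_cookies.sort}"
```

The missing set includes `auth_token` and `_twitter_sess`, which would explain the redirect to the login page.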

yusuko commented 6 years ago

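Passing {:session => true} made the cookie approach work! http://sayac.hateblo.jp/entry/2015/09/23/041410

  1. Extract the cookie data, including the session cookies (session: true).
  2. Serialize it to YAML and save it.
  3. Set it again later.

Done! With this, no notification is sent.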
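
The three steps can be sketched with the standard library alone. This is an illustration under assumptions, not the code from this PR: the `Cookie` struct and the helpers `dump_cookies` and `load_cookies` are hypothetical stand-ins. With Mechanize, the corresponding calls would be `cookie_jar.save` with the `session: true` option and `cookie_jar.load`.

```ruby
require 'yaml'

# Hypothetical minimal cookie record; Mechanize uses HTTP::Cookie instead.
Cookie = Struct.new(:name, :value, :expires)

# Steps 1 and 2: serialize cookies to a YAML string.
# With session: false, cookies without an expiry (session cookies) are dropped.
def dump_cookies(cookies, session: false)
  kept = session ? cookies : cookies.reject { |c| c.expires.nil? }
  YAML.dump(kept.map { |c| { 'name' => c.name, 'value' => c.value, 'expires' => c.expires } })
end

# Step 3: rebuild the cookies from the stored YAML string.
def load_cookies(yaml)
  YAML.safe_load(yaml).map { |h| Cookie.new(h['name'], h['value'], h['expires']) }
end

cookies = [
  Cookie.new('auth_token', 'dummy-token', nil),      # session cookie (no expiry)
  Cookie.new('kdt', 'dummy-kdt', 1_584_433_606)      # persistent cookie (dummy Unix-time expiry)
]

with_session    = load_cookies(dump_cookies(cookies, session: true))
without_session = load_cookies(dump_cookies(cookies))

puts with_session.map(&:name).inspect
puts without_session.map(&:name).inspect
```

Only the round trip that keeps session cookies still contains `auth_token`, which matches the observation that the session option was what made the cookie approach work.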

yusuko commented 6 years ago

To do

yusuko commented 6 years ago

2018-09-17 19 19 36

Reference: https://www.rokurofire.info/2013/11/11/ruby_db/

yusuko commented 6 years ago

2018-09-17 19 37 54

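require 'mysql2'

# Connect to the local MySQL server (root with an empty password; local development only).
client = Mysql2::Client.new(host: "localhost", username: "root", password: '', database: 'twitter_analytics_acquirer')

# Insert a placeholder cookie string, then print every row to confirm the round trip.
client.query("insert into twitter_accounts (cookies) values ('cookie')")
client.query("SELECT * FROM twitter_accounts").each do |row|
  puts row
end

The code above worked.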

yusuko commented 6 years ago

Code I deleted. Recording it here just in case.

client = Mysql2::Client.new(host: "localhost", username: "root", password: '', database: 'twitter_analytics_acquirer')

client.query("insert into twitter_accounts (cookies) values ('cookie')")
client.query("SELECT * FROM twitter_accounts").each do |row|
  puts row
end

yusuko commented 6 years ago

2018-09-17 20 14 31

yusuko commented 6 years ago

@nafu @ladnack Here is my working process.

What I intend

  1. First execution
    • log in with the password
    • save the cookies in the DB
  2. Second and later executions
    • use the cookies saved in the DB (no login)

Fact

  1. We don't get a notification mail when using cookies.
    • I tested this on my own account.
  2. I successfully acquired data with cookies.

To do

  1. Add handling for failure to get data at first login
    • to avoid saving wrong cookies in the DB
  2. Add handling for failure to get data with cookies
    • possible cause
      • cookies are wrong or expired
  3. Refactor method
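
The intended flow and the two failure cases could be sketched roughly as follows. Everything here is hypothetical: `FetchFailed`, `acquire`, and the `login` and `fetch` callables are illustrative names, and the account is a plain Hash standing in for a DB record. Passing the fetch step in as a single callable is one possible way to avoid writing the data-fetching code twice.

```ruby
# Hypothetical error raised when a fetch returns the login page instead of data.
class FetchFailed < StandardError; end

# Fetch analytics for one account, preferring stored cookies over a password login.
#
# account - Hash with :name, :password and :cookies (nil until the first login)
# login   - callable(account) -> cookies string; performs a password login
# fetch   - callable(cookies) -> data; raises FetchFailed on failure
def acquire(account, login:, fetch:)
  if account[:cookies]
    begin
      return fetch.call(account[:cookies])
    rescue FetchFailed
      # Stored cookies are wrong or expired: discard them and log in again.
      account[:cookies] = nil
    end
  end

  cookies = login.call(account)
  data = fetch.call(cookies)   # raises before anything is saved if the login went wrong
  account[:cookies] = cookies  # save only cookies that are known to work
  data
end

# Stub collaborators for demonstration; a real run would talk to Twitter and the DB.
logins = 0
login = ->(_account) { logins += 1; 'fresh-cookies' }
fetch = ->(cookies) { cookies == 'fresh-cookies' ? 'analytics' : raise(FetchFailed) }

account = { name: 'example', password: 'secret', cookies: 'expired-cookies' }
first  = acquire(account, login: login, fetch: fetch) # expired cookies, falls back to login
second = acquire(account, login: login, fetch: fetch) # reuses the cookies saved by the first call
puts first, second, logins
```

In this sketch a failed fetch right after login propagates the error without storing the cookies, so wrong cookies never reach the DB.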

Sorry, the notes above have become quite messy. This comment and the two code reviews above are the organized versions.

yusuko commented 6 years ago

Points to be fixed

yusuko commented 6 years ago

DB 2018-09-18 11 34 32

yusuko commented 6 years ago

It is difficult to refactor the method.

if twitter account

I can't avoid writing the code to get data twice...

sakatore commented 6 years ago

I looked over the PR's comments.

yusuko commented 6 years ago

Flow for creating the DB

These are required. Without them the script stops working, so set them to null: false.

yusuko commented 6 years ago

Changed the DB as follows 2018-09-18 13 57 25

yusuko commented 6 years ago

I've almost finished. I'll ask for a review after checking it myself.

yusuko commented 6 years ago

How to use mysql

Command Line

bundle install ・・・ install 'mysql2' and 'activerecord'

sudo mysql ・・・ log in to MySQL monitor

After logging in to the MySQL monitor

CREATE DATABASE twitter_analytics_acquirer; ・・・create DB

USE twitter_analytics_acquirer; ・・・ select DB

CREATE TABLE twitter_analytics_acquirer.twitter_accounts
(id int not null auto_increment primary key,
 cookies text,
 name text not null,
 password text not null,
 worksheet_name text not null,
 created_at datetime  default current_timestamp,
 updated_at timestamp default current_timestamp on update current_timestamp);

show columns from twitter_accounts; ・・・ see table

MySQL setup is finished.

Add data

Notice: Replace PASSWORD with the real password.

INSERT INTO twitter_analytics_acquirer.twitter_accounts (name, password, worksheet_name)
 VALUES ('Playerapp_vb', 'PASSWORD', 'player');

・・・ Add data

SELECT * FROM twitter_accounts; ・・・ Check data

You can also create data using Active Record.

TwitterAccount.create(
  name: YOUR_TWITTER_ID,
  password: YOUR_TWITTER_PASSWORD,
  worksheet_name: YOUR_WORKSHEET_NAME
)

Spreadsheet

https://docs.google.com/spreadsheets/d/1bt_D2fNj9jlsNgP41GjsfJ2OjPkqDk1EtTRDTv7R1Us/edit#gid=1754749292

yusuko commented 6 years ago

@nafu @ladnack Please check when you have enough time🙇‍♂️

You can read https://github.com/ookamiinc/twitter-analytics-acquirer/pull/5#issuecomment-422336931 if you want to set up mysql as I intended.

yusuko commented 6 years ago

Memo 2018-09-19 11 36 38

yusuko commented 6 years ago

@nafu @ladnack Please check when you have enough time🙇‍♂️

yusuko commented 6 years ago

@ladnack Please take a look when you have a moment🙇‍♂️

Main changes: https://github.com/ookamiinc/twitter-analytics-acquirer/pull/5#discussion_r219071683 https://github.com/ookamiinc/twitter-analytics-acquirer/pull/5#discussion_r219069864

yusuko commented 6 years ago

@nafu Please check🙇‍♂️

yusuko commented 6 years ago

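@nafu Now that I have @ladnack's review, I plan to merge into master and deploy to production tomorrow, since it works locally!

If you think Fumi-san's review is needed before deploying, please let me know🙇‍♂️ (You seem to be rather busy, so...)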

nafu commented 6 years ago

I think this has been improved a lot 👏 Great work 👏 👏

nafu commented 6 years ago

Please go ahead 💪

yusuko commented 6 years ago

Thank you! I'll go ahead💪