Closed yusuko closed 6 years ago
Fix the problem where Twitter sends a login notification every time the scraper logs in.
Problem
Why does it happen?
Check the login history
How does Twitter authorize login devices?
What happens if we go through a proxy?
Solution
Set up a proxy
Is it even necessary to log in every time?
Authorize it as a connected app?? (using the API)
Information loaded at login
Yesterday: 1 entry. Today: 53 entries.
At ookami, traffic is not routed through a proxy server.
Scraping test, 1st run, 12:32
connection refused: localhost:8000 (Net::HTTP::Persistent::Error)
This error occurred.
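The `connection refused` message means that nothing accepted the TCP connection at `localhost:8000`, presumably the proxy address used for this test. A minimal sketch, using only Ruby's standard library, of checking that something is listening before pointing the scraper at it. The helper name `proxy_listening?` and the 1-second timeout are assumptions for illustration and are not part of this project.

```ruby
require "socket"

# Hypothetical helper: returns true if a TCP connection to host:port
# can be opened within `timeout` seconds, false otherwise.
def proxy_listening?(host, port, timeout: 1)
  Socket.tcp(host, port, connect_timeout: timeout) { |_socket| true }
rescue SystemCallError, SocketError
  # Covers "connection refused", timeouts, and unresolvable host names.
  false
end

# Example: check the address from the error message before scraping.
# puts proxy_listening?("localhost", 8000)
```

If this returns false, the error comes from the proxy not running rather than from Twitter.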
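Could it work if we use this to route the traffic through a specific PC?? https://ja.softether.org/
Setting User_agent is certainly a good idea, but it does not solve the problem on its own.
Hypothesized cause of the login notification
Hypothesized solution
To do
Concrete actions
Save the cookies and send them along every time data is fetched.
It is not yet clear whether cookies will work. Passing cookies around is not that hard, so the only thing to work out is how to store the string.
However, the result of passing them manually in YAML format (as a string) is shown below. Of course, this still needs verification.
Reference URL (the clearest one on cookies) https://qiita.com/riocampos/items/ae550ccfa1f9e0bf214d
At the moment, scraping with cookies does not work.
We get redirected to the login page.
Apparently spoofing the request headers is also an option??
Things to check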
@agent.cookies
[#<HTTP::Cookie:name="_twitter_sess", value="BAh7CiIKZmxhc2hJQzonQWN0aW9uQ29udHJvbGxlcjo6Rmxhc2g6OkZsYXNo%250ASGFzaHsABjoKQHVzZWR7ADoPY3JlYXRlZF9hdGwrCByZo%252BZlAToMY3NyZl9p%250AZCIlZTdmNzcxMzBmNjMxOTVjNzE3YTFhNzQ0NjMzYTBmZGQ6B2lkIiU3Yzcx%250AZDY5NmUwMDJiMmIzZTNjMWVkZDk1ODdlMTI3NToJdXNlcmwrCQaw1UPwCWgO--a396b9dd3810ccee15e3938ba962f9eaa614bda6", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=true, expires=nil, max_age=nil, created_at=2018-09-17 17:26:47 +0900, accessed_at=2018-09-17 17:26:47 +0900 origin=https://twitter.com/>, #<HTTP::Cookie:name="personalization_id", value="v1_a++PxKo5XbIH0r228Bxobw==", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2020-09-16 08:26:45 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/>, #<HTTP::Cookie:name="guest_id", value="v1%3A153717280591387503", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2020-09-16 08:26:45 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/>, #<HTTP::Cookie:name="ct0", value="95fafa78ca50709e4f904db7d02b4974", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=false, expires=2018-09-17 14:26:45 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/>, #<HTTP::Cookie:name="dnt", value="1", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2028-09-14 08:26:46 UTC, max_age=nil, created_at=2018-09-17 17:26:47 +0900, accessed_at=2018-09-17 17:26:47 +0900 origin=https://twitter.com/>, #<HTTP::Cookie:name="ads_prefs", value="HBISAAA=", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2028-09-14 08:26:46 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 
origin=https://twitter.com/sessions>, #<HTTP::Cookie:name="kdt", value="IEGKyc3cuFXZq4FH7BxXG81c63SColjPymmCRjZH", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=true, expires=2020-03-17 08:26:46 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/sessions>, #<HTTP::Cookie:name="remember_checked_on", value="0", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2028-09-14 08:26:46 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/sessions>, #<HTTP::Cookie:name="twid", value="u=1038090641643778054", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=false, expires=nil, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/sessions>, #<HTTP::Cookie:name="auth_token", value="5803f3becdebbcd76ffbfa57fd9fa3d2e9469c7b", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=true, expires=nil, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/sessions>, #<HTTP::Cookie:name="lang", value="ja", domain="twitter.com", for_domain=false, path="/", secure=false, httponly=false, expires=nil, max_age=nil, created_at=2018-09-17 17:26:47 +0900, accessed_at=2018-09-17 17:26:47 +0900 origin=https://twitter.com/>]
#<Mechanize::CookieJar:0x00007fa57f3f5800 @store=#<HTTP::CookieJar::HashStore:0x00007fa57f40c5f0 @mon_owner=nil, @mon_count=0, @mon_mutex=#<Thread::Mutex:0x00007fa57f40c550>, @logger=nil, @gc_threshold=150, @jar={"twitter.com"=>{"/"=>{"_twitter_sess"=>#<HTTP::Cookie:name="_twitter_sess", value="BAh7CiIKZmxhc2hJQzonQWN0aW9uQ29udHJvbGxlcjo6Rmxhc2g6OkZsYXNo%250ASGFzaHsABjoKQHVzZWR7ADoPY3JlYXRlZF9hdGwrCByZo%252BZlAToMY3NyZl9p%250AZCIlZTdmNzcxMzBmNjMxOTVjNzE3YTFhNzQ0NjMzYTBmZGQ6B2lkIiU3Yzcx%250AZDY5NmUwMDJiMmIzZTNjMWVkZDk1ODdlMTI3NToJdXNlcmwrCQaw1UPwCWgO--a396b9dd3810ccee15e3938ba962f9eaa614bda6", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=true, expires=nil, max_age=nil, created_at=2018-09-17 17:26:47 +0900, accessed_at=2018-09-17 17:26:47 +0900 origin=https://twitter.com/>, "personalization_id"=>#<HTTP::Cookie:name="personalization_id", value="v1_a++PxKo5XbIH0r228Bxobw==", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2020-09-16 08:26:45 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/>, "guest_id"=>#<HTTP::Cookie:name="guest_id", value="v1%3A153717280591387503", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2020-09-16 08:26:45 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/>, "ct0"=>#<HTTP::Cookie:name="ct0", value="95fafa78ca50709e4f904db7d02b4974", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=false, expires=2018-09-17 14:26:45 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/>, "dnt"=>#<HTTP::Cookie:name="dnt", value="1", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2028-09-14 08:26:46 UTC, max_age=nil, created_at=2018-09-17 17:26:47 +0900, accessed_at=2018-09-17 17:26:47 
+0900 origin=https://twitter.com/>, "ads_prefs"=>#<HTTP::Cookie:name="ads_prefs", value="HBISAAA=", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2028-09-14 08:26:46 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/sessions>, "kdt"=>#<HTTP::Cookie:name="kdt", value="IEGKyc3cuFXZq4FH7BxXG81c63SColjPymmCRjZH", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=true, expires=2020-03-17 08:26:46 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/sessions>, "remember_checked_on"=>#<HTTP::Cookie:name="remember_checked_on", value="0", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2028-09-14 08:26:46 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/sessions>, "twid"=>#<HTTP::Cookie:name="twid", value="u=1038090641643778054", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=false, expires=nil, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/sessions>, "auth_token"=>#<HTTP::Cookie:name="auth_token", value="5803f3becdebbcd76ffbfa57fd9fa3d2e9469c7b", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=true, expires=nil, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=https://twitter.com/sessions>, "lang"=>#<HTTP::Cookie:name="lang", value="ja", domain="twitter.com", for_domain=false, path="/", secure=false, httponly=false, expires=nil, max_age=nil, created_at=2018-09-17 17:26:47 +0900, accessed_at=2018-09-17 17:26:47 +0900 origin=https://twitter.com/>}}}, @gc_index=17>>
[#<HTTP::Cookie:name="_twitter_sess", value="BAh7CiIKZmxhc2hJQzonQWN0aW9uQ29udHJvbGxlcjo6Rmxhc2g6OkZsYXNo%250ASGFzaHsABjoKQHVzZWR7ADoPY3JlYXRlZF9hdGwrCByZo%252BZlAToMY3NyZl9p%250AZCIlZTdmNzcxMzBmNjMxOTVjNzE3YTFhNzQ0NjMzYTBmZGQ6B2lkIiU3Yzcx%250AZDY5NmUwMDJiMmIzZTNjMWVkZDk1ODdlMTI3NToJdXNlcmwrCQaw1UPwCWgO--a396b9dd3810ccee15e3938ba962f9eaa614bda6", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=true, expires=nil, max_age=nil, created_at=2018-09-17 17:26:47 +0900, accessed_at=2018-09-17 17:28:40 +0900 origin=https://twitter.com/>, #<HTTP::Cookie:name="personalization_id", value="v1_a++PxKo5XbIH0r228Bxobw==", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2020-09-16 08:26:45 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:28:40 +0900 origin=https://twitter.com/>, #<HTTP::Cookie:name="guest_id", value="v1%3A153717280591387503", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2020-09-16 08:26:45 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:28:40 +0900 origin=https://twitter.com/>, #<HTTP::Cookie:name="ct0", value="95fafa78ca50709e4f904db7d02b4974", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=false, expires=2018-09-17 14:26:45 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:28:40 +0900 origin=https://twitter.com/>, #<HTTP::Cookie:name="dnt", value="1", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2028-09-14 08:26:46 UTC, max_age=nil, created_at=2018-09-17 17:26:47 +0900, accessed_at=2018-09-17 17:28:40 +0900 origin=https://twitter.com/>, #<HTTP::Cookie:name="ads_prefs", value="HBISAAA=", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2028-09-14 08:26:46 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:28:40 +0900 
origin=https://twitter.com/sessions>, #<HTTP::Cookie:name="kdt", value="IEGKyc3cuFXZq4FH7BxXG81c63SColjPymmCRjZH", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=true, expires=2020-03-17 08:26:46 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:28:40 +0900 origin=https://twitter.com/sessions>, #<HTTP::Cookie:name="remember_checked_on", value="0", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2028-09-14 08:26:46 UTC, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:28:40 +0900 origin=https://twitter.com/sessions>, #<HTTP::Cookie:name="twid", value="u=1038090641643778054", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=false, expires=nil, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:28:40 +0900 origin=https://twitter.com/sessions>, #<HTTP::Cookie:name="auth_token", value="5803f3becdebbcd76ffbfa57fd9fa3d2e9469c7b", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=true, expires=nil, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:28:40 +0900 origin=https://twitter.com/sessions>, #<HTTP::Cookie:name="lang", value="ja", domain="twitter.com", for_domain=false, path="/", secure=false, httponly=false, expires=nil, max_age=nil, created_at=2018-09-17 17:26:47 +0900, accessed_at=2018-09-17 17:26:47 +0900 origin=https://twitter.com/>]
Difference between the cookies after login and after fetching the body
Check the cookies serialized to YAML
(byebug) cookies_io_write.string
"---\ntwitter.com:\n \"/\":\n personalization_id: !ruby/object:Mechanize::Cookie\n name: personalization_id\n value: v1_a++PxKo5XbIH0r228Bxobw==\n domain: twitter.com\n for_domain: true\n path: \"/\"\n secure: false\n httponly: false\n expires: Wed, 16 Sep 2020 08:26:45 GMT\n max_age: \n created_at: 2018-09-17 17:26:46.117425000 +09:00\n accessed_at: &1 2018-09-17 17:26:46.389438000 +09:00\n guest_id: !ruby/object:Mechanize::Cookie\n name: guest_id\n value: v1%3A153717280591387503\n domain: twitter.com\n for_domain: true\n path: \"/\"\n secure: false\n httponly: false\n expires: Wed, 16 Sep 2020 08:26:45 GMT\n max_age: \n created_at: 2018-09-17 17:26:46.117599000 +09:00\n accessed_at: *1\n ct0: !ruby/object:Mechanize::Cookie\n name: ct0\n value: 95fafa78ca50709e4f904db7d02b4974\n domain: twitter.com\n for_domain: true\n path: \"/\"\n secure: true\n httponly: false\n expires: Mon, 17 Sep 2018 14:26:45 GMT\n max_age: \n created_at: 2018-09-17 17:26:46.117715000 +09:00\n accessed_at: *1\n dnt: !ruby/object:Mechanize::Cookie\n name: dnt\n value: '1'\n domain: twitter.com\n for_domain: true\n path: \"/\"\n secure: false\n httponly: false\n expires: Thu, 14 Sep 2028 08:26:46 GMT\n max_age: \n created_at: &2 2018-09-17 17:26:47.812170000 +09:00\n accessed_at: *2\n ads_prefs: !ruby/object:Mechanize::Cookie\n name: ads_prefs\n value: HBISAAA=\n domain: twitter.com\n for_domain: true\n path: \"/\"\n secure: false\n httponly: false\n expires: Thu, 14 Sep 2028 08:26:46 GMT\n max_age: \n created_at: 2018-09-17 17:26:46.388014000 +09:00\n accessed_at: *1\n kdt: !ruby/object:Mechanize::Cookie\n name: kdt\n value: IEGKyc3cuFXZq4FH7BxXG81c63SColjPymmCRjZH\n domain: twitter.com\n for_domain: true\n path: \"/\"\n secure: true\n httponly: true\n expires: Tue, 17 Mar 2020 08:26:46 GMT\n max_age: \n created_at: 2018-09-17 17:26:46.388254000 +09:00\n accessed_at: *1\n remember_checked_on: !ruby/object:Mechanize::Cookie\n name: remember_checked_on\n value: '0'\n domain: twitter.com\n 
for_domain: true\n path: \"/\"\n secure: false\n httponly: false\n expires: Thu, 14 Sep 2028 08:26:46 GMT\n max_age: \n created_at: 2018-09-17 17:26:46.388484000 +09:00\n accessed_at: *1\n"
Set the cookies (set_cookies without logging in)
After @agent.cookie_jar.clear:
(byebug) @agent.cookies
[]
After @agent.cookie_jar.load(cookies_io_read):
(byebug) @agent.cookies
[#<HTTP::Cookie:name="personalization_id", value="v1_a++PxKo5XbIH0r228Bxobw==", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2020-09-16 08:26:45 +0000, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=nil>, #<HTTP::Cookie:name="guest_id", value="v1%3A153717280591387503", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2020-09-16 08:26:45 +0000, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=nil>, #<HTTP::Cookie:name="ct0", value="95fafa78ca50709e4f904db7d02b4974", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=false, expires=2018-09-17 14:26:45 +0000, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=nil>, #<HTTP::Cookie:name="dnt", value="1", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2028-09-14 08:26:46 +0000, max_age=nil, created_at=2018-09-17 17:26:47 +0900, accessed_at=2018-09-17 17:26:47 +0900 origin=nil>, #<HTTP::Cookie:name="ads_prefs", value="HBISAAA=", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2028-09-14 08:26:46 +0000, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=nil>, #<HTTP::Cookie:name="kdt", value="IEGKyc3cuFXZq4FH7BxXG81c63SColjPymmCRjZH", domain="twitter.com", for_domain=true, path="/", secure=true, httponly=true, expires=2020-03-17 08:26:46 +0000, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=nil>, #<HTTP::Cookie:name="remember_checked_on", value="0", domain="twitter.com", for_domain=true, path="/", secure=false, httponly=false, expires=2028-09-14 08:26:46 +0000, max_age=nil, created_at=2018-09-17 17:26:46 +0900, accessed_at=2018-09-17 17:26:46 +0900 origin=nil>]
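- Check the cookies
After that, res.body still did not return the desired content.
Are the cookies really identical at this point?
Also, which parts of the cookie information actually need to be passed?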
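One way to approach both questions is to compare the cookie names held right after login with the names that came back after `cookie_jar.load`. A minimal sketch using only the names and `expires` values visible in the dumps above; the constant and method names are illustrative.

```ruby
# Cookie name => expires value, copied from the @agent.cookies dump after login.
# nil means no expiry was set, i.e. a session cookie.
COOKIES_AFTER_LOGIN = {
  "_twitter_sess"       => nil,
  "personalization_id"  => "2020-09-16 08:26:45 UTC",
  "guest_id"            => "2020-09-16 08:26:45 UTC",
  "ct0"                 => "2018-09-17 14:26:45 UTC",
  "dnt"                 => "2028-09-14 08:26:46 UTC",
  "ads_prefs"           => "2028-09-14 08:26:46 UTC",
  "kdt"                 => "2020-03-17 08:26:46 UTC",
  "remember_checked_on" => "2028-09-14 08:26:46 UTC",
  "twid"                => nil,
  "auth_token"          => nil,
  "lang"                => nil
}.freeze

# Names present after cookie_jar.clear followed by cookie_jar.load.
NAMES_AFTER_LOAD = %w[
  personalization_id guest_id ct0 dnt ads_prefs kdt remember_checked_on
].freeze

# Names that existed after login but did not survive the YAML round trip.
def missing_after_load(before, loaded_names)
  before.keys - loaded_names
end

# Names of cookies without an expiry (session cookies).
def session_cookie_names(cookies)
  cookies.select { |_name, expires| expires.nil? }.keys
end

puts "missing after load: #{missing_after_load(COOKIES_AFTER_LOGIN, NAMES_AFTER_LOAD).join(', ')}"
puts "session cookies:    #{session_cookie_names(COOKIES_AFTER_LOGIN).join(', ')}"
```

Both lines list `_twitter_sess`, `twid`, `auth_token`, and `lang`: the cookies lost in the round trip are exactly the ones with `expires=nil`, and they include `auth_token`.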
Passing {:session => true} made it work with cookies! http://sayac.hateblo.jp/entry/2015/09/23/041410
Done! No more notifications now.
To do
require 'mysql2'
client = Mysql2::Client.new(host: "localhost", username: "root", password: '', database: 'twitter_analytics_acquirer')
client.query("insert into twitter_accounts (cookies) values ('クッキー')")
client.query("SELECT * FROM twitter_accounts").each do |e1|
puts e1
end
The code above worked.
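The literal `'クッキー'` above is a placeholder. A real cookie YAML string contains quote characters (for example `value: '1'`), so interpolating it into the SQL text would break the statement. A minimal sketch that uses a prepared statement instead; the helper name `insert_cookies` is hypothetical, and `client` is assumed to behave like `Mysql2::Client`.

```ruby
# Hypothetical helper: insert one cookie YAML string into twitter_accounts.
# `client` must respond to #prepare and return a statement that responds
# to #execute, as Mysql2::Client does.
def insert_cookies(client, cookies_yaml)
  statement = client.prepare("INSERT INTO twitter_accounts (cookies) VALUES (?)")
  # The value is sent as a bound parameter, so quotes need no manual escaping.
  statement.execute(cookies_yaml)
end

# Usage, assuming the mysql2 gem and the database used above:
#   require "mysql2"
#   client = Mysql2::Client.new(host: "localhost", username: "root",
#                               password: "", database: "twitter_analytics_acquirer")
#   insert_cookies(client, cookies_io_write.string)
```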
Deleted code, recorded here just in case.
client = Mysql2::Client.new(host: "localhost", username: "root", password: '', database: 'twitter_analytics_acquirer')
client.query("insert into twitter_accounts (cookies) values ('クッキー')")
client.query("SELECT * FROM twitter_accounts").each do |e1|
puts e1
end
@nafu @ladnack I'm sharing my working process.
Execution from the second run onward
I successfully acquired data with cookies.
Sorry, the notes above have become quite messy. This comment and the two code_review comments above are the organized versions.
Points to be fixed
DB
It is difficult to refactor the method.
if twitter account
I can't avoid writing the code to get data twice...
I looked over the PR's comments.
Flow for creating the DB
is required. Without it the script stops working, so set it to null: false.
Changed the DB to the following
I've almost finished. I'll ask for a review after checking it myself.
bundle install
・・・ install 'mysql2' and 'activerecord'
sudo mysql
・・・ log in to MySQL monitor
CREATE DATABASE twitter_analytics_acquirer;
・・・create DB
USE twitter_analytics_acquirer;
・・・ select DB
CREATE TABLE twitter_analytics_acquirer.twitter_accounts
(id int not null auto_increment primary key,
cookies text,
name text not null,
password text not null,
worksheet_name text not null,
created_at datetime default current_timestamp,
updated_at timestamp default current_timestamp on update current_timestamp);
show columns from twitter_accounts;
・・・ see table
MySQL setup is finished.
Replace PASSWORD with the real password.
INSERT INTO twitter_analytics_acquirer.twitter_accounts (name, password, worksheet_name)
VALUES ("Playerapp_vb", PASSWORD, "player");
・・・ Add data
SELECT * FROM twitter_accounts;
・・・ Check data
You can also create data using Active Record.
TwitterAccount.create(
  name: YOUR_TWITTER_ID,
  password: YOUR_TWITTER_PASSWORD,
  worksheet_name: YOUR_WORKSHEET_NAME
)
@nafu @ladnack Please check when you have enough time🙇♂️
You can read https://github.com/ookamiinc/twitter-analytics-acquirer/pull/5#issuecomment-422336931 if you want to set up MySQL the way I intended.
Memo
@nafu @ladnack Please check when you have enough time🙇♂️
@nafu Please check🙇♂️
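@nafu I've received @ladnack's review, so I'm planning to merge into master and deploy to production tomorrow! It works locally.
If Fūmi-san's review is needed before I deploy, please tell me🙇♂️ (I get the impression you are quite busy...)
I think this has been updated a lot 👏 Wonderful 👏 👏
Please go ahead 💪
Thank you! I'll go ahead💪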
Change script to deal with multiple accounts