sotetsuk / goscholar

Google scholar scraper written in Go
MIT License

Learn Google Scholar parameters #2

Open sotetsuk opened 8 years ago

sotetsuk commented 8 years ago

Working out what each parameter means, using scholar.py as a reference. Everything below is as of 2016-05-02.

    # Default URLs for visiting and submitting Settings pane, as of 3/14
    GET_SETTINGS_URL = ScholarConf.SCHOLAR_SITE + '/scholar_settings?' \
        + 'sciifh=1&hl=en&as_sdt=0,5'

    SET_SETTINGS_URL = ScholarConf.SCHOLAR_SITE + '/scholar_setprefs?' \
        + 'q=' \
        + '&scisig=%(scisig)s' \
        + '&inststart=0' \
        + '&as_sdt=1,5' \
        + '&as_sdtp=' \
        + '&num=%(num)s' \
        + '&scis=%(scis)s' \
        + '%(scisf)s' \
        + '&hl=en&lang=all&instq=&inst=569367360547434339&save='

    SCHOLAR_QUERY_URL = ScholarConf.SCHOLAR_SITE + '/scholar?' \
        + 'as_q=%(words)s' \
        + '&as_epq=%(phrase)s' \
        + '&as_oq=%(words_some)s' \
        + '&as_eq=%(words_none)s' \
        + '&as_occt=%(scope)s' \
        + '&as_sauthors=%(authors)s' \
        + '&as_publication=%(pub)s' \
        + '&as_ylo=%(ylo)s' \
        + '&as_yhi=%(yhi)s' \
        + '&as_sdt=%(patents)s%%2C5' \
        + '&as_vis=%(citations)s' \
        + '&btnG=&hl=en' \
        + '&num=%(num)s'

Parameter list

Submit a plain search query

https://scholar.google.co.jp/scholar?q=deeplearning&btnG=&hl=ja&as_sdt=0%2C5

Paginate

https://scholar.google.co.jp/scholar?start=10&q=deeplearning&hl=ja&as_sdt=0,5

Then open My Library

https://scholar.google.co.jp/scholar?scilib=1&scioq=deeplearning&hl=ja&as_sdt=0,5

Search from 2016 onward

https://scholar.google.co.jp/scholar?as_ylo=2016&q=panyanyan&hl=ja&as_sdt=0,5 

Search 2013-2014 (note: the captured URL actually has as_ylo=2003)

https://scholar.google.co.jp/scholar?q=pockemon&hl=ja&as_sdt=0%2C5&as_ylo=2003&as_yhi=2014

Click the sort-by-relevance button

https://scholar.google.co.jp/scholar?hl=ja&as_sdt=0,5&q=pockemon

Click the sort-by-date button

https://scholar.google.co.jp/scholar?hl=ja&as_sdt=0,5&q=pockemon&scisbd=1

Go to the cited-by page

https://scholar.google.co.jp/scholar?cites=8108748482885444188&as_sdt=2005&sciodt=0,5&hl=ja

Cited-by page after paginating

https://scholar.google.co.jp/scholar?start=10&hl=ja&as_sdt=2005&sciodt=0,5&cites=8108748482885444188&scipsc=

Source of the footer-like area under each article

<div class="gs_fl">
  <a href="/scholar?cites=8108748482885444188&amp;as_sdt=2005&amp;sciodt=0,5&amp;hl=en">Cited by 376</a>
  <a href="/scholar?q=related:XOJff8gPiHAJ:scholar.google.com/&amp;hl=en&amp;as_sdt=0,5">Related articles</a>
  <a href="/scholar?cluster=8108748482885444188&amp;hl=en&amp;as_sdt=0,5" class="gs_nph">All 5 versions</a>
  <a href="/scholar.bib?q=info:XOJff8gPiHAJ:scholar.google.com/&amp;output=citation&amp;hl=en&amp;ct=citation&amp;cd=0" class="gs_nta gs_nph">Import into BibTeX</a>
  <a onclick="return gs_ocit(event,'XOJff8gPiHAJ','0')" href="#" class="gs_nvi" role="button" aria-controls="gs_cit" aria-haspopup="true">Cite</a> 
  <span class="gs_nph">
    <a id="gs_svl0" onclick="return gs_sva('XOJff8gPiHAJ','0')" href="#" title="Save this article to my library so that I can read or cite it later.">Save</a>
    <span id="gs_svo0" class="gs_svm">Saving<span id="gs_svd0">...</span>
  </span>
  <a id="gs_svs0" style="display:none">Saved</a>
  <span id="gs_sve0" class="gs_svm">Error saving. <a onclick="return gs_sva('XOJff8gPiHAJ','0')" href="#">Try again?</a>
  </span></span>
  <a href="#" class="gs_mor" role="button" onclick="return gs_more(this,1)">More</a> 
  <a href="#" class="gs_nvi" role="button" onclick="return gs_more(this,0)">Fewer</a>
</div>
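
The IDs needed for the cited-by, cluster, and BibTeX URLs all appear in the hrefs of this footer. A rough sketch of pulling them out with only the standard library is below; `FooterIDs` and `ParseFooter` are hypothetical names, and a regexp over hrefs is a shortcut here, not a real HTML parser.

```go
package main

import (
	"fmt"
	"html"
	"net/url"
	"regexp"
	"strings"
)

// hrefRe captures the value of every double-quoted href attribute.
var hrefRe = regexp.MustCompile(`href="([^"]+)"`)

// FooterIDs collects the identifiers found in a gs_fl footer (hypothetical type).
type FooterIDs struct {
	Cites   string // from "cites=..." in the "Cited by" link
	Cluster string // from "cluster=..." in the "All N versions" link
	Info    string // from "q=info:<ID>:scholar.google.com/" in the BibTeX link
}

// ParseFooter scans the hrefs in an HTML fragment and extracts the IDs.
func ParseFooter(fragment string) FooterIDs {
	var ids FooterIDs
	for _, m := range hrefRe.FindAllStringSubmatch(fragment, -1) {
		// hrefs in the page source use &amp; between parameters.
		u, err := url.Parse(html.UnescapeString(m[1]))
		if err != nil {
			continue
		}
		q := u.Query()
		if v := q.Get("cites"); v != "" {
			ids.Cites = v
		}
		if v := q.Get("cluster"); v != "" {
			ids.Cluster = v
		}
		if v := q.Get("q"); strings.HasPrefix(v, "info:") {
			// "info:XOJff8gPiHAJ:scholar.google.com/" -> "XOJff8gPiHAJ"
			ids.Info = strings.SplitN(v, ":", 3)[1]
		}
	}
	return ids
}

func main() {
	fragment := `<a href="/scholar?cites=8108748482885444188&amp;as_sdt=2005&amp;sciodt=0,5&amp;hl=en">Cited by 376</a>
<a href="/scholar?cluster=8108748482885444188&amp;hl=en&amp;as_sdt=0,5" class="gs_nph">All 5 versions</a>
<a href="/scholar.bib?q=info:XOJff8gPiHAJ:scholar.google.com/&amp;output=citation&amp;hl=en&amp;ct=citation&amp;cd=0">Import into BibTeX</a>`
	fmt.Printf("%+v\n", ParseFooter(fragment))
}
```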

TODO

sotetsuk commented 8 years ago

Building queries

Plain search

Various parameters can be specified; see Search Tips

Reference (scholar.py)

    -a AUTHORS, --author=AUTHORS            Author name(s)
    -A WORDS, --all=WORDS                   Results must contain all of these words
    -s WORDS, --some=WORDS                  Results must contain at least one of these words. Pass
                                            arguments in form -s "foo bar baz" for simple words, and
                                            -s "a phrase, another phrase" for phrases
    -n WORDS, --none=WORDS                  Results must contain none of these words. See -s|--some
                                            re. formatting
    -p PHRASE, --phrase=PHRASE              Results must contain exact phrase
    -t, --title-only                        Search title only
    -P PUBLICATIONS, --pub=PUBLICATIONS     Results must have appeared in this publication
    --after=YEAR                            Results must have appeared in or after given year
    --before=YEAR                           Results must have appeared in or before given year
    --no-patents                            Do not include patents in results
    --no-citations                          Do not include citations in results
    -C CLUSTER_ID, --cluster-id=CLUSTER_ID  Do not search, just use articles in given cluster ID
    -c COUNT, --count=COUNT                 Maximum number of results

Spec for the supported search conditions

--cluster-id
--author
--title
--query
--citing 
--before
--after
--num
--start

query deep learning

https://scholar.google.co.jp/scholar?hl=ja&q=deep+learning&btnG=&lr=

author deep learning author:"y bengio"

https://scholar.google.co.jp/scholar?hl=ja&q=deep+learning+author%3A%22y+bengio%22&btnG=&lr=

title "deep leanring"

https://scholar.google.co.jp/scholar?hl=ja&q=%22deep+leanring%22&btnG=&lr=

(Only one hit, probably because the phrase was typed as "leanring".)

citing (just clicked the "Cited by" link)

https://scholar.google.co.jp/scholar?cites=8108748482885444188&as_sdt=2005&sciodt=0,5&hl=ja

deep learning, 2012 onward

https://scholar.google.co.jp/scholar?as_ylo=2012&q=deep+learning&hl=ja&as_sdt=0,5

deep learning, 2012-2015

https://scholar.google.co.jp/scholar?q=deep+learning&hl=ja&as_sdt=0%2C5&as_ylo=2012&as_yhi=2015

deep learning, up to 2015

https://scholar.google.co.jp/scholar?q=deep+learning&hl=ja&as_sdt=0%2C5&as_ylo=&as_yhi=2015

Within the citing articles, 2012 onward

https://scholar.google.co.jp/scholar?as_ylo=2012&hl=ja&as_sdt=2005&sciodt=0,5&cites=15932869302045479284&scipsc=

Within the citing articles, 2012-2015

https://scholar.google.co.jp/scholar?hl=ja&as_sdt=2005&sciodt=0%2C5&cites=15932869302045479284&scipsc=&as_ylo=2012&as_yhi=2015

Within the citing articles, up to 2015

https://scholar.google.co.jp/scholar?hl=ja&as_sdt=2005&sciodt=0%2C5&cites=15932869302045479284&scipsc=&as_ylo=&as_yhi=2015

deep learning, starting from result 50

https://scholar.google.co.jp/scholar?start=50&q=deep+learning&hl=ja&as_sdt=0,5

Within the citing articles, starting from result 40

https://scholar.google.co.jp/scholar?start=40&hl=ja&as_sdt=2005&sciodt=0,5&cites=16988628068303769209&scipsc=

For fetching BibTeX

Popup

https://scholar.google.co.jp/scholar?q=info:XOJff8gPiHAJ:scholar.google.com/&output=cite&scirp=0&hl=ja

Go to the BibTeX page (URL scraped from the previous page)

https://scholar.google.co.jp/scholar.bib?q=info:XOJff8gPiHAJ:scholar.google.com/&output=citation&scisig=AAGBfm0AAAAAVycKX7aaMyFAf9KR6kSbIyZN4DxQG-zb&scisf=4&hl=ja

sotetsuk commented 8 years ago

Spec for what gets returned

Also decide the output types

sotetsuk commented 8 years ago

Command-line arguments

go-scholar: scraping google scholar searching results

Usage:
  go-scholar search (--author=<author>|--title=<title>|--query=<query>) [--before=<year>] [--after=<year>] [--num-articles=<num-articles>] [--start=<start>]
  go-scholar find <cluster-id> [--before=<year>] [--after=<year>] [--num-articles=<num-articles>] [--start=<start>]
  go-scholar cite <cites-id> [--before=<year>] [--after=<year>] [--num-articles=<num-articles>] [--start=<start>]
  go-scholar -h | --help
  go-scholar --version

Options:
  --author=<author>
  --title=<title>
  --query=<query>
  --before=<year>
  --after=<year>
  --num-articles=<num-articles> 
  --start=<start>
  -h --help
  --version
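
To see how this usage would map onto code, here is a rough sketch using only the standard `flag` package. The usage text looks docopt-style, but no docopt library is used here. `Args` and `ParseArgs` are hypothetical names, and the sketch treats `--before`, `--after`, `--num-articles`, and `--start` as independently optional, which is an assumption about the intent.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"strings"
)

// Args is the parsed command line (hypothetical type).
type Args struct {
	Command              string // "search", "find" or "cite"
	ID                   string // <cluster-id> for find, <cites-id> for cite
	Author, Title, Query string
	Before, After        int
	NumArticles, Start   int
}

// ParseArgs parses everything after the program name.
func ParseArgs(argv []string) (Args, error) {
	if len(argv) == 0 {
		return Args{}, errors.New("missing command")
	}
	a := Args{Command: argv[0]}
	rest := argv[1:]

	fs := flag.NewFlagSet(a.Command, flag.ContinueOnError)
	fs.IntVar(&a.Before, "before", 0, "latest publication year")
	fs.IntVar(&a.After, "after", 0, "earliest publication year")
	fs.IntVar(&a.NumArticles, "num-articles", 0, "number of articles to fetch")
	fs.IntVar(&a.Start, "start", 0, "offset of the first result")

	switch a.Command {
	case "search":
		fs.StringVar(&a.Author, "author", "", "author name")
		fs.StringVar(&a.Title, "title", "", "article title")
		fs.StringVar(&a.Query, "query", "", "free-text query")
	case "find", "cite":
		// The ID is positional and must come before the options.
		if len(rest) == 0 || strings.HasPrefix(rest[0], "-") {
			return Args{}, errors.New(a.Command + ": missing ID")
		}
		a.ID, rest = rest[0], rest[1:]
	default:
		return Args{}, errors.New("unknown command: " + a.Command)
	}
	if err := fs.Parse(rest); err != nil {
		return Args{}, err
	}
	if a.Command == "search" {
		// The usage line groups these three with "|": exactly one is expected.
		set := 0
		for _, s := range []string{a.Author, a.Title, a.Query} {
			if s != "" {
				set++
			}
		}
		if set != 1 {
			return Args{}, errors.New("search: exactly one of --author, --title, --query is required")
		}
	}
	return a, nil
}

func main() {
	a, err := ParseArgs([]string{"search", "--query=deep learning", "--after=2012", "--start=50"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", a)
}
```

The `flag` package accepts both `-name` and `--name`, so the long options above parse as written.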

sotetsuk commented 8 years ago

In #12, I noticed that Custom Search exists.

Also, I'll move this task over to the wiki later.