petermr opened this issue 4 years ago
When run on the command line it gives:
pm286macbook:ami3 pm286$ ami-download -p target/biorxiv --site biorxiv --query coronavirus --pagesize 1 --pages 1 1 --fulltext pdf html --resultset raw clean
Generic values (AMIDownloadTool)
================================
-v to see generic values
oldstyle true
0 [main] INFO org.contentmine.ami.tools.AMIDownloadTool - set output to: scraped/
0 [main] INFO org.contentmine.ami.tools.AMIDownloadTool - set output to: scraped/
project target/biorxiv
Specific values (AMIDownloadTool)
================================
fulltext [pdf, html]
limit 2
metadata metadata
pages [1, 1]
pagesize 1
query [coronavirus]
resultSetList [raw, clean]
site biorxiv
Query: coronavirus%20sort%3Arelevance-rank%20numresults%3A1
URL https://www.biorxiv.org/search/coronavirus%20sort%3Arelevance-rank%20numresults%3A1
runing curl :https://www.biorxiv.org/search/coronavirus%20sort%3Arelevance-rank%20numresults%3A1 to target/biorxiv/__metadata/resultSet1.html
wrote resultSet: /Users/pm286/workspace/cmdev/ami3/target/biorxiv/__metadata/resultSet1.clean.html
getAuthors(); NYI
metadataEntries 1
Results 1
[target/biorxiv/__metadata/resultSet1.clean.html, target/biorxiv/__metadata/resultSet10.clean.html, target/biorxiv/__metadata/resultSet2.clean.html, target/biorxiv/__metadata/resultSet3.clean.html, target/biorxiv/__metadata/resultSet4.clean.html, target/biorxiv/__metadata/resultSet5.clean.html, target/biorxiv/__metadata/resultSet6.clean.html, target/biorxiv/__metadata/resultSet7.clean.html, target/biorxiv/__metadata/resultSet8.clean.html, target/biorxiv/__metadata/resultSet9.clean.html]
download files in resultSet target/biorxiv/__metadata/resultSet1.clean.html
result set: target/biorxiv/__metadata/resultSet1.clean.html
getAuthors(); NYI
metadataEntries 1
download with curl to <tree>scrapedMetadata.html[/content/10.1101/2020.01.30.926477v1]
running batched up curlDownloader for 1 landingPages, takes ca 1-5 sec/page
ran curlDownloader for 1 landingPages
downloaded 1 files
download files in resultSet target/biorxiv/__metadata/resultSet10.clean.html
result set: target/biorxiv/__metadata/resultSet10.clean.html
getAuthors(); NYI
[… "getAuthors(); NYI" repeated 40 times in total, one per metadata entry …]
metadataEntries 40
download with curl to <tree>scrapedMetadata.html[/content/10.1101/581512v2, /content/10.1101/856518v1, /content/10.1101/2020.01.24.919282v1, /content/10.1101/732255v3, /content/10.1101/800300v1, /content/10.1101/2020.02.06.936302v3, /content/10.1101/840090v2, /content/10.1101/353037v1, /content/10.1101/2020.01.12.902452v1, /content/10.1101/606715v1, /content/10.1101/695510v1, /content/10.1101/2020.01.10.901801v1, /content/10.1101/599043v1, /content/10.1101/094623v1, /content/10.1101/271171v2, /content/10.1101/2020.03.09.984393v1, /content/10.1101/2020.03.07.982207v1, /content/10.1101/780841v1, /content/10.1101/326546v1, /content/10.1101/676155v1, /content/10.1101/2019.12.18.880849v1, /content/10.1101/777847v2, /content/10.1101/2019.12.20.885590v1, /content/10.1101/2020.02.26.966143v1, /content/10.1101/402800v1, /content/10.1101/2019.12.16.875872v2, /content/10.1101/2020.02.16.946699v1, /content/10.1101/498998v1, /content/10.1101/2020.02.10.942847v1, /content/10.1101/623819v1, /content/10.1101/485060v1, /content/10.1101/476341v1, /content/10.1101/2020.04.02.020081v1, /content/10.1101/548909v1, /content/10.1101/2020.03.25.007534v1, /content/10.1101/2020.01.09.900555v1, /content/10.1101/812313v1, /content/10.1101/804716v1, /content/10.1101/2019.12.21.885921v2, /content/10.1101/296996v1]
running batched up curlDownloader for 40 landingPages, takes ca 1-5 sec/page
This should download 1 page of length 1; instead it ignores the page restrictions and starts downloading everything. (When run in Eclipse, the command terminates after producing its output.)
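A minimal sketch of the behaviour the flags imply (this is hypothetical illustration, not ami3 code; the function name `select_pages` is invented): with `--pages 1 1 --pagesize 1`, only `resultSet1` should be processed, yet the log shows all ten `resultSet*.clean.html` files being iterated.

```python
import re

def select_pages(result_sets, first_page, last_page):
    """Keep only the result-set files whose page number lies in [first_page, last_page]."""
    selected = []
    for path in result_sets:
        m = re.search(r"resultSet(\d+)", path)
        if m and first_page <= int(m.group(1)) <= last_page:
            selected.append(path)
    return selected

# The ten files listed in the log above:
all_sets = [f"target/biorxiv/__metadata/resultSet{i}.clean.html" for i in range(1, 11)]
print(select_pages(all_sets, 1, 1))
# only target/biorxiv/__metadata/resultSet1.clean.html should survive the filter
```

If a filter like this were applied before the "download files in resultSet" loop, `resultSet10` would never be visited with `--pages 1 1`.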