Open petermr opened 4 years ago
This is organized as a picocli command line (as is almost all of AMI). My current style is to develop new functionality as tests driven by the command line, and then add it to the JAR. Here's the first test:
public void testBiorxivSmall() throws Exception {
    // start from a clean output directory
    File target = new File("target/biorxiv1");
    FileUtils.deleteDirectory(target);
    MatcherAssert.assertThat(target + " does not exist", !target.exists());
    String args =
            "-p " + target
            + " --site biorxiv" // the type of site
            + " --query coronavirus" // the query
            + " --pagesize 1" // size of remote pages (may not always work)
            + " --pages 1 1" // number of pages
            + " --resultset raw clean"
            + " --landingpage "
            + " --fulltext html pdf"
//          + " --limit 500" // total number of downloaded results
            ;
    // run the download tool with the assembled command line
    new AMIDownloadTool().runCommands(args);
}
This should translate to the following command line (where <target> is the directory passed to -p):
ami-download -p <target> --site biorxiv --query coronavirus --pagesize 25 --pages 1 1 \
--resultset raw clean --landingpage --fulltext html pdf --limit 500
Please try this. And try some of the others.
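To make the mapping between the test's argument string and the command line easier to see, here is a minimal, stdlib-only sketch that splits such a string into options and their values. It is not how picocli or AMIDownloadTool parse arguments; the class name `ArgsSketch` and the method `groupOptions` are hypothetical, and the string is the one from the test with `target/biorxiv1` filled in for the target directory.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of AMI or picocli.
public class ArgsSketch {

    /**
     * Groups whitespace-separated tokens into option -> values, in order of appearance.
     * Any token starting with "-" opens a new option; the tokens that follow are its values.
     * Tokens before the first option are ignored. Negative numbers would be
     * misread as options, which is acceptable for this sketch.
     */
    public static Map<String, List<String>> groupOptions(String args) {
        Map<String, List<String>> options = new LinkedHashMap<>();
        List<String> current = null;
        for (String token : args.trim().split("\\s+")) {
            if (token.startsWith("-")) {
                current = new ArrayList<>();
                options.put(token, current);
            } else if (current != null) {
                current.add(token);
            }
        }
        return options;
    }

    public static void main(String[] args) {
        // Same options as the test above, with the target directory written out.
        String testArgs = "-p target/biorxiv1"
                + " --site biorxiv"
                + " --query coronavirus"
                + " --pagesize 1"
                + " --pages 1 1"
                + " --resultset raw clean"
                + " --landingpage "
                + " --fulltext html pdf";
        groupOptions(testArgs).forEach(
                (option, values) -> System.out.println(option + " " + values));
    }
}
```

In this sketch `--landingpage` ends up with an empty value list (it acts as a flag), while `--pages`, `--resultset` and `--fulltext` each carry two values.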
NOTE: some of the test files may be in my local directory and need transferring to src/test/resource/. This was to save space in the JAR and repo.
PMR has already written a scraper but it's not optimal and needs cleaning.
More later