petermr / pygetpapers

a Python version of getpapers
Apache License 2.0

`pygetpapers` missing flags in `getpapers` #9

Closed petermr closed 3 years ago

petermr commented 3 years ago

Users of getpapers will expect 16 flags to be present in pygetpapers.

Flags should NOT be used for different purposes.

petermr commented 3 years ago

Shweata will reformat this as a table. (snh: done)

Current usage of getpapers flags:

| Flags | Function | In pygetpapers |
| --- | --- | --- |
| `-h, --help` | output usage information | Identical operation. |
| `-V, --version` | output the version number | Missing in pygetpapers |
| `-q, --query <query>` | search query (required) | Identical, except for requiring quotes |
| `-o, --outdir <path>` | output directory (required - will be created if not found) | Behaviour may be different if pygetpapers is updating the project. |
| `--api <name>` | API to search [eupmc, crossref, ieee, arxiv] (default: eupmc) | pygetpapers has the wrong syntax |
| `-x, --xml` | download fulltext XMLs if available | Missing in pygetpapers. Please add |
| `-p, --pdf` | download fulltext PDFs if available | pygetpapers requires `--api`. This should be removed. |
| `-s, --supp` | download supplementary files if available | Missing in pygetpapers |
| `-t, --minedterms` | download text-mined terms if available | Missing in pygetpapers |
| `-l, --loglevel <level>` | amount of information to log (silent, verbose, info*, data, warn, error, or debug) | Identical in pygetpapers |
| `-a, --all` | search all papers, not just open access | Missing in pygetpapers |
| `-n, --noexecute` | report how many results match the query, but don't actually download anything | Missing in pygetpapers |
| `-f, --logfile <filename>` | save log to specified file in output directory as well as printing to terminal | Identical flag |
| `-k, --limit <int>` | limit the number of hits and downloads | Identical flag. pygetpapers adds default value |
| `--filter <filter object>` | filter by key value pair, passed straight to the crossref api only | Missing in pygetpapers |
| `-r, --restart` | restart file downloads after failure | Missing in pygetpapers |

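
A check like the following could be run against the command-line parser to list which of the 16 getpapers flags are still missing. It is only a sketch: `missing_flags` and the `demo` parser are hypothetical names, not part of pygetpapers, and the lookup relies on a private `argparse` attribute.

```python
import argparse

# Long option names of the 16 getpapers flags listed above.
GETPAPERS_FLAGS = {
    "--help", "--version", "--query", "--outdir", "--api", "--xml", "--pdf",
    "--supp", "--minedterms", "--loglevel", "--all", "--noexecute",
    "--logfile", "--limit", "--filter", "--restart",
}


def missing_flags(parser):
    """Return the getpapers long flags that `parser` does not define."""
    # _option_string_actions is a private argparse attribute that maps every
    # registered option string (e.g. "-q", "--query") to its action.
    defined = set(parser._option_string_actions)
    return GETPAPERS_FLAGS - defined


# Hypothetical, deliberately incomplete parser used only to show the check.
demo = argparse.ArgumentParser(prog="pygetpapers-demo")
demo.add_argument("-q", "--query")
demo.add_argument("-k", "--limit", type=int, default=100)

print(sorted(missing_flags(demo)))
```

Because `argparse` registers `-h, --help` automatically, the demo parser is reported as missing 13 of the 16 flags.
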
petermr commented 3 years ago

New flags in pygetpapers:

  -v, --onlyquery       Saves pickle file containing the result of the query
                        in storage. The pickle file can be given to
                        --frompickle to download the papers later.

--frompickle FROMPICKLE Reads the pickle and makes the xml files. Takes the path to the pickle as the input
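
The `--onlyquery` / `--frompickle` round trip described above can be sketched with the standard `pickle` module. The function names and the shape of the stored result are assumptions for illustration, not the actual pygetpapers implementation.

```python
import pickle
import tempfile
from pathlib import Path


def save_query_result(result, pickle_path):
    """Write a query result to disk so the papers can be downloaded later."""
    path = Path(pickle_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the output directory if needed
    with path.open("wb") as handle:
        pickle.dump(result, handle)
    return path


def load_query_result(pickle_path):
    """Read back a result written by save_query_result."""
    with Path(pickle_path).open("rb") as handle:
        return pickle.load(handle)


# Demo with a made-up result; the real query result structure may differ.
example_result = {"PMC0000000": {"title": "placeholder record"}}
with tempfile.TemporaryDirectory() as tmp:
    saved = save_query_result(example_result, Path(tmp) / "corpus" / "query.pickle")
    restored = load_query_result(saved)
```

Note that `pickle.load` can execute arbitrary code, so a pickle file should only be read if it comes from a trusted source.
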

-f LOGFILE, --logfile LOGFILE save log to specified file in output directory as well as printing to terminal

  -j, --makejson        Stores the per-document metadata as json. Works only
                        with --api method.

Change to `--json`.

  -c, --makecsv         Stores the per-document metadata as csv. Works only
                        with --api method.
  -l LOGLEVEL, --loglevel LOGLEVEL
                        Provide logging level. Example --log warning
                        <<info,warning,debug,error,critical>>', default='info'

Please use the same levels as getpapers in the same order
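
One way to accept the getpapers level names in their original order is to map them onto Python's `logging` levels. The mapping below is an assumption (getpapers levels such as `data` and `verbose` have no exact Python equivalent), and `build_parser` and `to_logging_level` are hypothetical helper names.

```python
import argparse
import logging

# getpapers level names, in the order shown in its help text.
GETPAPERS_LEVELS = ["silent", "verbose", "info", "data", "warn", "error", "debug"]

# Assumed mapping onto Python logging levels; adjust as needed.
LEVEL_MAP = {
    "silent": logging.CRITICAL + 1,  # above every standard level, so nothing is shown
    "verbose": logging.DEBUG,
    "info": logging.INFO,
    "data": logging.INFO,
    "warn": logging.WARNING,
    "error": logging.ERROR,
    "debug": logging.DEBUG,
}


def build_parser():
    parser = argparse.ArgumentParser(prog="pygetpapers-demo")
    parser.add_argument(
        "-l", "--loglevel",
        choices=GETPAPERS_LEVELS,
        default="info",
        help="amount of information to log",
    )
    return parser


def to_logging_level(name):
    """Translate a getpapers level name into a Python logging level."""
    return LEVEL_MAP[name]


args = build_parser().parse_args(["--loglevel", "warn"])
logging.basicConfig(level=to_logging_level(args.loglevel))
```
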

  -u UPDATE, --update UPDATE
                        Updates the corpus by downloading new papers. Requires
                        -k or --limit (If not provided, default will be used)
                        and -q or --query (must be provided) to be given.
                        Takes the path to the pickle as the input.
ayush4921 commented 3 years ago

In pygetpapers version 0.0.2, the following flags have been replaced or have not been added:

-t, --minedterms --> Replaced with --citations and --references

-a, --all --> Non-open-access papers don't have download links, so this option was not added

--filter --> The current pygetpapers supports only europepmc, so --filter was not added

--api --> The current pygetpapers supports only europepmc, so --api was not added

petermr commented 3 years ago

Thanks,

I think we should keep the old options but make them inactive. People will have used them.

Restore -a

--api defaults to EPMC (NOT YET implemented)

So something at the bottom like

  -t, --minedterms      NOT YET implemented
  --filter              NOT YET implemented
  etc...
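
A minimal `argparse` sketch of this idea: the old options are still accepted, their help text says "NOT YET implemented", and the program can warn when one of them is used. `build_parser`, `INACTIVE` and `inactive_flags_used` are hypothetical names, not the actual pygetpapers code.

```python
import argparse

NOT_YET = "NOT YET implemented"

# Destination names of options that are accepted but currently do nothing.
INACTIVE = ("minedterms", "filter")


def build_parser():
    parser = argparse.ArgumentParser(prog="pygetpapers")
    parser.add_argument("-q", "--query", required=True, help="search query")
    parser.add_argument("-a", "--all", action="store_true",
                        help="search all papers, not just open access")
    parser.add_argument("--api", default="eupmc",
                        help="API to search (default: eupmc). " + NOT_YET)
    parser.add_argument("-t", "--minedterms", action="store_true", help=NOT_YET)
    parser.add_argument("--filter", help=NOT_YET)
    return parser


def inactive_flags_used(args):
    """Return the inactive options the user actually passed."""
    return [name for name in INACTIVE if getattr(args, name)]


args = build_parser().parse_args(["-q", "malaria", "-t"])
for name in inactive_flags_used(args):
    print(f"--{name} is accepted for compatibility but is {NOT_YET}")
```

Existing getpapers command lines then keep working instead of failing with an "unrecognized arguments" error.
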

Comments below:

On Mon, Mar 22, 2021 at 2:51 AM Ayush Garg @.***> wrote:

In pygetpapers version 0.0.2, the following flags have been replaced or have not been added: -t, --minedterms --> Replaced with --citations and --references

I think -t is low priority

-a, --all --> Non-open-access papers don't have download links, so this option was not added

I don't think you are correct. Run getpapers with and without -a and you will see that you get more papers with -a. This is very important.
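
A sketch of how `-a` could change the search: without it, the query is restricted to open-access records, and with it, the query is sent unchanged. This assumes the restriction is expressed with the Europe PMC search field `OPEN_ACCESS:y`. `build_eupmc_query` is a hypothetical helper, not the getpapers or pygetpapers code.

```python
def build_eupmc_query(query, include_all=False):
    """Return the search string to send to Europe PMC.

    include_all corresponds to -a/--all: when False, the search is limited
    to open-access papers (assumed to be done via OPEN_ACCESS:y).
    """
    if include_all:
        return query
    return f"({query}) AND OPEN_ACCESS:y"


print(build_eupmc_query("zika"))
print(build_eupmc_query("zika", include_all=True))
```
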

--filter --> The current pygetpapers supports only europepmc so --filter was not added

it will be important in the future

--api --> The current pygetpapers supports only europepmc, so --api was not added

So will this. In getpapers, --api is present but defaults to EPMC.


-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK