petermr / openVirus

aggregation of scholarly publications and extracted knowledge on viruses and epidemics.
The Unlicense
66 stars, 17 forks

Testing medrxiv python/ferret downloader #51

Open petermr opened 4 years ago

petermr commented 4 years ago

Created a file with search_download_medrxiv.py

Problems running with both Python 3:

pm286macbook:ferret pm286$ python3 search_download_medrxiv.py "n95" n95
Traceback (most recent call last):
  File "search_download_medrxiv.py", line 24, in <module>
    query=urllib.quote(sys.argv[1])
AttributeError: module 'urllib' has no attribute 'quote'

and Python 2:

pm286macbook:ferret pm286$ python search_download_medrxiv.py "n95" n95
Running file medrxiv_search_download.fql downloading files to n95
Traceback (most recent call last):
  File "search_download_medrxiv.py", line 28, in <module>
    cmd = ferret + " --param=url:\\\""+query_url+"\\\"  --param=dir:\\\""+output_folder+"\\\" " + fql_file
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
pm286macbook:ferret pm286$ 
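The first traceback comes from running under Python 3, where `urllib.quote` was moved to `urllib.parse.quote`. A minimal sketch of an import that works under both interpreters, assuming the script only uses `quote` to percent-encode the search term from `sys.argv[1]` (`encode_query` is a hypothetical helper name):

```python
# Python 3 moved quote() into urllib.parse; Python 2 keeps it in urllib.
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2


def encode_query(term):
    """Percent-encode a search term; a space becomes %20."""
    return quote(term)


if __name__ == "__main__":
    import sys
    print(encode_query(sys.argv[1]))
```

With that import at the top of the script, `query=urllib.quote(sys.argv[1])` can become `query=quote(sys.argv[1])`.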
l-hawizy commented 4 years ago

I should have added a note that it is in Python 2, because I assumed that's the default installation on most machines. I can switch it to Python 3 if it's easier. The error occurred because ferret wasn't installed; I've put a more descriptive failure message in. You would need to follow these steps to install it on macOS: https://github.com/petermr/openVirus/wiki/Ferret
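The Python 2 `TypeError` means the `ferret` variable was `None` when the command string was built, which is what happens when it is read from an environment variable that is not set. A minimal sketch of a descriptive failure, assuming the script reads the binary location from `FERRET` (the variable used in the setup steps in this thread); `ferret_from_env` is a hypothetical helper, not the repository's actual code:

```python
import os
import sys


def ferret_from_env(environ=None):
    """Return the FERRET setting, or exit with a readable message if unset."""
    environ = os.environ if environ is None else environ
    ferret = environ.get("FERRET")
    if not ferret:
        # Without this check, ferret is None and `ferret + " --param=..."`
        # raises "unsupported operand type(s) for +: 'NoneType' and 'str'".
        sys.exit("FERRET is not set; install ferret "
                 "(https://github.com/petermr/openVirus/wiki/Ferret) "
                 "and export FERRET=/path/to/ferret")
    return ferret
```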

petermr commented 4 years ago

Thanks. I'll be intelligent but casual, so that I pick up undefined operations. P.

-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

petermr commented 4 years ago

I think I already had ferret installed:

Welcome to Ferret REPL 0.10.2
Please use `exit` or `Ctrl-D` to exit this program.
>  
l-hawizy commented 4 years ago

Ah, great. Then all you need are these two commands:

alias ferret="/your/local/directory/ferret_darwin_x86_64/ferret"
export FERRET=ferret
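One caveat: a shell alias exists only in the interactive shell where it is defined, so a script that runs `$FERRET` through `subprocess` will not see the `ferret` alias. Setting `FERRET` to the full path of the binary avoids relying on the alias or on `PATH`. A sketch of a check the wrapper could run, assuming Python 3.3+ for `shutil.which`; `resolve_ferret` is a hypothetical name:

```python
import shutil


def resolve_ferret(value):
    """Return an executable path for `value` (a bare name or a full path).

    shutil.which() searches PATH for bare names and checks full paths
    directly; it knows nothing about shell aliases.
    """
    path = shutil.which(value)
    if path is None:
        raise FileNotFoundError(
            "%r is not an executable on PATH; set FERRET to the full path, "
            "e.g. /your/local/directory/ferret_darwin_x86_64/ferret" % value)
    return path
```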
l-hawizy commented 4 years ago

Also, pull the latest changes.

petermr commented 4 years ago

Will try tomorrow when my brain is working.

l-hawizy commented 4 years ago

Or, the easier way: just run ferret directly: ferret --param=url:\"https://www.medrxiv.org/search/n95\" --param=dir:\"n95\" medrxiv_search_download.fql

petermr commented 4 years ago

Thanks. Can you set out all the precise steps needed, so that I can restart from scratch? Thanks.

I currently have installed FERRET and get:

pm286macbook:~ pm286$ FERRET
Welcome to Ferret REPL 0.10.2
Please use `exit` or `Ctrl-D` to exit this program.
> ^C

l-hawizy commented 4 years ago

So that's set up correctly. If you have the environment variable FERRET set, run: $FERRET --param=url:\"https://www.medrxiv.org/search/n95\" --param=dir:\"n95\" medrxiv_search_download.fql

And if you have the alias ferret, run: ferret --param=url:\"https://www.medrxiv.org/search/n95\" --param=dir:\"n95\" medrxiv_search_download.fql

petermr commented 4 years ago

Where is medrxiv_search_download.fql? Is it on the github.com/petermr/openVirus site?

petermr commented 4 years ago

Please can we have a protocol where:

i.e. we can hand a single URL to a newcomer that gives

Then I'll be happy to act as alpha tester :-)

petermr commented 4 years ago

I cannot run either ferret or the Python wrapper.

Running from: https://github.com/petermr/openVirus/tree/ferret/ferret

environment

pm286macbook:ferret pm286$ git branch
* ferret
  master
pm286macbook:ferret pm286$ pwd
/Users/pm286/projects/openVirus/ferret
pm286macbook:ferret pm286$ ls
README.md           get_data_biorxiv.fql        medrxiv_search_download.fql scrape.py           search_biorxiv.py
ferret.log          get_data_springer.fql       redalyc.fql         search.fql          search_download_medrxiv.py
pm286macbook:ferret pm286$ python --version
Python 2.7.16

python wrapper

pm286macbook:ferret pm286$ python search_download_medrxiv.py "n95 masks" testn95
Running file medrxiv_search_download.fql downloading files to testn95
Traceback (most recent call last):
  File "search_download_medrxiv.py", line 33, in <module>
    subprocess.check_output(cmd, shell=True)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 223, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command 'ferret --param=url:\"https://www.medrxiv.org/search/n95%20masks\"  --param=dir:\"testn95\" medrxiv_search_download.fql' returned non-zero exit status 1
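Here `check_output` reports only the exit status; ferret's own error message is lost. A sketch that keeps it, capturing stdout and stderr together since it is not established here which stream ferret writes its errors to; `run_and_report` is a hypothetical helper that should behave the same under Python 2.7 and 3:

```python
import subprocess
import sys


def run_and_report(args):
    """Run a command and return (exit_code, combined stdout/stderr text)."""
    try:
        output = subprocess.check_output(
            args, stderr=subprocess.STDOUT, universal_newlines=True)
        return 0, output
    except subprocess.CalledProcessError as err:
        # err.output holds everything the command printed before exiting,
        # which is where a message like "invalid character 'h' ..." would be.
        return err.returncode, err.output


if __name__ == "__main__":
    code, text = run_and_report(sys.argv[1:])
    sys.stdout.write(text)
    sys.exit(code)
```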

running raw ferret

pm286macbook:ferret pm286$ ferret --param=url:"https://www.medrxiv.org/search/n95%252Bmasks" --param=dir:"n95" medrxiv_search_download.fql
https://www.medrxiv.org/search/n95%252Bmasks
invalid character 'h' looking for beginning of value
pm286macbook:ferret pm286$ 
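The two runs encode the query differently: the wrapper sent `n95%20masks` (a single-encoded space), while the raw command uses `n95%252Bmasks`, which is `n95+masks` percent-encoded twice (`+` becomes `%2B`, then the `%` becomes `%25`). A sketch that builds the second form from a plain query, assuming that double-encoded, `+`-joined terms are the URL shape medRxiv search expects (inferred only from the URL used in this thread); `medrxiv_search_url` is a hypothetical helper:

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2

BASE = "https://www.medrxiv.org/search/"


def medrxiv_search_url(query):
    """Join words with '+', then percent-encode twice ('+' -> %2B -> %252B)."""
    term = "+".join(query.split())
    once = quote(term, safe="")
    twice = quote(once, safe="")  # '%' itself becomes %25
    return BASE + twice
```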
3timeslazy commented 4 years ago

@petermr this should work

ferret --param=url:"\"https://www.medrxiv.org/search/n95%252Bmasks\"" --param=dir:"\"n95\"" medrxiv_search_download.fql

I wrapped https://www.medrxiv.org/search/n95%252Bmasks and n95 in quotation marks. Ferret does not treat parameter values without quotes as strings.
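The message `invalid character 'h' looking for beginning of value` is the wording of Go's `encoding/json` decoder, which suggests ferret decodes each `--param` value as JSON; a string value must therefore reach ferret with its own double quotes. A sketch that builds those arguments with `json.dumps` and passes them as a list, so no shell quoting is involved; `ferret_args` and `run_ferret` are hypothetical, and the flag layout is copied from the command above:

```python
import json
import subprocess


def ferret_args(ferret, url, out_dir, fql_file):
    """Build a ferret argv; json.dumps adds the quotes ferret needs."""
    return [
        ferret,
        "--param=url:" + json.dumps(url),
        "--param=dir:" + json.dumps(out_dir),
        fql_file,
    ]


def run_ferret(ferret, url, out_dir, fql_file):
    # A list argv bypasses the shell, so no backslash-escaping is needed.
    return subprocess.call(ferret_args(ferret, url, out_dir, fql_file))
```

This yields the same arguments the shell passes to ferret for the quoted command above.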

petermr commented 4 years ago

Thanks. Where is main.go?

On Tue, May 12, 2020 at 3:26 PM Vladimir Fetisov notifications@github.com wrote:

@petermr this should work

go run main.go --param=url:"\"https://www.medrxiv.org/search/n95%252Bmasks\"" --param=dir:"\"n95\"" medrixv_search_download.fql

petermr commented 4 years ago

I get


pm286macbook:ferret pm286$ ls
README.md           get_data_biorxiv.fql        medrxiv_search_download.fql scrape.py           search_biorxiv.py
ferret.log          get_data_springer.fql       redalyc.fql         search.fql          search_download_medrxiv.py
pm286macbook:ferret pm286$ go run main.go --param=url:"\"https://www.medrxiv.org/search/n95%252Bmasks\"" --param=dir:"\"n95\"" medrixv_search_download.fql
stat main.go: no such file or directory
pm286macbook:ferret pm286$

3timeslazy commented 4 years ago

I edited my comment. Replace main.go with ferret.

I ran ferret from the source code and forgot to fix the command.

petermr commented 4 years ago

ferret --param=url:"\"https://www.medrxiv.org/search/n95%252Bmasks\"" --param=dir:"\"n95\"" medrxiv_search_download.fql
Failed to execute the query
initialize driver: failed to initialize driver: could not resolve IP for 127.0.0.1: DOCUMENT(baseUrl+"/search/vaccine",{driver:"cdp"}) at 2:13

pm286macbook:ferret pm286$

3timeslazy commented 4 years ago

To use the cdp driver, you need to start Google Chrome with remote debugging enabled before running ferret. On macOS it looks like:

/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222

The need to launch Google Chrome before ferret is the main reason the worker (https://github.com/MontFerret/worker) was created.
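Chrome started with `--remote-debugging-port=9222` serves a small HTTP endpoint, `http://127.0.0.1:9222/json/version`, whose JSON reply includes a `webSocketDebuggerUrl`. A sketch of a pre-flight check the wrapper could run before calling ferret, assuming the default port 9222 from the command above; `devtools_ready` is hypothetical and only tells you whether something DevTools-like is listening:

```python
import json

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def devtools_ready(host="127.0.0.1", port=9222, timeout=2.0):
    """Return True if a Chrome DevTools endpoint answers on host:port."""
    url = "http://%s:%d/json/version" % (host, port)
    try:
        response = urlopen(url, timeout=timeout)
        try:
            info = json.loads(response.read().decode("utf-8"))
        finally:
            response.close()
    except Exception:
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return False
    return isinstance(info, dict) and "webSocketDebuggerUrl" in info
```

If this returns `False`, ferret's cdp driver will most likely fail with an "initialize driver" error like the one above.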

petermr commented 4 years ago

Thanks, will try.

petermr commented 4 years ago

OK, still not quite there:

pm286macbook:ferret pm286$ /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222

This brought up a new window, which I left open.

Opening in existing browser session.
pm286macbook:ferret pm286$ ferret --param=url:"\"https://www.medrxiv.org/search/n95%252Bmasks\"" --param=dir:"\"n95\"" medrxiv_search_download.fql
Failed to execute the query
initialize driver: failed to initialize driver: could not resolve IP for 127.0.0.1: DOCUMENT(baseUrl+"/search/vaccine",{driver:"cdp"}) at 2:13
pm286macbook:ferret pm286$
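The line `Opening in existing browser session.` suggests Chrome was already running. In that case a second launch only opens a window in the existing process, and the `--remote-debugging-port` flag may not take effect, so nothing ends up listening on port 9222. A commonly used workaround (an assumption here, not something tested in this thread) is to quit Chrome first, or to start a separate instance with its own `--user-data-dir`. A sketch of that launch from Python, with `chrome_debug_command` and `launch_chrome` as hypothetical helpers and the Chrome path taken from the macOS command above:

```python
import subprocess
import tempfile

CHROME = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"


def chrome_debug_command(chrome=CHROME, port=9222, profile_dir=None):
    """Build argv for a Chrome instance with its own profile and debug port."""
    if profile_dir is None:
        # A fresh profile directory keeps Chrome from handing the launch
        # over to an already-running session.
        profile_dir = tempfile.mkdtemp(prefix="ferret-chrome-")
    return [
        chrome,
        "--remote-debugging-port=%d" % port,
        "--user-data-dir=" + profile_dir,
    ]


def launch_chrome(**kwargs):
    # Popen returns immediately; Chrome keeps running in the background.
    return subprocess.Popen(chrome_debug_command(**kwargs))
```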