kcmtest opened 4 years ago
execute.sh cannot find the download folder, which is why you are getting the No such file or directory error. Please make sure that execute.sh is at the same level as the download folder when you run bash execute.sh.
"bash execute.sh is the same level as the download folder." So I should be inside my download folder? You mean I can simply go into my download folder and run it?
No. You should be outside of the download folder. The current directory should look like this:
data/
download/
mapper/
scripts/
execute.sh
... (other files)
Then run bash execute.sh. The download process should work from there. I just tested the download feature and it works for me.
To make my confusion clear: do I have to copy each of your folders exactly the way you have set them up, and only then can I run it? I was thinking that I would simply run execute.sh and it would work.
Right, the intended goal here is for execute.sh to handle everything. Did you clone the repository or just download the individual file? Forgive my confusion, but I'm not sure how things are set up on your end.
"Did you clone the repository?" I will clone it and update you. Thank you for clarifying.
No problem. Please let me know if you run into any other issues.
I cloned it and it's running. How large will the download be? Also, once it's done, do I have to rerun it every now and then?
The file should be about 18+ GB. I say 18+ because Pubtator Central updates its server monthly, so your downloaded file should be at least 18 GB to be complete.
Thank you for the information. Once it's done, I will be back with questions to bug you again.
With regards
I will have to read the README properly before I come back to you. I will run the included test example first. I think the download is finished, but the script has been running for the last couple of hours and doesn't seem to be downloading anything. What is it doing?
The bash script has been running for about 33 hours now. Is it expanding the file, or what exactly is going on? Since I'm not sure, I haven't terminated the process. I would be glad if you could tell me.
The 33-hour process is my pipeline converting Pubtator Central's annotations into XML format to be processed later. The file is large and can take up to a few days to fully process. There is no other solution here but to wait until all the pieces have completed.
Unfortunately, the machine was restarted. Do I have to start again, or can it resume from where it left off?
The older version of the code required you to start from scratch. The updated version lets you start from any point in the pipeline. I highly recommend using the updated version and reading its docs; it could make your life easier when restarting the parsers.
30988903it [58:10:58, 147.95it/s]
30988894it [11:25:20, 753.61it/s]
1097it [2:10:37, 7.14s/it]
sys:1: DtypeWarning: Columns (4,10) have mixed types. Specify dtype option on import or set low_memory=False.
1097it [1:44:13, 5.70s/it]
274it [10:05:12, 132.53s/it]
Traceback (most recent call last):
File "scripts/download_full_text.py", line 124, in <module>
download_full_text(args.input, args.document_batch, args.temp_dir)
File "scripts/download_full_text.py", line 58, in download_full_text
response = call_api(query)
File "/home/punit/anaconda3/envs/pubtator/lib/python3.8/site-packages/ratelimit/decorators.py", line 113, in wrapper
return func(*args, **kargs)
File "/home/punit/anaconda3/envs/pubtator/lib/python3.8/site-packages/ratelimit/decorators.py", line 80, in wrapper
return func(*args, **kargs)
File "scripts/download_full_text.py", line 21, in call_api
raise Exception(response.text)
Exception
Is this an error or something else? Please let me know; I'm not sure.
This error was generated because Pubtator Central's server sent back an error code. I don't know what caused it, so my suggestion is to try rerunning that part of the pipeline; if the error comes up again, I'll take a look.
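Rerunning by hand works, but a transient server error can also be handled by retrying the call automatically with a growing delay. A minimal Python sketch, assuming you wrap the script's call_api function (for example call_with_retries(call_api, query)); call_with_retries and its parameters are hypothetical:

```python
import time


def call_with_retries(func, *args, attempts=3, base_delay=30.0, sleep=time.sleep, **kwargs):
    """Call func, retrying on any exception with exponential backoff.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    Re-raises the last exception if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception as exc:  # the script raises a bare Exception on API errors
            if attempt == attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({exc!s:.80}); retrying in {delay:.0f}s")
            sleep(delay)
```

Retries only help with transient failures. If the server rejects the request itself, every attempt fails the same way and the last exception is re-raised.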
"I don't know what caused it, so my suggestion is try rerunning that part of the pipeline and if the error comes again I'll take a look." I simply ran this:
bash execute.sh
Shall I run this again?
No, don't do that. Run this command instead:
python scripts/download_full_text.py \
--input data/pubtator-pmids-to-pmcids.tsv \
--document_batch 100000 \
--output data/pubtator-central-full-text.xml
If you run bash execute.sh, you will restart everything, which is not ideal.
Thank you for the immediate help.
I got this after running the above code. Sorry for asking such basic questions; I mostly use R, so I'm not sure about these errors.
download_full_text.py: error: the following arguments are required: --temp_dir
I made a new folder, and now it's running:
python scripts/download_full_text.py --input data/pubtator-pmids-to-pmcids.tsv --document_batch 100000 --output data/pubtator-central-full-text.xml --temp_dir /run/media/punit/data4/tupa/
0it [00:00, ?it/s]
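The "the following arguments are required: --temp_dir" message above is the standard error argparse prints when an option declared as required is missing. A hypothetical parser that mirrors the flags used in these commands (not the script's actual source) reproduces it:

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the commands above."""
    parser = argparse.ArgumentParser(prog="download_full_text.py")
    parser.add_argument("--input")
    parser.add_argument("--document_batch", type=int)
    parser.add_argument("--output")
    # required=True is what makes argparse print
    # "error: the following arguments are required: --temp_dir"
    # and exit with status 2 when the flag is omitted.
    parser.add_argument("--temp_dir", required=True)
    return parser
```

So the fix is simply to pass --temp_dir with a writable directory, as in the command above.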
This is the error I received after running the above:
0it [02:10, ?it/s]
Traceback (most recent call last):
File "scripts/download_full_text.py", line 124, in <module>
download_full_text(args.input, args.document_batch, args.temp_dir)
File "scripts/download_full_text.py", line 58, in download_full_text
response = call_api(query)
File "/home/punit/anaconda3/envs/pubtator/lib/python3.8/site-packages/ratelimit/decorators.py", line 113, in wrapper
return func(*args, **kargs)
File "/home/punit/anaconda3/envs/pubtator/lib/python3.8/site-packages/ratelimit/decorators.py", line 80, in wrapper
return func(*args, **kargs)
File "scripts/download_full_text.py", line 21, in call_api
raise Exception(response.text)
Exception: <?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>Submitted URI too large!</title>
<link rev="made" href="mailto:info@ncbi.nlm.nih.gov" />
<style type="text/css"><!--/*--><![CDATA[/*><!--*/
body { color: #000000; background-color: #FFFFFF; }
a:link { color: #0000CC; }
p, address {margin-left: 3em;}
span {font-size: smaller;}
/*]]>*/--></style>
</head>
<body>
<h1>Submitted URI too large!</h1>
<p>
The length of the requested URL exceeds the capacity limit for
this server. The request cannot be processed.
</p>
<p>
If you think this is a server error, please contact
the <a href="mailto:info@ncbi.nlm.nih.gov">webmaster</a>.
</p>
<h2>Error 414</h2>
<address>
<a href="/">www.ncbi.nlm.nih.gov</a><br />
<span>Apache</span>
</address>
</body>
</html>
Basically, the program is sending too many IDs in a single request, so the request URL exceeds the server's length limit (the Error 414 above). Change document_batch to 100 or 1000 and run again. The default parameter is too high for Pubtator Central's API.
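An HTTP 414 means the request line was longer than the server accepts. If the script places every ID of a batch in the query string, as the explanation above implies, the URL grows linearly with document_batch. The Python sketch below splits IDs into batches and refuses to build an over-long URL. The base URL, the pmcids parameter name, and the 8000-character cap are assumptions (Apache's default LimitRequestLine is 8190 bytes, but NCBI's actual limit is not stated here):

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter name; substitute what the script actually calls.
BASE_URL = "https://example.org/api/export?"
MAX_URL_LENGTH = 8000  # assumed cap, kept below Apache's default 8190-byte request line


def batch_urls(ids, batch_size=100, base_url=BASE_URL, max_len=MAX_URL_LENGTH):
    """Split ids into batches of at most batch_size and build one URL per batch.

    Raises ValueError if a batch still produces an over-long URL, which is
    the condition behind an HTTP 414 response.
    """
    urls = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        # urlencode percent-encodes the commas, so each ID costs len(id) + 3 chars.
        url = base_url + urlencode({"pmcids": ",".join(batch)})
        if len(url) > max_len:
            raise ValueError(
                f"batch starting at {start} gives a {len(url)}-char URL; lower batch_size"
            )
        urls.append(url)
    return urls
```

Smaller batches mean more requests, so with a rate-limited API the run takes longer, but each request stays under the server's limit.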
Okay, I will try smaller numbers.
python scripts/download_full_text.py --input data/pubtator-pmids-to-pmcids.tsv --document_batch 100 --output data/pubtator-central-full-text.xml --temp_dir /run/media/punit/data4/tupa/
38it [1:26:01, 135.83s/it]
Traceback (most recent call last):
File "scripts/download_full_text.py", line 124, in <module>
download_full_text(args.input, args.document_batch, args.temp_dir)
File "scripts/download_full_text.py", line 58, in download_full_text
response = call_api(query)
File "/home/punit/anaconda3/envs/pubtator/lib/python3.8/site-packages/ratelimit/decorators.py", line 113, in wrapper
return func(*args, **kargs)
File "/home/punit/anaconda3/envs/pubtator/lib/python3.8/site-packages/ratelimit/decorators.py", line 80, in wrapper
return func(*args, **kargs)
File "scripts/download_full_text.py", line 21, in call_api
raise Exception(response.text)
Exception
Please have a look.
I checked the folder and I can see XML files of around 553 MB, 38 files in total.
For ease of debugging, please upload this file: data/pubtator-pmids-to-pmcids.tsv. I'll need it to see what's causing the issue.
Sorry for the late reply; I'm doing it now. I will share a link since the file is larger than 10 MB: https://drive.google.com/file/d/1G-6ehkeR_V8IhqiBryCMVe1jGc9GPB8Y/view?usp=sharing
Hello sir, I would be glad to know what was going wrong on my side.
Hi @krushnach80 - you have encountered a research project that is in progress but on someone's back burner at the moment. It sounds like you might be better served by directly interacting with the pubtator API or similar if you need faster responses in this case: https://www.ncbi.nlm.nih.gov/research/pubtator/
Thank you, sir. I found something that would be easier for me: https://cran.rstudio.com/web/packages/pubtatordb/vignettes/pubtatordb.html
But I would love to use your tool as well.
Could you add a tutorial for its usage? I can see the repository, but I'm confused about what I'm supposed to run. The web version of PubTator is straightforward: I just enter PMIDs and it returns the results. I would be glad if you could add a tutorial.
I ran this
but this exists here: "https://github.com/greenelab/pubtator/blob/master/download/bioconcepts2pubtator_offsets.gz.log"
I'm not sure what I'm doing wrong.