Closed: InfiniteSynthesis closed this issue 3 years ago.
Sorry, but I need to see the complete error message. There are many places where this error could appear.
On Tue, May 22, 2018 at 10:13 AM, InfiniteSynthesis <notifications@github.com> wrote:
Hello, when I use the tool pubCrawl2 to crawl files for thousands of PMIDs in PubMed, it often breaks down with the error: ssl.SSLError: read operation timed out
How can I solve it? Thanks!
The error message was like this:
Traceback (most recent call last):
File "./pubCrawl2", line 193, in
What's more, I have to download many PMIDs, so I cut the list of PMIDs into pieces and opened several terminals to run them at the same time. Does this cause problems, or is there another way to speed up the download?
This happens when it's trying to contact Crossref to find the DOI of the article. It looks like Crossref doesn't always reply at the moment. You could simply put a try:/except: around line 42 in pubCrossRef (this line: "jsonStr = httpResp.read()") and repeat the request if you get an ssl.SSLError. Can you do that?
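For illustration, the retry described above could look something like the sketch below. It wraps the read in a generic helper rather than editing pubCrossRef directly; the function name, parameters, and retry counts are illustrative assumptions, not code from pubMunch.

```python
import ssl
import time

def read_with_retry(read_fn, max_tries=3, wait_secs=5):
    """Call read_fn() and repeat it if an ssl.SSLError is raised.

    Sketch of the try/except suggested above. In pubCrossRef the call
    being wrapped is `httpResp.read()`; this helper and its defaults
    are illustrative, not taken from the real code.
    """
    for attempt in range(max_tries):
        try:
            return read_fn()
        except ssl.SSLError:
            if attempt == max_tries - 1:
                raise  # give up after max_tries failures
            time.sleep(wait_secs)  # brief pause before repeating the request
```

Note that re-calling `read()` on the same response object may not re-issue the HTTP request; to literally "repeat the request" as suggested, you would pass in a function that re-opens the URL and reads it.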
As for downloading many PMIDs, you can reduce the waiting time in pubCrawl2 (there is an option for it, I think it's -t). But be careful, as the publishers may block you at some point. May I ask what you're ultimately trying to do?
I added the try:/except at the place you mentioned and now it seems to be working well. Thank you very much!
As for what I am trying to do... I am an undergraduate student, and I read your article "AMELIE accelerates Mendelian patient diagnosis directly from the primary literature" by chance. I am quite interested in text mining and the whole system of medical articles (though I'm still not very clear about it). So I tried to work through the methods behind the article. I have now downloaded the titles and abstracts from PubMed, and the classifier using omim and unomim articles has been built. I found about one million articles and am downloading their full text.
I'm quite happy that you helped me solve a problem that had troubled me for a long time. =v=
Hey, that's great to hear, awesome that you got it to run!
Could you tell me exactly which change you made? If you feel adventurous, you could even send me a pull request: https://help.github.com/articles/creating-a-pull-request/
I have sent you the pull request. I now have 12 terminals running pubCrawl2 at the same time. Before I made this change, all the terminals would break down after one night, but now they are still running well.
However, I am not familiar with the json module (in fact, I am not very familiar with Python either), so there may be some mistakes.
Emmm, it took me a long time to test it. Sorry to reply so late.
This looks great, I've merged it. Let me know how your crawl goes.