Closed: skytguuu closed this issue 2 years ago
The `counts` collection returns counts of co-occurrences, not paper data such as titles. The `words` collection is what collects article data, including titles.
If you want the titles of articles related to term co-occurrences, you can organize the search terms accordingly, for example, using the `inclusions` feature of `collect_words` to search for terms together with particular co-occurring terms.
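For reference, the query that gets sent to PubMed combines each term group with boolean operators, as can be seen in the request URL quoted later in this thread. A minimal sketch of that query form (the `build_query` helper below is purely illustrative, not a LISC function):

```python
from urllib.parse import quote

def build_query(terms_a, terms_b):
    """Build a PubMed-style boolean query combining two term lists.

    Each list is OR-joined, and the two groups are AND-joined,
    mirroring the query form seen in the esearch URL in this thread.
    """
    group_a = "OR".join('"{}"'.format(term) for term in terms_a)
    group_b = "OR".join('"{}"'.format(term) for term in terms_b)
    return "({})AND({})".format(group_a, group_b)

query = build_query(["oogenesis", "oogenetic"],
                    ["CIDs00000019", "2,3-dihydroxybenzoic acid"])
print(query)
# Percent-encoded form of the query:
print(quote(query))
```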
Hi,
Thanks for your help. Following your advice, I used `collect_words` with `inclusions` to collect the titles, and it works. But I have another question about the search. Because I have a lot of terms to search, the collection gets interrupted by a connection error after a few terms. The error is as follows:

```
ConnectionError: HTTPSConnectionPool(host='eutils.ncbi.nlm.nih.gov', port=443): Max retries exceeded with url: /entrez/eutils/esearch.fcgi?db=pubmed&usehistory=y&retmax=100&field=TIAB&retmode=xml&term=(%22oogenesis%22OR%22+oogenetic%22)AND(%22CIDs00000019%22OR%222,3-dihydroxybenzoic+acid%22) (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001732850F910>: Failed to establish a new connection: [WinError 10060]
```

The problem seems to be that my IP was blocked by NCBI. Is there a way to solve it? I tried using `time.sleep(5)`, but it failed again. Could you give me a hand?
Thank you so much! Best
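One generic way to make a long collection more robust to transient connection errors is to wrap each request in a retry loop with an increasing wait between attempts. A minimal sketch, where `flaky_search` is a stand-in for a single collection call (it is not a LISC function):

```python
import time

def with_retries(func, *args, max_retries=5, base_wait=5.0, **kwargs):
    """Call func, retrying with linearly increasing waits on ConnectionError."""
    for attempt in range(max_retries):
        try:
            return func(*args, **kwargs)
        except ConnectionError:
            if attempt == max_retries - 1:
                raise
            # Wait longer after each failure before trying again
            time.sleep(base_wait * (attempt + 1))

# Example with a stand-in function that fails twice, then succeeds
calls = {"n": 0}
def flaky_search(term):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated NCBI timeout")
    return "results for " + term

print(with_retries(flaky_search, "oogenesis", base_wait=0.01))
```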
Ah, yeah, there could definitely be some robustness things that could be improved, so that it's more robust to missed connections.
Could you let me know approximately how many terms you are searching for, and after how many it tends to fail?
As of right now, if you set up a `Requester` with a custom `wait_time` and pass it in with `logging=my_requester`, the collection will use this wait time.

I really appreciate your help. Actually, I have forty thousand terms to search. It failed after 20 terms; sometimes it failed after only 3 or 8 terms. I tried using an API key and setting the requester wait time (1 s), but it failed after 3 terms again. Here is my command:

```python
my_requester = Requester(wait_time=1)
results, meta_data = collect_words(terms=keys, inclusions=term, retmax=100,
                                   usehistory=True, save_and_clear=False,
                                   verbose=True, logging=my_requester,
                                   api_key="72f729883aed8ad3942d8c4fec698a7bff09")
```

It reported the same error: `ConnectionError: ('Connection aborted.', TimeoutError(10060))`. Thanks for your wonderful advice! Best
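With tens of thousands of terms, another option is to split the term list into chunks, save results after each chunk, and pause between chunks, so a dropped connection only loses the current chunk. A pure-Python sketch of the chunking logic; the commented-out `collect_words` call marks where the actual collection would go:

```python
import time

def chunked(items, size):
    """Yield successive fixed-size chunks from a list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

terms = ["term{}".format(i) for i in range(10)]  # placeholder term list

all_results = []
for chunk in chunked(terms, size=4):
    # results, meta_data = collect_words(terms=chunk, ...)  # real collection here
    all_results.append(chunk)   # stand-in for saving the chunk's results
    time.sleep(0.01)            # pause between chunks to ease API load

print(len(all_results))  # number of chunks processed
```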
Hey @skytguuu - oops, sorry, I accidentally dropped off here. Did you have any luck getting this working? Other than what I've suggested, I don't really have any other suggestions for running this, since the issue appears to be the API connection rather than anything specific to LISC.
Hi @TomDonoghue, Thanks for your reply. As you suggested, I have solved this problem by setting the requester wait time and adding sleep time between requests.
Hi,
Sorry to bother you. When I used `collect_counts` to fetch the articles, it successfully showed the number of papers that include the terms. However, I want to know the titles of the articles that correspond to those counts. I tried `print(meta_dat)`, but it did not include the titles. Here are my code and result:

```python
coocs, term_counts, meta_dat = collect_counts(
    terms_a=terms_a, terms_b=terms_b, db='pubmed',
    save_and_clear=True, usehistory=True, verbose=True)
```

Is there a way to extract the article titles that correspond to the counts? Thanks for your help! Best