zhiyzuo / python-scopus

PyScopus
http://zhiyzuo.github.io/python-scopus/
MIT License

Problems with author retrieval #15

Closed wvranken closed 4 years ago

wvranken commented 6 years ago

Thanks for providing this library!

I had some problems with author retrieval, I think when an author has only one associated publication and/or a single affiliation. In any case, I've added a check for the affiliation information dictionary being passed as a string (lines 100-101), and a type check to make sure the pandas data frame gets a list (lines 247-249); see the attached file.

utils.py.gz
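The first kind of guard described above can be illustrated with a small normalization helper. This is a hypothetical sketch, not the code from the attached utils.py or from PyScopus: `ensure_list` is an illustrative name, and it assumes the underlying problem is that a field which normally holds a list (for example, a list of affiliation records) sometimes arrives as a bare dictionary or string when there is only one entry.

```python
from typing import Any, List


def ensure_list(value: Any) -> List[Any]:
    """Return `value` as a list so callers can always iterate over it.

    Hypothetical helper. Assumption: the API returns a list when there are
    several entries, but a bare dict (or string) when there is exactly one.
    """
    if value is None:
        return []  # field missing entirely
    if isinstance(value, list):
        return value  # already the expected shape
    if isinstance(value, (tuple, set)):
        return list(value)  # other collections are copied into a list
    # A single dict, string, or other scalar came back unwrapped.
    return [value]
```

Applying the same helper before building a pandas data frame is one way to implement the second check, since the column then always receives a list.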

zhiyzuo commented 6 years ago

Thanks for your effort on this! Can you elaborate on what problems you had without the two newly added ifs? Showing an example that produces the error would help, since I have not had any problems yet.

wvranken commented 6 years ago

Hi Zhiya,

I am downloading reference info using:

main_pi_scopus_id = '6602685472'
sdf = scopus.search_author_publication(main_pi_scopus_id)

and then got the authors for the pubs from:

for authorScopusId in publication.authors:
    authorInfo = scopus.retrieve_author(authorScopusId)

This last bit of code crashes on some authors for me, but works with the two added if statements. I think it has to do with the info returned by the Scopus API, which is not fully consistent when, for example, there is only one affiliation.

Best,

Wim

zhiyzuo commented 6 years ago

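Thanks for the clarification. Assuming I understand your code correctly, you:

  • first search for a person's author ID (please do not call it a Scopus ID for individual authors, because it may be confused with publication Scopus IDs)
  • sdf is then a data frame containing publication information
  • now want to find all the co-authors of the papers in sdf written by main_pi_scopus_id

However, the authors column in the returned publication data is a list of authors rather than a single one. Therefore, I would write the code as follows to use retrieve_author: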

# Flatten the per-publication author lists and drop duplicate IDs
coauthors_of_main_pi = list(set(a for authors in sdf.authors for a in authors))
for author_id in coauthors_of_main_pi:
    author_info = scopus.retrieve_author(author_id)
    # TODO: process author_info

Since the variable names are somewhat confusing (e.g., I do not know what publication is and assume it is just sdf), let me know if I misunderstood your problem.
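The flatten-and-deduplicate step can also be written as a small, dependency-free function. This is a sketch: `unique_coauthors` and its `exclude` parameter are hypothetical names, and it assumes each entry of the authors column is a list of author ID strings. Unlike `set`, `dict.fromkeys` keeps the IDs in first-seen order, which makes repeated runs easier to compare.

```python
from typing import Iterable, List, Optional


def unique_coauthors(
    author_lists: Iterable[List[str]],
    exclude: Optional[str] = None,
) -> List[str]:
    """Flatten per-publication author lists into unique IDs, in first-seen order.

    Assumption: each element of `author_lists` is a list of author ID strings.
    Iterating a pandas Series yields its values, so a column such as
    sdf.authors can be passed directly.
    """
    flat = (author_id for authors in author_lists for author_id in authors)
    unique = dict.fromkeys(flat)  # dicts preserve insertion order (Python 3.7+)
    if exclude is not None:
        unique.pop(exclude, None)  # drop the main author so only co-authors remain
    return list(unique)
```

For example, `unique_coauthors(sdf.authors, exclude=main_pi_scopus_id)` would return the co-author IDs without the main PI.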

wvranken commented 6 years ago

It's pseudo-code taken from a full script, but yes, the publications are the rows in the sdf data frame. In any case, calling retrieve_author() with some IDs can fail. Try 57200697581; it didn't work for me.

zhiyzuo commented 6 years ago

I just tested it and it failed. Can you open a pull request for this so that it's easier for me to review?

Thanks for finding this bug!

zhiyzuo commented 4 years ago

Closing leftover issue.

Author ID 57200697581 works well in the latest release.

ardprasad commented 2 years ago

I need your help rather urgently. I am collecting all our faculty publications from Scopus using an API key. Although I was able to collect each author's publications, when I tried to go through the co-author ID list using scopus.retrieve_author, I got the following error:

    print(scopus.retrieve_author(author_id))
  File "/usr/local/lib/python3.8/dist-packages/pyscopus/scopus.py", line 144, in retrieve_author
    raise ValueError('Author %s not found!' %author_id)
ValueError: Author 36635367700 not found!

My code (partial):

from pyscopus import Scopus
MY_API_KEY = 'xxxxxxxxxxx'
scopus = Scopus(MY_API_KEY)
author_id = '36635367700'
print(scopus.retrieve_author(author_id))
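The traceback shows that retrieve_author raises ValueError when an author ID cannot be found. When looping over many co-author IDs, one option is to catch that error per ID, record it, and keep going instead of aborting the whole run. This is a hypothetical sketch, not part of PyScopus: `retrieve_many` is an illustrative name, and it takes the retrieval function as a parameter so it can be tried without an API key. Only ValueError is caught; any other exception type still propagates.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


def retrieve_many(
    retrieve: Callable[[str], Any],
    author_ids: Iterable[str],
) -> Tuple[Dict[str, Any], List[Tuple[str, str]]]:
    """Call `retrieve` once per author ID, collecting results and failures.

    Hypothetical helper. `retrieve` is expected to behave like
    Scopus.retrieve_author: return the author record, or raise ValueError
    when the author is not found.
    """
    results: Dict[str, Any] = {}
    failures: List[Tuple[str, str]] = []
    for author_id in author_ids:
        try:
            results[author_id] = retrieve(author_id)
        except ValueError as err:
            # Record the "not found" case and continue with the next ID.
            failures.append((author_id, str(err)))
    return results, failures
```

With PyScopus this would be called as `found, missing = retrieve_many(scopus.retrieve_author, coauthor_ids)`, where `coauthor_ids` is a hypothetical list of author ID strings; the IDs in `missing` can then be looked up separately.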