scholarly-python-package / scholarly

Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
https://scholarly.readthedocs.io/
The Unlicense

Store the retrieved data from print(next(search_query)) #29

Closed: felippemed closed this issue 4 years ago

felippemed commented 5 years ago

Edited: Sorry if this is a naive question.

The module runs amazingly well. It would be of great help if I could store the results from the query in a variable.

print(next(search_query))

Looking around, I found that the result is an object that is "not JSON serializable".

Any help for that?

trioputrap commented 5 years ago

It would be of great help if I could store the results from the query in a variable.

Hello, just assign the result of the query to a variable like this:

authors = scholarly.search_author(name)

and you can iterate over the variable like this:

for author in authors:
    # do some stuff here with the author object
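
A minimal sketch of the same idea that keeps the results around: the search returns a generator, so each item can only be consumed once, and copying a few of them into a list is one way to hold on to them. `fake_search_author` below is a hypothetical stand-in for `scholarly.search_author`, used only so the snippet runs without network access.

```python
from itertools import islice


def fake_search_author(name):
    """Hypothetical stand-in for scholarly.search_author: yields results lazily."""
    for i in range(5):
        yield {"name": f"{name} {i}", "citedby": i * 10}


authors = fake_search_author("Ada")

# Take at most three results from the generator and keep them in a list.
first_three = list(islice(authors, 3))

# The list can be indexed and re-read as often as needed.
first = first_three[0]
print(first["name"])
print(len(first_three))
```

Items already taken by `islice` are gone from the generator, so a later `next(authors)` continues with the fourth result.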

Looking around, I found that the result is an object that is "not JSON serializable".

For this, I modified the source code so that the Author and Publication classes inherit from dict instead of object, and added a dict.__init__ call to the __init__ of both classes.

class Publication(dict):
    """Returns an object for a single publication"""
    def __init__(self, __data, pubtype=None):
        dict.__init__(self)
        ....

and to get a dictionary that can be serialized to JSON, just use:

publication_object.__dict__

Or you can use a different approach, like the ones here: https://stackoverflow.com/questions/3768895/how-to-make-a-class-json-serializable
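
A hedged sketch of one approach along those lines that avoids editing the library source: pass a `default` function to `json.dumps` that falls back to an object's `__dict__`. The `Publication` class here is a simplified stand-in written for illustration, not scholarly's actual class, and `to_serializable` is a hypothetical helper name.

```python
import json


class Publication:
    """Simplified stand-in for illustration; not scholarly's real class."""

    def __init__(self, title, citedby):
        self.bib = {"title": title}
        self.citedby = citedby
        self._filled = False


def to_serializable(obj):
    """Called by json.dumps for objects it cannot encode itself."""
    # vars(obj) is obj.__dict__; it raises TypeError for objects without one.
    return vars(obj)


pub = Publication("An example title", 15)
text = json.dumps(pub, default=to_serializable, sort_keys=True)
print(text)
```

Note that this is applied to a single result object, not to the generator returned by the search.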

Hope it helps you!

felippemed commented 5 years ago

Hello Trioputrap,

I amended the code as you suggested:

class Publication(dict):
    """Returns an object for a single publication"""
    def __init__(self, __data, pubtype=None):
        self.bib = dict()
        self.source = pubtype
        dict.__init__(self)
        ....

Then I ran the example case:

publication=scholarly.search_pubs_query('Perception of physical stability and center of mass of 3D objects')

and it returned the fields correctly:

print(next(publication))

###output
{'_filled': False,
 'bib': {'abstract': 'Humans can judge from vision alone whether an object is '
                     'physically stable or not. Such judgments allow observers '
                     'to predict the physical behavior of objects, and hence '
                     'to guide their motor actions. We investigated the visual '
                     'estimation of physical stability of 3-D objects (shown '
                     'in stereoscopically viewed rendered scenes) and how it '
                     'relates to visual estimates of their center of mass '
                     '(COM). In Experiment 1, observers viewed an object near '
                     'the edge of a table and adjusted its tilt to the '
                     'perceived critical angle, ie, the tilt angle at which '
                     'the object …',
         'author': 'SA Cholewiak and RW Fleming and M Singh',
         'eprint': 'https://jov.arvojournals.org/article.aspx?articleID=2213254',
         'title': 'Perception of physical stability and center of mass of 3-D '
                  'objects',
         'url': 'https://jov.arvojournals.org/article.aspx?articleID=2213254'},
 'citedby': 15,
 'id_scholarcitedby': '15736880631888070187',
 'source': 'scholar',
 'url_scholarbib': 'https://scholar.googleusercontent.com/scholar.bib?q=info:K8ZpoI6hZNoJ:scholar.google.com/&output=citation&scisig=AAGBfm0AAAAAXIjCFpwk1u0XEARPUufLltWIPwQg4_P_&scisf=4&ct=citation&cd=0&hl=en'}

However, when it comes to JSON, it still doesn't work:

json.load(publication)
Traceback (most recent call last):

  File "<ipython-input-43-a51cc3f613b0>", line 1, in <module>
    json.load(publication)

  File "C:\ProgramData\Anaconda3\lib\json\__init__.py", line 293, in load
    return loads(fp.read(),

AttributeError: 'generator' object has no attribute 'read'

I tried the methods you suggested from https://stackoverflow.com/questions/3768895/how-to-make-a-class-json-serializable but all of them returned:

AttributeError: 'generator' object has no attribute '__dict__'

Don't know what else to do.

Thanks for your help

felippemed commented 5 years ago

I can see an object generated in there

scholarly.search_pubs_query(title)
Out[5]: <generator object _search_scholar_soup at 0x000001B5BDD2FE58>

But after the amendments you suggested, I tested again and print(next()) stopped working

print(next(search_query))
Traceback (most recent call last):

  File "<ipython-input-7-ac5f97a46ac4>", line 1, in <module>
    print(next(search_query))

StopIteration

...and JSON still cannot load

json.load(publication_object.__dict__)
Traceback (most recent call last):

  File "<ipython-input-8-a2d96e60c825>", line 1, in <module>
    json.load(publication_object.__dict__)

NameError: name 'json' is not defined

I tried your example step-by-step, but it keeps crashing both for Author and Publication.

json.dumps(cls=publication_object.__dict__)
Traceback (most recent call last):

  File "<ipython-input-18-7f1c80f8d4e3>", line 1, in <module>
    json.dumps(cls=publication_object.__dict__)

NameError: name 'publication_object' is not defined

What have I done wrong?

philshem commented 5 years ago

@felippemed

What have I done wrong?

A couple of things that I noticed. First, json.load (without the trailing s) reads from a file object, while json.loads reads from a string. In this case, we want json.dumps, which turns a Python object into a JSON string.

See here for more details.

json.loads takes a string as input and returns a Python object (here, a dictionary) as output. json.dumps takes a Python object such as a dictionary as input and returns a string as output.
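
A small sketch of how the four functions relate, using only the standard library. The `record` dictionary is made-up sample data, and `io.StringIO` stands in for a real file so that nothing is written to disk.

```python
import io
import json

record = {"title": "An example title", "citedby": 15}

# dumps: Python object -> JSON string
text = json.dumps(record)

# loads: JSON string -> Python object
assert json.loads(text) == record

# dump / load: the same conversions, but through a file-like object
buffer = io.StringIO()
json.dump(record, buffer)
buffer.seek(0)  # rewind before reading back
restored = json.load(buffer)
print(restored)
```

Passing a generator to `json.load`, as in the traceback earlier in the thread, fails because `json.load` expects an object with a `read` method.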

Here's some code that will write a JSON file for an individual query result. (python==3.7.3, scholarly==0.2.4)

import scholarly
import json
# standard scholarly stuff
search_query = scholarly.search_pubs_query('my search query')
d = next(search_query)

# dump dict as string, load from string to json object
j = json.loads(json.dumps(d.__dict__))

# write json object as file named data.json
with open('data.json', 'w') as outfile:
    json.dump(j, outfile)

If I look at the output of the code, it's a file called data.json that looks like this:

{
  "bib": {
    "title": "\\u201cYour Word is my Command\\u201d: google search by voice: A case study",
    "url": "https://link.springer.com/chapter/10.1007/978-1-4419-5951-5_4",
    "author": "J Schalkwyk and D Beeferman and F Beaufays and B Byrne\\u2026",
    "abstract": "\\u2026 types of up-to-the-minute information (\\u201cwhere's the closest parking spot?\\u201d) or communications\n(eg, \\u201cupdate my facebook status \\u2026 The maturing of powerful search engines provides a very effective\nway to give users what they want if we can recognize the words of their query \\u2026 \n",
    "eprint": "https://ai.google/research/pubs/pub36340.pdf"
  },
  "source": "scholar",
  "citedby": 272,
  "id_scholarcitedby": "12354430935285135518",
  "url_scholarbib": "https://scholar.googleusercontent.com/scholar.bib?q=info:nnjEo3rBc6sJ:scholar.google.com/&output=citation&scisdr=CgXVgDvOGAA:AAGBfm0AAAAAXPTZXd-J2lG3fgUgaqNWc3JsRL9dwl57&scisig=AAGBfm0AAAAAXPTZXTLvO5REmdaJgtI-6e6nEJShubdb&scisf=4&ct=citation&cd=0&hl=en",
  "_filled": false
}
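
A hedged variant of the script above, assuming two things the thread does not confirm: that the earlier `StopIteration` came from a generator with no results left, and that a human-readable file is wanted. `next(..., None)` returns `None` instead of raising when the generator is empty, and `indent=2` with `ensure_ascii=False` writes an indented file with characters such as “ kept literally. `fake_search` and `save_first_result` are hypothetical names, and the stand-in yields plain dicts, so no `__dict__` lookup is needed here.

```python
import json
import os
import tempfile


def fake_search(results):
    """Hypothetical stand-in for a scholarly query: yields plain dicts."""
    yield from results


def save_first_result(search_query, path):
    """Write the first result to path as JSON; return False if there is none."""
    first = next(search_query, None)  # None instead of StopIteration
    if first is None:
        return False
    with open(path, "w", encoding="utf-8") as outfile:
        json.dump(first, outfile, indent=2, ensure_ascii=False)
    return True


path = os.path.join(tempfile.mkdtemp(), "data.json")
saved = save_first_result(fake_search([{"title": "\u201cQuoted\u201d title"}]), path)
empty = save_first_result(fake_search([]), path)
print(saved, empty)
```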