makkammerer closed this issue 3 years ago.
This version of the client can do (almost) everything the previous version could do, just with slightly different syntax. It'd be easier to answer your question with an example of what used to work, since I'm not familiar with Pandas, but here's my best shot.
The big difference: results are now instances of the `arxiv.Result` class rather than dicts, so they need to be converted into a structure that `DataFrame` can read, such as a list of dicts.
This StackOverflow answer uses `vars()` to extract an object's member variables as a dict; we can apply it to each of our results:
import arxiv
from pandas import DataFrame
# Search for 10 results.
results = arxiv.Search(query="quantum", max_results=10).get()
# Convert `arxiv.Result` fields into dictionaries.
results_as_dicts = [vars(r) for r in results]
# Construct DataFrame.
df = DataFrame(data=results_as_dicts)
This yields a reasonable DataFrame (output abridged):
>>> df
entry_id ... links
0 http://arxiv.org/abs/quant-ph/0201082v1 ... [<arxiv.arxiv.Result.Link object at 0x1105f9f1...
1 http://arxiv.org/abs/quant-ph/0407102v1 ... [<arxiv.arxiv.Result.Link object at 0x1105f93d...
2 http://arxiv.org/abs/0804.3401v1 ... [<arxiv.arxiv.Result.Link object at 0x1105f921...
3 http://arxiv.org/abs/1311.4939v1 ... [<arxiv.arxiv.Result.Link object at 0x110569b5...
4 http://arxiv.org/abs/1611.03472v1 ... [<arxiv.arxiv.Result.Link object at 0x110573f9...
5 http://arxiv.org/abs/q-alg/9610034v1 ... [<arxiv.arxiv.Result.Link object at 0x11057315...
6 http://arxiv.org/abs/quant-ph/0302169v1 ... [<arxiv.arxiv.Result.Link object at 0x11057349...
7 http://arxiv.org/abs/quant-ph/0309066v1 ... [<arxiv.arxiv.Result.Link object at 0x11060365...
8 http://arxiv.org/abs/quant-ph/0504224v1 ... [<arxiv.arxiv.Result.Link object at 0x11060379...
9 http://arxiv.org/abs/2006.03757v1 ... [<arxiv.arxiv.Result.Link object at 0x110603a1...
[10 rows x 10 columns]
>>> df.columns
Index(['entry_id', 'updated', 'published', 'title', 'authors', 'summary',
'comment', 'primary_category', 'categories', 'links'],
dtype='object')
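One caveat with `vars()`: it returns the object's own `__dict__` rather than a copy, so code that adds keys to that dict directly also adds attributes to the original object. Wrapping it in `dict()` gives a shallow copy that is safe to extend. A minimal sketch, using a hypothetical stand-in class `StubResult` (not part of arxiv) so it runs offline:

```python
class StubResult:
    """Hypothetical stand-in for arxiv.Result with two plain fields."""

    def __init__(self, entry_id, title):
        self.entry_id = entry_id
        self.title = title


r = StubResult("http://arxiv.org/abs/0804.3401v1", "Example title")

# vars() hands back the instance's live __dict__ ...
live = vars(r)
live["short_id"] = "0804.3401v1"
print(hasattr(r, "short_id"))  # True: the object itself was modified

# ... while dict(vars(...)) makes a shallow copy that can be extended safely.
r2 = StubResult("http://arxiv.org/abs/1311.4939v1", "Another title")
row = dict(vars(r2))
row["short_id"] = "1311.4939v1"
print(hasattr(r2, "short_id"))  # False: r2 is untouched
```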
If you want more than those 10 fields in the DataFrame, customize how each `arxiv.Result` is converted into a dict. For example, to add short IDs:
def convert(result):
    # Copy the attribute dict so the Result object itself isn't modified.
    row = dict(vars(result))
    row['short_id'] = result.get_short_id()
    return row
# Search for 10 results.
results = arxiv.Search(query="quantum", max_results=10).get()
# Construct DataFrame using the custom `convert` transform.
df_extra = DataFrame(data=[convert(r) for r in results])
The new DataFrame has an extra `short_id` column:
>>> df_extra.columns
Index(['entry_id', 'updated', 'published', 'title', 'authors', 'summary',
'comment', 'primary_category', 'categories', 'links', 'short_id'],
dtype='object')
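The `links` column in the first DataFrame holds `Link` objects that print as opaque reprs, and `authors` likewise holds objects rather than strings. One option is to flatten such fields into plain strings during the conversion. A minimal sketch, assuming the nested objects expose `name` (authors) and `href` (links) attributes as `arxiv.Result.Author` and `arxiv.Result.Link` do; the `Stub*` dataclasses and the `flatten` function are hypothetical, used so the example runs without a network call:

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-ins for arxiv.Result.Author / arxiv.Result.Link / arxiv.Result.
# Only the attributes used below are modeled.
@dataclass
class StubAuthor:
    name: str


@dataclass
class StubLink:
    href: str


@dataclass
class StubResult:
    entry_id: str
    title: str
    authors: List[StubAuthor] = field(default_factory=list)
    links: List[StubLink] = field(default_factory=list)


def flatten(result):
    """Turn a result into a dict of plain values suitable for a DataFrame row."""
    row = dict(vars(result))  # shallow copy; the result object is left as-is
    # Replace lists of objects with readable strings.
    row["authors"] = ", ".join(a.name for a in result.authors)
    row["links"] = " ".join(link.href for link in result.links)
    return row


results = [
    StubResult(
        entry_id="http://arxiv.org/abs/0804.3401v1",
        title="Example title",
        authors=[StubAuthor("A. Author"), StubAuthor("B. Author")],
        links=[
            StubLink("http://arxiv.org/abs/0804.3401v1"),
            StubLink("http://arxiv.org/pdf/0804.3401v1"),
        ],
    ),
]
rows = [flatten(r) for r in results]
print(rows[0]["authors"])  # A. Author, B. Author
```

With real search results, the rows would be built the same way and passed to `DataFrame(data=rows)`, so the `authors` and `links` cells contain strings instead of object reprs. Keeping lists of strings (for example `[a.name for a in result.authors]`) is an alternative if you plan to split them into separate rows later with `DataFrame.explode`.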
Could you give an example of using the wrapper with a pandas DataFrame? The previous version could do it!