Open giuliogcantone opened 3 days ago
fields is not one of the OpenAlex entities that we currently support:
oa_entities()
#> [1] "works" "authors" "institutions" "concepts" "keywords" "funders"
#> [7] "sources" "publishers" "topics"
Could you describe your use-case? For the time being, you can get the contents of that with:
httr::content(httr::GET("https://api.openalex.org/fields/11"))
Some dev notes (do you have thoughts, @trangdata?)
I'm actually a bit confused about how much fields stands on its own as an entity - I don't see it as an entry in the API docs, but it does appear as a property of Topic objects: https://docs.openalex.org/api-entities/topics/topic-object#field
There's a diagram in their classification whitepaper explaining more
After looking into fields a bit more, my impression is that it's not interesting in and of itself, but it may be useful for finding topics.
If that's your use case, you can use field as a filter on a search for topics:
oa_fetch("topics", field.id = 11)
#> # A tibble: 235 × 16
#> id display_name description keywords ids subfield_id subfield_display_name field_id field_display_name
#> <chr> <chr> <chr> <list> <lis> <chr> <chr> <chr> <chr>
#> 1 https://ope… Evolution a… This clust… <chr> <chr> https://op… Ecology, Evolution, … https:/… Agricultural and …
#> 2 https://ope… Diversity a… This clust… <chr> <chr> https://op… Plant Science https:/… Agricultural and …
#> 3 https://ope… Impact of P… This clust… <chr> <chr> https://op… Ecology, Evolution, … https:/… Agricultural and …
#> 4 https://ope… Physiology … This clust… <chr> <chr> https://op… Plant Science https:/… Agricultural and …
#> 5 https://ope… Animal Nutr… This clust… <chr> <chr> https://op… Animal Science and Z… https:/… Agricultural and …
#> 6 https://ope… Genetic and… This clust… <chr> <chr> https://op… Plant Science https:/… Agricultural and …
#> 7 https://ope… Factors Aff… This clust… <chr> <chr> https://op… Animal Science and Z… https:/… Agricultural and …
#> 8 https://ope… Vascular Fl… This clust… <chr> <chr> https://op… Plant Science https:/… Agricultural and …
#> 9 https://ope… Metabolism … This clust… <chr> <chr> https://op… Aquatic Science https:/… Agricultural and …
#> 10 https://ope… Viral RNA S… This clust… <chr> <chr> https://op… Plant Science https:/… Agricultural and …
#> # ℹ 225 more rows
#> # ℹ 7 more variables: domain_id <chr>, domain_display_name <chr>, siblings <list>, works_count <int>,
#> # cited_by_count <int>, updated_date <chr>, created_date <chr>
"domain", "field", and "subfield" are the hierarchical levels of the topic classification. Currently, openalexR only fetches the lowest level, topics (arguably the least interesting one). Fields are not a different kind of entity from topics; however, their structure is different, so they may still need to be coded as separate entities.
You could probably code them as topics quite easily, just leaving the columns that don't apply as NA.
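A minimal sketch of that idea, assuming the field record has already been reduced to a one-row data frame: dplyr::bind_rows() matches columns by name and fills the missing ones with NA. The two data frames below are toy stand-ins built from IDs that appear in this thread, and the level column is a hypothetical label, not something openalexR returns.

```r
library(dplyr)

# Toy stand-in for rows returned by oa_fetch("topics", ...), trimmed to a few columns
topics <- tibble(
  id = c("https://openalex.org/T10263", "https://openalex.org/T11123"),
  display_name = c(
    "Eating Disorders and Body Image Concerns",
    "Obsessive-Compulsive Disorder and Related Conditions"
  ),
  subfield_id = "https://openalex.org/subfields/3203",
  field_id = "https://openalex.org/fields/32"
)

# Toy stand-in for a field record; it has no subfield_id or field_id of its own
fields <- tibble(
  id = "https://openalex.org/fields/32",
  display_name = "Psychology"
)

# Columns missing from `fields` are filled with NA;
# `level` (hypothetical) records which table each row came from
combined <- bind_rows(topic = topics, field = fields, .id = "level")
combined
```

Whether NA-padded rows are a good representation is a design question: it keeps everything in one table, but every consumer then has to filter on level before using the topic-only columns.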
We'd be happy to consider adding support for these higher levels, but I also want to take this as an opportunity to learn more about the use case. Could you let me know what you plan to do with the information in the fields object once you have it in a data frame?
The fields coincide with the 26 disciplinary areas of Scopus, which is remarkable in itself, since it maps two different sources onto each other. In addition, high-level topics connect works to careers better. An author is better characterized by "He is a dentist" than by "He has published on remedies against caries"; so, in general, when working with authors (instead of works), one wants to fetch information at a high level, since low levels are often uninformative or vague for evaluating authors.
I appreciate this, and I'm trying to translate research questions like that ("what career/field does author X work in?") into a workflow in code.
The crucial question is whether such workflows require {openalexR} to be able to directly query a higher-level object.
And here's my hesitation on that. For example, if I query an author, it comes with Topic information attached:
x <- oa_random("authors")
x$topics
#> [[1]]
#> # A tibble: 60 × 5
#> i count id display_name type
#> <int> <int> <chr> <chr> <chr>
#> 1 1 4 https://openalex.org/T10263 Eating Disorders and Body Image Concerns topic
#> 2 1 4 https://openalex.org/subfields/3203 Clinical Psychology subfi…
#> 3 1 4 https://openalex.org/fields/32 Psychology field
#> 4 1 4 https://openalex.org/domains/2 Social Sciences domain
#> 5 2 3 https://openalex.org/T11123 Obsessive-Compulsive Disorder and Related Conditions topic
#> 6 2 3 https://openalex.org/subfields/3203 Clinical Psychology subfi…
#> 7 2 3 https://openalex.org/fields/32 Psychology field
#> 8 2 3 https://openalex.org/domains/2 Social Sciences domain
#> 9 3 1 https://openalex.org/T10853 Cognitive Mechanisms of Anxiety and Depression topic
#> 10 3 1 https://openalex.org/subfields/3205 Experimental and Cognitive Psychology subfi…
#> # ℹ 50 more rows
As you can see, {openalexR} (specifically, topics2df()) already breaks down topics such that their higher-level categorization also becomes available.
So at a quick glance, I can say something like "this author is a researcher in the Social Sciences who works in Psychology, specifically Clinical Psychology, studying various Disorders, especially Eating Disorders":
library(tidyverse)
x$topics[[1]] %>%
  count(type, display_name, wt = count, name = "total_count", sort = TRUE) %>%
  split(~ type)
#> $domain
#> # A tibble: 3 × 3
#> type display_name total_count
#> <chr> <chr> <int>
#> 1 domain Social Sciences 14
#> 2 domain Health Sciences 5
#> 3 domain Life Sciences 1
#>
#> $field
#> # A tibble: 6 × 3
#> type display_name total_count
#> <chr> <chr> <int>
#> 1 field Psychology 12
#> 2 field Medicine 4
#> 3 field Business, Management and Accounting 1
#> 4 field Neuroscience 1
#> 5 field Nursing 1
#> 6 field Social Sciences 1
#>
#> $subfield
#> # A tibble: 10 × 3
#> type display_name total_count
#> <chr> <chr> <int>
#> 1 subfield Clinical Psychology 10
#> 2 subfield Psychiatry and Mental health 2
#> 3 subfield Applied Psychology 1
#> 4 subfield Cognitive Neuroscience 1
#> 5 subfield Experimental and Cognitive Psychology 1
#> 6 subfield Marketing 1
#> 7 subfield Nutrition and Dietetics 1
#> 8 subfield Pharmacology 1
#> 9 subfield Public Health, Environmental and Occupational Health 1
#> 10 subfield Sociology and Political Science 1
#>
#> $topic
#> # A tibble: 15 × 3
#> type display_name total_count
#> <chr> <chr> <int>
#> 1 topic Eating Disorders and Body Image Concerns 4
#> 2 topic Obsessive-Compulsive Disorder and Related Conditions 3
#> 3 topic Borderline Personality Disorder: Psychopathology and Treatment 1
#> 4 topic Cognitive Mechanisms of Anxiety and Depression 1
#> 5 topic Epidemiology and Management of Sexual Dysfunction 1
#> 6 topic Global Trends in Obesity and Overweight Research 1
#> 7 topic Impact of Nutrition and Eating Habits on Health 1
#> 8 topic Impact of Social Media on Well-being and Behavior 1
#> 9 topic Influence of Appearance Management Behavior in Consumer Choices 1
#> 10 topic Interoception and Somatic Symptoms 1
#> 11 topic Molecular Mechanisms of Depression Treatment Strategies 1
#> 12 topic Neurobiological Mechanisms of Placebo and Nocebo Effects 1
#> 13 topic Pathological Gambling and Comorbid Disorders 1
#> 14 topic Psychological Effects of Perfectionism 1
#> 15 topic Theories of Behavior Change and Self-Regulation 1
If the analysis is serious about mapping an author's topics/subfields/fields/domains/etc., you can write functions that consume this data in various ways, e.g., a function to graph out an author's research areas. And as far as I can tell, this workflow doesn't require querying fields/subfields/domains directly, as you can get to that information via the topics object (even if topics itself isn't interesting). So I think I'm still looking for a good, solid use case for your feature request - am I missing anything here?
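As a rough illustration of such a consuming function, here is a sketch that keeps the dominant entry at each level for one author. top_area_per_level() is a hypothetical helper, not part of openalexR, and the toy input just re-types the first eight rows of the x$topics[[1]] output shown above.

```r
library(dplyr)

# Hypothetical helper (not part of openalexR): for each level
# (domain, field, subfield, topic), keep the display_name with the
# largest summed count. Ties are broken by taking the first row.
top_area_per_level <- function(topics_df) {
  topics_df %>%
    count(type, display_name, wt = count, name = "total_count") %>%
    group_by(type) %>%
    slice_max(total_count, n = 1, with_ties = FALSE) %>%
    ungroup()
}

# Toy input: the first eight rows of x$topics[[1]] from the output above
toy <- tibble(
  i = rep(1:2, each = 4),
  count = rep(c(4L, 3L), each = 4),
  display_name = c(
    "Eating Disorders and Body Image Concerns", "Clinical Psychology",
    "Psychology", "Social Sciences",
    "Obsessive-Compulsive Disorder and Related Conditions", "Clinical Psychology",
    "Psychology", "Social Sciences"
  ),
  type = rep(c("topic", "subfield", "field", "domain"), times = 2)
)

top_areas <- top_area_per_level(toy)
top_areas
```

On real data you would pass x$topics[[1]] instead of toy; the result has one row per level, which is already enough to answer "what field does this author work in?" without a separate fields query.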
oa_fetch(entity = "topic", id = "https://openalex.org/fields/11")
gives the lexical error.