rempsyc / busara_dashboard

The Missing Majority in Behavioural Science Dashboard
https://remi-theriault.com/dashboards/missing_majority

Collabra data #4

Closed psforscher closed 6 months ago

psforscher commented 10 months ago

I was surprised by the Collabra data. The country-level data seems wrong (all squares are "other"), but even the continent data seems inconsistent with my informal analyses (out of curiosity, I counted the 40 or 50 most recent articles, and fewer than 50% were from North America, I think, but maybe I'm misremembering).

More info here: https://docs.google.com/document/d/1e_cf1M1vUAzeyKCq77ClJPa5cDB-XAk60bI0cCKUOzw/edit

rempsyc commented 7 months ago

> even the continent data seems inconsistent with my informal analyses (out of curiosity, I counted the 40 or 50 most recent articles, and fewer than 50% were from North America, I think, but maybe I'm misremembering).

  1. For continents, this could be because the missing data are not missing at random (#18 and #15). The only remedy would be to go through each missing entry and add the universities absent from the current database, along with their country.
  2. The informal analysis looks at the last 40 or 50 articles, but the dashboard's continent-level data collapses across all years, so one possibility is that the journal's pattern has changed over time. This should be verified later.
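
One way to check point 2 would be to break the continent shares down by year instead of collapsing across years. This is only a minimal sketch on toy data: the `continent` column name and the values below are assumptions for illustration, not the dashboard's actual data.

```r
# Toy data standing in for the article-level data frame.
# The `continent` column and all values here are assumptions.
articles <- data.frame(
  journal   = "Collabra. Psychology",
  year      = c("2020", "2020", "2021", "2021", "2022", "2023"),
  continent = c("North America", "North America", "North America",
                "Europe", "Europe", "Asia")
)

# Cross-tabulate year by continent, then convert each row to proportions
counts <- table(articles$year, articles$continent)
shares <- prop.table(counts, margin = 1)
round(shares, 2)
```

If the North America share drops in the later rows, that would be consistent with the pattern having changed over time.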

rempsyc commented 7 months ago

Filtering the master dataset for only the Collabra journal, and investigating the resulting data frame, we see:

> collabra <- articles.df4 |> 
+   filter(journal == "Collabra. Psychology")
> nrow(collabra)
[1] 14
> sort(unique(collabra$year))
[1] "2020" "2021" "2022" "2023"

Only 14 Collabra papers are included, for some reason. This might be due to difficulties fetching these data from PubMed, but that will require further investigation.
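
To see whether other journals are similarly thin, the same check could be run across the whole dataset. A rough sketch in base R on made-up data follows; `articles` stands in for `articles.df4`, and "Journal A" and all counts are hypothetical.

```r
# Hypothetical stand-in for the master dataset (`articles.df4` in this thread)
articles <- data.frame(
  journal = c(rep("Collabra. Psychology", 3), rep("Journal A", 5)),
  year    = c("2020", "2021", "2021", "2019", "2020", "2020", "2021", "2021")
)

# Number of records per journal, smallest first
n_per_journal <- sort(table(articles$journal))
n_per_journal

# Year-by-year breakdown for one journal, to see where coverage drops off
collabra <- articles[articles$journal == "Collabra. Psychology", ]
n_per_year <- table(collabra$year)
n_per_year
```

Journals at the top of `n_per_journal` are the ones whose plots deserve the most caution.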

rempsyc commented 7 months ago

A direct PubMed search with the keyword "Collabra. Psychology"[Journal] confirms the previous hypothesis:

https://pubmed.ncbi.nlm.nih.gov/?term=%22Collabra.+Psychology%22%5BJournal%5D&sort=

image

So it seems to be a PubMed-Collabra problem... This is a good example of the limitations of relying on PubMed data, but I'm afraid there is not much we can do in that situation.
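
For reference, this kind of check could probably be scripted against NCBI's E-utilities ESearch endpoint rather than done by hand in the browser. The sketch below only builds the request URL; `esearch_count_url` is a hypothetical helper name, not something from the dashboard code, and the actual fetch is left commented out because it needs network access.

```r
# Build an NCBI E-utilities ESearch URL that asks only for the hit count.
# `esearch_count_url` is a hypothetical helper, not part of the dashboard.
esearch_count_url <- function(term) {
  base <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
  paste0(base, "?db=pubmed&rettype=count&term=",
         utils::URLencode(term, reserved = TRUE))
}

url <- esearch_count_url('"Collabra. Psychology"[Journal]')
url

# Fetching requires network access, so it is left commented out:
# xml <- paste(readLines(url, warn = FALSE), collapse = "")
# as.integer(sub(".*<Count>([0-9]+)</Count>.*", "\\1", xml))
```

Comparing that count with `nrow(collabra)` would show whether the gap originates in PubMed itself or in our fetching step.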

rempsyc commented 7 months ago

The country-level data bug was fixed in #19.

psforscher commented 7 months ago

Great!

It also might be worth spot-checking some of the journals that appear to have articles from just one continent, such as Review of International Political Economy (see below). To me, this feels like it could be a bug ... maybe not, but worth double-checking.

[Screenshot 2024-03-17 at 22 48 33]

psforscher commented 7 months ago

I wonder if the country plots have any bugs as well ... for example, the Review of International Political Economy appears to only have articles from the UK -- is that correct, or is it a bug?

[Screenshot 2024-03-17 at 22 50 51]

rempsyc commented 6 months ago

> It also might be worth spot-checking some of the journals that appear to have articles from just one continent, such as Review of International Political Economy

Could be worth spot-checking, yes, but for this particular one, we can see from the count table (#22) that Review of International Political Economy has only 3 papers, which is why it shows only one continent (Europe) and one country (UK):

image
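
A spot-check along these lines could also be semi-automated by listing every journal that shows a single continent next to its paper count, so small-sample cases are easy to tell apart from possible bugs. This is a sketch on invented data; the `continent` column name and "Journal B" are assumptions.

```r
# Invented data; `continent` and "Journal B" are assumptions for illustration
articles <- data.frame(
  journal   = c(rep("Review of International Political Economy", 3),
                rep("Journal B", 4)),
  continent = c("Europe", "Europe", "Europe",
                "Europe", "Asia", "Asia", "North America")
)

# Papers and distinct continents per journal (both in the same journal order)
n_papers     <- tapply(articles$continent, articles$journal, length)
n_continents <- tapply(articles$continent, articles$journal,
                       function(x) length(unique(x)))

summary_df <- data.frame(
  journal      = names(n_papers),
  n_papers     = as.vector(n_papers),
  n_continents = as.vector(n_continents)
)

# Journals showing a single continent, with their sample size alongside
single <- summary_df[summary_df$n_continents == 1, ]
single
```

A single-continent journal with a large `n_papers` would be the more suspicious case.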

So perhaps we can close this issue and continue this investigation in #34?