sappelhoff / bids_eeg_report

Code for producing figures used in the report on the BIDS Extension Proposal 6
https://www.nature.com/articles/s41597-019-0104-8
BSD 3-Clause "New" or "Revised" License

data #1

Closed CPernet closed 6 years ago

CPernet commented 6 years ago

how did you create the 'data'?

sappelhoff commented 6 years ago

The data were extracted from the PubMed website after searching for "EEG" in Title/Abstract:

[image]

I edited the README file to incorporate this explanation.

CPernet commented 6 years ago

I'm playing with Python (quite new to it) and got it like this:

from metapub import PubMedFetcher
fetch = PubMedFetcher()

narticles = []
years= list(range(1930,2018))
for index, year in enumerate(years,1):
    myquery = 'EEG[Title/Abstract]) AND ("%d"[Date - Entrez] : "%d"[Date - Entrez])' %(year,year+1)
    pmids = fetch.pmids_for_query(myquery, retmax=20000)
    narticles.append(len(pmids))

sappelhoff commented 6 years ago

Interesting. However, when I run your code, I get numbers that do not match the data extracted via the PubMed website:

See the data ... where we have 3698 entries for 2017, and then from your code:

In [11]: for y, n in zip(years[::-1], narticles[::-1]):
    ...:     print(y,n)
    ...:     
2017 6192
2016 7299
2015 6861
2014 6364
2013 6264
2012 5732
2011 4685
2010 4292
2009 4173
2008 3847
2007 3891
2006 3694
2005 3202
2004 3099
2003 2977
2002 2901
2001 2810
2000 2728
1999 2589
1998 2473
1997 2207
1996 2103
1995 2189
1994 2165
1993 2136
1992 2144
1991 2191
1990 2155
1989 2174
1988 1978
1987 1715
1986 1724
1985 1675
1984 1644
1983 1582
1982 1405
1981 1342
1980 1328
1979 1360
1978 1366
1977 1274
1976 1272
1975 1276
1974 855
1973 484
1972 494
1971 565
1970 662
1969 802
1968 755
1967 676
1966 591
1965 408
1964 387
1963 333
1962 263
1961 240
1960 225
1959 227
1958 396
1957 364
1956 201
1955 185
1954 158
1953 121
1952 97
1951 88
1950 54
1949 31
1948 12
1947 11
1946 10
1945 3
1944 0
1943 0
1942 0
1941 0
1940 0
1939 0
1938 0
1937 0
1936 0
1935 0
1934 0
1933 0
1932 0
1931 0
1930 0

It's a cool idea to get the data programmatically instead of clicking a button on a web interface. But can you work out why the data don't match up? When in doubt, I'd trust the web interface more.
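To make the mismatch concrete, the two sources can be compared year by year. The following is a minimal sketch, assuming both sets of counts are held as year-to-count dictionaries; the function name `compare_counts` is hypothetical, and only the 2017 values (3698 from the website, 6192 from the script) are taken from this thread.

```python
def compare_counts(reference, scraped):
    """Return {year: (reference, scraped, scraped - reference)} for years in both."""
    shared_years = sorted(set(reference) & set(scraped))
    return {
        year: (reference[year], scraped[year], scraped[year] - reference[year])
        for year in shared_years
    }


# Only the 2017 values come from this thread; other years would be filled in
# from the repository data and from the script output.
website = {2017: 3698}
script = {2017: 6192}

for year, (ref, got, diff) in compare_counts(website, script).items():
    print(f"{year}: website={ref} script={got} diff={diff:+d} ratio={got / ref:.2f}")
```

A roughly constant ratio across years would point to a systematic difference in the query (for example, the date range), whereas a small, irregular difference would be more consistent with the database having been updated between the two searches.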

CPernet commented 6 years ago

Just ran 'myquery' manually in PubMed and got the same results; 3698 was a little while ago :-)

sappelhoff commented 6 years ago

For myquery I use: "EEG[Title/Abstract]" on PubMed. You should get my search results on this link:

https://www.ncbi.nlm.nih.gov/pubmed?term=EEG%5BTitle%2FAbstract%5D

How come the two of us are getting different data? What I get aligns with the data in this repository, whereas your data seem to contain much higher publication counts.

Also: There seems to be a ( missing on this line:

myquery = 'EEG[Title/Abstract])
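A small helper can make this kind of slip easier to spot. This is only a sketch: `build_query` and `parens_balanced` are hypothetical names, and the query layout simply mirrors the string above with the opening parenthesis restored. Note also that the original loop passes `year` and `year + 1` as the two ends of the range; if PubMed treats both endpoints as inclusive, each query would cover two calendar years of Entrez dates, which may contribute to the higher counts.

```python
def build_query(term, field, start_year, end_year):
    """Build a PubMed-style query for one term within an Entrez date range."""
    return (
        f'({term}[{field}]) AND '
        f'("{start_year}"[Date - Entrez] : "{end_year}"[Date - Entrez])'
    )


def parens_balanced(query):
    """Check that every ')' closes an earlier '(' (quoting is ignored)."""
    depth = 0
    for char in query:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its '('
                return False
    return depth == 0


query = build_query("EEG", "Title/Abstract", 2017, 2017)
print(query, parens_balanced(query))
```

Checking the string before sending it keeps a malformed query from silently returning a different result set.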

CPernet commented 6 years ago

because I'm in the UK and we overestimate everything?

sappelhoff commented 6 years ago

@CPernet I'll look into the metapub package over the next few days and see whether I can find the issue. Did you code this up from scratch, or did you follow some example or tutorial? If so, it'd be nice if you could share the link.

CPernet commented 6 years ago

just coded myself :-)

sappelhoff commented 6 years ago

After working a bit on it, I found the following:

Instead of modifying the query string to include the years, you can use the since and until parameters of the pmids_for_query method. This seems to give results that match the website data more closely. There is still some (negligible) fluctuation on the order of 10 articles per year.
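A rough sketch of that approach, not the code that ended up in the repository: the helper name `count_per_year` is hypothetical, and the `YYYY/MM/DD` date strings passed to `since` and `until` are an assumption about the format metapub accepts. The fetcher is passed in as an argument so the loop can be tried with a stand-in object before hitting PubMed.

```python
def count_per_year(fetcher, query, years, retmax=20000):
    """Return {year: number of PMIDs}, restricting each search via since/until.

    `fetcher` is any object with a pmids_for_query(query, since=..., until=...,
    retmax=...) method, such as metapub's PubMedFetcher.
    """
    counts = {}
    for year in years:
        pmids = fetcher.pmids_for_query(
            query,
            since=f"{year}/01/01",  # assumed date format
            until=f"{year}/12/31",  # assumed date format
            retmax=retmax,  # the returned list cannot be longer than this
        )
        counts[year] = len(pmids)
    return counts


# Usage with metapub (not run here; needs the package and network access):
#   from metapub import PubMedFetcher
#   counts = count_per_year(PubMedFetcher(), "EEG[Title/Abstract]", range(1930, 2018))
```

Because the count is taken from the length of the returned list, `retmax` has to stay above the largest yearly total, otherwise the numbers would be silently capped.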

I updated the code to use the automatic scraping of the data anyway. Thanks!

CPernet commented 6 years ago

cool :-)