timo-b-roettger / MoBa_Transparency

The associated repository for the MoBa Transparency project

Set up meeting #2 #5

Open timo-b-roettger opened 1 year ago

timo-b-roettger commented 1 year ago

At the beginning of May, I sent out an email to find a good time to meet.

timo-b-roettger commented 1 year ago

2nd meeting at the end of May: http://whenisgood.net/qzahect

timo-b-roettger commented 1 year ago

Agenda for 2nd Meeting:

  1. Possible sampling strategies: @parekhpravesh, you had some ideas on how to approach this issue. Would you be willing to prepare a coarse sampling strategy (how do we find target articles, using which tools, etc.)?
  2. How to approach a pilot to check the feasibility of feature extraction (what can reliably be extracted and how, and how time-consuming is the process).
  3. Which practices to assess (see some suggestions below).
  4. What does the prescriptive part look like (show and tell?)

Notes from emails:

Laurie: I think it would be useful to discuss plans for part 2 (the “prescription” !?), in terms of whether we are seeking to go beyond saying: people using MoBa data do not (generally) preregister or share code, and we think that in future they should. If so - whether it be by offering a worked example, providing tools or workflows or checklists, or something else altogether - I think we should try to get the shape of this early, since it seems unlikely to be particularly contingent on part 1 and could potentially be worked on in parallel.

Adrian:

Pravesh: I think Laurie brought up a good point about the “diagnosis” and “prescription” style. This is super helpful and ties in neatly with the idea of “show, don’t tell”. I have seen similar approaches in some other papers (see, for example, Box 2 of https://doi.org/10.1016/j.tics.2020.06.009 or the general prescription style of https://doi.org/10.1016/j.neuroimage.2022.119623 and https://doi.org/10.1016/j.tics.2014.11.008). If, for all issues that we identify, we present remedial solutions and not just general advice, it would certainly help the next set of papers that are written using the MoBa dataset. Additionally, I agree that it would be interesting to examine whether there is evidence for p-value hacking. Examining effect sizes (and generally whether effect sizes + measures of uncertainty are reported) would be lovely.

More generally, we could consider grouping the variables (or features) that we want to examine. For example:

Transparency/open-science practices: Was the study pre-registered? Is the code available? Is derived data available?

Reporting practices: Are scripting/programming languages mentioned? Are packages used for data analyses mentioned? Are software/package version numbers included? Open-access analytical platform or commercial software?

Analytical strategy: Does the study use cross-sectional or longitudinal data? Univariate or multivariate analyses? How many variables are examined in the study?

Good statistical practices: Is sample size mentioned? Are effect sizes reported? Are measures of uncertainty reported? Are there any reliability analyses (or cross-validation or independent replication) in the paper? Are non-parametric statistics used? Was multiple comparison correction performed?

Diversity assessment: Gender distribution. Ethnicity assessment. One could potentially also assess gender/ethnicities of the references contained within the relevant papers themselves.

This list is just an indication, but it could help structure the strategy for parsing each paper. Of course, the deeper we go, the more work is required and the fewer automated methods are available. It would also end up requiring more effort and commitment from each person.
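To get a feel for which of these features could be screened automatically, here is a rough Python sketch of a keyword-based first pass. The feature names and regular expressions are illustrative assumptions, not an agreed coding scheme, and any hit would still need manual verification.

```python
import re

# Hypothetical screening patterns; both the feature names and the regular
# expressions are assumptions made for illustration only.
SCREEN_PATTERNS = {
    "preregistration": re.compile(r"\bpre-?regist", re.IGNORECASE),
    "code_sharing": re.compile(r"\b(github|osf\.io|code (is|are) available)", re.IGNORECASE),
    "effect_size": re.compile(r"\beffect sizes?\b", re.IGNORECASE),
    "multiple_comparisons": re.compile(r"\b(bonferroni|false discovery rate|fdr)\b", re.IGNORECASE),
}

def screen_text(text):
    """Return, per feature, whether any of its keywords occurs in the text."""
    return {name: bool(pattern.search(text)) for name, pattern in SCREEN_PATTERNS.items()}

# Made-up example sentence, not taken from any real paper.
example = (
    "The analysis plan was preregistered and code is available at "
    "github.com/example. Results were Bonferroni corrected."
)
flags = screen_text(example)
for feature, found in flags.items():
    print(feature, found)
```

A screen like this can only flag candidates: a paper may mention preregistration without being preregistered, so it would complement manual coding rather than replace it.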

parekhpravesh commented 1 year ago

Here are some notes on sampling strategy:

1) PubMed search

Search term: (norwegian mother and child cohort study) OR (norwegian mother, father and child cohort study) OR (MoBa)
(https://pubmed.ncbi.nlm.nih.gov/?term=%28norwegian+mother+and+child+cohort+study%29+OR+%28norwegian+mother%2C+father+and+child+cohort+study%29+OR+%28MoBa%29)
Searched on: 22nd May 2023 at 10:40 AM Oslo time (csv file downloaded)
Results: 1018 hits
Filter out studies with study year < 2005: 938 studies remaining
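For anyone who wants to reproduce the year filter without MATLAB, a minimal Python sketch could look like the following. The column name `Publication Year` and the tiny inline table are assumptions standing in for the downloaded csv file, so adjust them to the actual export.

```python
import csv
import io

def filter_by_year(rows, min_year=2005, year_field="Publication Year"):
    """Keep rows whose publication year is at least min_year."""
    kept = []
    for row in rows:
        raw = (row.get(year_field) or "").strip()
        # Rows with a missing or non-numeric year are dropped here; one could
        # instead set them aside for manual review.
        if raw.isdigit() and int(raw) >= min_year:
            kept.append(row)
    return kept

# Made-up rows standing in for the downloaded csv file (assumed layout).
sample_csv = """PMID,Title,Publication Year
1,Example study A,2003
2,Example study B,2005
3,Example study C,2019
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
recent = filter_by_year(rows)
print(len(rows), "->", len(recent))
```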

2) Use MoBa publications page

Download page source (HTML format) for MoBa publications: https://www.fhi.no/en/studies/moba/for-forskere-artikler/publications/
Parse this web page using attached MATLAB script: parseHTML.txt
Results: 1026 hits (note that the webpage states 1027 results)

3) Intersect the 1026 results with the 1018 hits from PubMed: 368 of the 1026 papers are not in the PubMed list
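A comparison like this depends heavily on how titles are matched. As a hedged illustration (not the approach used in the attached MATLAB scripts), here is a Python sketch that normalizes titles before checking which ones are missing from a reference list. The function names and example titles are made up.

```python
import re
import unicodedata

def normalize_title(title):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Any run of characters other than a-z and 0-9 becomes a single space,
    # so letters such as "ø" that do not decompose are treated as separators.
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()

def not_in_reference(candidates, reference):
    """Return candidate titles whose normalized form is absent from reference."""
    reference_keys = {normalize_title(t) for t in reference}
    return [t for t in candidates if normalize_title(t) not in reference_keys]

# Made-up titles for illustration.
moba_page_titles = ["Cohort Profile: An Example.", "A study only on the MoBa page"]
pubmed_titles = ["Cohort profile: an example"]
missing = not_in_reference(moba_page_titles, pubmed_titles)
print(missing)
```

Exact matching on raw titles would count the first example as missing because of the capitalization and trailing period, which is one way overlap counts can end up too low.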

4) Europe PMC search

Search term: (norwegian mother and child cohort study) OR (norwegian mother, father and child cohort study) OR (MoBa)
(https://europepmc.org/search?query=%28norwegian%20mother%20and%20child%20cohort%20study%29%20OR%20%28norwegian%20mother,%20father%20and%20child%20cohort%20study%29%20OR%20%28MoBa%29)
Searched on: 22nd May 2023 at 7:30 PM Oslo time (csv file downloaded)
Results: 5451 hits
Apply the following filters using the attached MATLAB script parseEuropePMC.txt:

Remaining articles: 4694
Removed duplicate titles (10 identified, only able to automatically remove 1)
Remove papers that overlap with PubMed search: 901 overlapping papers; remaining: 3792
Remove papers that overlap with PubMed search using DOI: 44 overlapping papers; remaining: 3748
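The de-duplication and DOI-overlap steps could be sketched in Python roughly as follows. This is not a port of parseEuropePMC.txt: the record layout (`title` and `doi` keys), the helper names, and the example records are all assumptions.

```python
def normalize_doi(doi):
    """Lowercase a DOI and strip common URL or 'doi:' prefixes."""
    doi = (doi or "").strip().lower()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi

def drop_duplicate_titles(records):
    """Keep the first record for each case-insensitive exact title."""
    seen = set()
    unique = []
    for rec in records:
        key = rec["title"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def drop_doi_overlap(records, reference_dois):
    """Remove records whose DOI appears in reference_dois; keep records without a DOI."""
    reference = {normalize_doi(d) for d in reference_dois} - {""}
    return [r for r in records if normalize_doi(r.get("doi")) not in reference]

# Made-up records for illustration.
europe_pmc = [
    {"title": "Example A", "doi": "10.1000/a"},
    {"title": "example a", "doi": "10.1000/a"},
    {"title": "Example B", "doi": "https://doi.org/10.1000/B"},
    {"title": "Example C", "doi": ""},
]
pubmed_dois = ["10.1000/b"]

deduplicated = drop_duplicate_titles(europe_pmc)
remaining = drop_doi_overlap(deduplicated, pubmed_dois)
print(len(europe_pmc), "->", len(deduplicated), "->", len(remaining))
```

As a sanity check, the counts reported above are internally consistent: 4694 - 1 - 901 = 3792 and 3792 - 44 = 3748.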

Summary

Additional notes

parekhpravesh commented 1 year ago

As an aside, it would also be interesting to look at publication trends and see what types of journals the articles are being published in (for example, from a quick look, I found a large number of PLoS One papers within the Europe PMC article list), whether they are open access or not, journal domain, etc.
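A quick way to look at this would be to tally journal names from the exported lists. The following Python sketch assumes each record has a `journal` field, which is a hypothetical name, and uses made-up records.

```python
from collections import Counter

def journal_counts(records, journal_field="journal"):
    """Count articles per journal, ignoring case and empty names."""
    counts = Counter()
    for rec in records:
        name = (rec.get(journal_field) or "").strip().lower()
        if name:
            counts[name] += 1
    return counts

# Made-up records for illustration.
records = [
    {"journal": "PLoS One"},
    {"journal": "plos one"},
    {"journal": "Example Journal"},
    {"journal": ""},
]
counts = journal_counts(records)
for name, n in counts.most_common():
    print(n, name)
```

Journal names are often abbreviated inconsistently across databases, so lowercasing alone will not merge every variant; a manual mapping would likely be needed for the most frequent journals.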