openplantpathology / Reproducibility_in_Plant_Pathology

A systematic, quantitative review of articles that provides a basis for identifying what has been done so far in plant pathology research reproducibility, along with suggestions for ways to improve it.
https://openplantpathology.github.io/Reproducibility_in_Plant_Pathology

Document content #2

Closed: adamhsparks closed this issue 3 years ago

adamhsparks commented 8 years ago

I've started filling some content into the outline. I'm aware that I'm probably too R-centric; reproducible research does not revolve around R or scientific computing. However, these tools do make it much easier.

Best Practices

We need some good best practices for research, from field-level work to in silico analyses.

Examples/the State of RR in Plant Pathology

I've started filling this in with my own work, where I've made everything available. We should attempt to somehow quantify efforts to make research reproducible/replicable in plant pathology.

From our e-mail thread:

The systematic/quantitative review of articles will provide a nice basis for identifying what has been done so far in our field. We may be able to see which subfields are more "reproducible" than others and what the trends are. Then we could provide guidelines for best practices (available tools, formats, etc.) with examples and case studies. I could work on something related to meta-analysis, for example.

By the way, it is interesting that meta-analysis (which uses published or unpublished data) has been used in plant pathology during the last 10 years, but the data and code are not being shared as far as I know. An open database that allows others to keep adding data would be very useful.

adamhsparks commented 7 years ago

I've started trying to devise a randomised method of systematically selecting articles for review to see if they are reproducible: https://github.com/adamhsparks/Reproducible-Research-in-Plant-Pathology/tree/master/src

I've not yet tried this method; I've just been working on devising it and getting it set up.
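In essence it boils down to fixing a random seed and scripting the draw. Something like this minimal sketch (the file and column names here are placeholders for illustration, not necessarily what's in `src/`):

```r
# Minimal sketch of a scripted, repeatable article draw.
set.seed(42)  # fix the seed so the random draw itself is reproducible

# one row per candidate article: journal, year, volume, issue, title, doi
articles <- read.csv("article_index.csv")

# draw 100 articles at random from the pooled index
sampled <- articles[sample(nrow(articles), 100), ]

write.csv(sampled, "sampled_articles.csv", row.names = FALSE)
```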

Thoughts?

zachary-foster commented 7 years ago

I like that you are using R scripts to make the article selection. Very reproducible! This seems like a good way to go about it to me.

How should we handle reviews or letters?

I think we should exclude them from the sample, especially reviews.

emdelponte commented 7 years ago

This is very good, Adam. Glad to hear good news and see some progress. I am about to finish the semester next week and will definitely dedicate time to this project.

I like the approach you created for handling the sampling of articles, and I agree with Zachary that we should skip reviews and other non-full-length articles. As I understand it, from now on we should go after the articles to scrutinise and build another table, but the extraction will be manual, correct? Where are we going to store the articles for scrutiny?

adamhsparks commented 7 years ago

@zachary-foster, thanks for the feedback. I agree, skip the reviews. However, I'm looking for a specific plan, e.g., "When we encountered a review article, we omitted it and went to the following article in the same journal issue." Thoughts?

@emdelponte, yes, extraction will be manual. You'll see that I've started making a list of them; I'll tidy it up and keep it in the Rmd/md file.

emdelponte commented 7 years ago

@adamhsparks that is exactly my initial thought on how to skip the non-target articles.

Shall we define a common set of variables to extract from each article to populate the table? Any thoughts on these variables/questions? E.g., is code presented as supplementary material? Are raw data available? Something like that.
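As a starting point, the table could look something like this (a sketch only; every variable name here is a placeholder to be agreed on):

```r
library(tibble)

# One row per scrutinised article; all column names are placeholders
extraction_template <- tibble(
  doi            = character(),  # article identifier
  journal        = character(),
  year           = integer(),
  article_class  = character(),  # e.g. molecular, applied, fundamental
  code_supplied  = logical(),    # is code presented as supplemental?
  data_supplied  = logical(),    # are raw data available?
  data_location  = character(),  # e.g. supplement, repository, "on request"
  software_cited = logical(),    # are analysis tools named (and versioned)?
  notes          = character()
)
```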

emdelponte commented 7 years ago

OK. I saw the category suggestions by @zachary-foster in the other open issue. I am learning how to work in this environment :)

emdelponte commented 7 years ago

I was glancing over the list of journals ranked by Google and the randomization process. I have different thoughts after seeing the results and our discussion on the article types to target.

Since we are planning to submit this manuscript to a typical/traditional plant pathology journal (Phytopathology) that encompasses all aspects (from the molecular to the landscape level), we need to perform an expert assessment of Google's ranking, or even try different approaches, like using journal metrics systems (more specific than Google's) to rank and compare. I just found the new CiteScore metric by Scopus. In Google's list, I see some journals that are not a good fit for our study considering the above. Here they are, with some justification for exclusion:

I understand that after filtering the top 10 journals, most of these seem to be excluded, but in my opinion we should pick more journals from the initial list (that fit our search criteria), up to 10 to 15 perhaps, and use a different randomization approach that selects a fixed number of articles per journal within the time period. In the current randomization process you will see that Annual Review contributed 9 articles and Phytopathology only 2! (Any Phytopathology EIC would be unhappy with this number :)

Fixing the number of articles per journal (10 or 15 if the number of journals is reduced) may allow testing hypotheses related to factors affecting the frequency of reproducibility practices; see the sketch below. If this is something to pursue, we could even pick fewer journals that are representative of journal categories (molecular, applied, fundamental), or even different continents, impact factor categories, etc., and increase the sample size. We could inspect the journals' instructions for authors to check (provide a rating?) whether they encourage reproducibility practices.
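To illustrate the fixed-per-journal idea, a minimal sketch in base R (it assumes an `articles` data frame with a `journal` column, as in the earlier sketch):

```r
# Stratified draw: a fixed number of articles from each journal,
# rather than one pooled draw over all journals.
set.seed(42)
n_per_journal <- 10

sampled <- do.call(rbind, lapply(
  split(articles, articles$journal),  # one data frame per journal
  function(x) x[sample(nrow(x), min(n_per_journal, nrow(x))), ]
))
```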

I am not sure that picking up to 10 articles from the roughly 500 (an average of 100 articles per year) published by a journal is a good number. I suggest focusing on the three APS journals and picking another set (7 to 10 more) of equivalent journals (molecular, fundamental, and applied) published in other countries/continents and with different impact factors. So we could harvest 20 articles per journal? If we split the manual work among us and focus on a set of reproducibility-related variables that is not too large, the manual work may not take long. What do you think?

Emerson

adamhsparks commented 7 years ago

@emdelponte, thanks for your thoughts. I had some of those same thoughts as well, but wasn't aware of the Scopus ranking.

I agree with you on this. Let's improve our selection methods to be more representative of the journals of interest. I picked 100 on the assumption that journals publish an average of 10 articles per issue, while trying for something statistically sound so we could actually do some basic statistics to see how many articles are or are not reproducible and whether there are trends between journals and/or areas.

adamhsparks commented 7 years ago

@emdelponte, @zachary-foster I'm finally getting back around to thinking about this after Christmas. It's been a long break and I'm in the middle of annual reports at the moment, but let's get this thing settled and written by June? Is that feasible? It's been too easy to ignore without any deadlines...

emdelponte commented 7 years ago

@adamhsparks welcome to the long break of Dec-Feb (summer!). I was on vacation during January and had two big tasks to accomplish in February: submit a paper to the Focus Issue on Epidemiology and organize the Tropical Fusarium Workshop. Now I am ready to begin classes next week, but will definitely work with you and @zachary-foster on this. You are right: it is quite easy to get stuck on other things (and they are not few!). The June deadline is fine with me. How do you want to proceed from now on? Shall we define who does what and suggest a deadline for each task?

PS: It is interesting to see new papers in plant pathology journals publishing reproducible code. This one is a good example: https://github.com/alejorojas2/Rojas_Survey_Phytopath_2016

zachary-foster commented 7 years ago

@adamhsparks,

but let's get this thing settled and written by June...

June seems reasonable. Once we define a set of variables to quantify reproducibility and a selection of papers, things should be relatively straightforward.

@emdelponte,

fixed number of articles per journal ....

Seems like a good idea if we want to make statements about individual journals.

we could even pick less journals that are representative of journals categories ...

Can we assume the journals we pick are actually representative without looking at other journals in the area? If we take this approach, I think it would be best not to focus on individual journals at all, but to associate papers with their categories or other attributes of the paper or the journal it was published in (e.g., impact factor, publishing requirements, date published, journal category) and not even mention journal names. That way we also avoid any unhappy EICs, and there are more relationships we can explore. I am thinking of scatter plots looking at the correlation between reproducibility and various attributes, e.g., impact factor; see the sketch below. Also, since we would not need a minimum number of papers per journal, we can increase the diversity of papers/journals we sample. It might be interesting to include some low-impact journals/papers as well for comparison with the high-impact ones. We could widen the date range a bit (basically as early as reproducible computational research was practical) so we could track reproducibility over time.
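Something like this is what I have in mind (a sketch with made-up column names; it assumes each scored paper has a numeric reproducibility score and its journal's impact factor recorded):

```r
library(ggplot2)

# `papers` is a hypothetical data frame: one row per scored article,
# with a reproducibility score and the impact factor of its journal
ggplot(papers, aes(x = impact_factor, y = repro_score)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm") +  # linear trend with confidence band
  labs(x = "Journal impact factor",
       y = "Reproducibility score")
```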

If we do look at individual journals, somewhere in the article we will have to say (or present data that implies) which are the least reproducible journals, which might not be the most politic thing to do. I am fine with doing it, but it's something to consider, I think.

If we split the manual work among us and focus on a set of reproducibility-related variables that is not too large, the manual work may not take long...

Are you thinking around 200 papers total? I am not sure I understood this paragraph. Are you saying 20 articles x 10 journals? 200 seems doable.

emdelponte commented 7 years ago

@zachary-foster

Journals in the field of plant pathology (or closely related ones, such as Crop Protection) are not that many if we want to focus on those that are specific to the field, assuming we want to address this to the plant pathology community. These journals will fall into the categories you mentioned (impact factor, applied, fundamental, country of authors, etc.). Your suggestion of assigning articles to categories and exploring relationships is good. Whichever route we take, we need to define the list of journals and the number of articles to harvest, and the journal names will eventually appear anyway (in the reference list). I see you meant not to mention them explicitly in the text, which is fine. I will draft a list of journals this weekend and assign them to categories.

Exactly, a minimum of 200 articles. Any ideas for variables to extract from them, @adamhsparks?

adamhsparks commented 7 years ago

@zachary-foster and @emdelponte, I finalised my report yesterday afternoon and am off to the International Temperate Rice Conference early tomorrow. But I'll try to give this some thought this week and share ideas.

adamhsparks commented 7 years ago

@zachary-foster, I agree with your assessment about naming individual journals. Let's categorise articles as you suggested.

What categories can we use? Broadly we could say there are:

Are there others we should consider?

Looking at a longer time period is a nice idea, if we can get enough articles and handle the load, to look at possible trends. Perhaps trends over time across all of plant pathology are possible, but if we try that for each of our categories, I'm not sure we can gather enough articles and handle the work.

zachary-foster commented 7 years ago

@adamhsparks, these categories look good. I can't think of any others at the moment, but we might find some as we read papers.

Should we have sub categories?

Should we allow a paper to be classified by multiple categories?

Perhaps trends over time across all of plant pathology are possible, but if we try that for each of our categories, I'm not sure we can gather enough articles and handle the work.

Yeah, we might not have enough in each category, but the overall trend would be interesting.

adamhsparks commented 7 years ago

@zachary-foster, I'd expect there to be overlap as well. I don't think that's an issue, but we need to have some clear definitions of how we categorise them.
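One simple way to allow that overlap without forcing a single label per paper is a long-format table with one row per article-category pair (a sketch; the DOIs are made up):

```r
# Long format: one row per article-category pair, so a paper can
# belong to any number of categories (identifiers here are made up)
article_categories <- data.frame(
  doi      = c("10.xxxx/a", "10.xxxx/a", "10.xxxx/b"),
  category = c("molecular", "applied", "fundamental")
)

# e.g., count how many articles fall in each category
table(article_categories$category)
```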

adamhsparks commented 7 years ago

Since we all seem to be in agreement about the categories, can we divide the work up?

I'd like to move the discussion over to that issue now, please.

grunwald commented 7 years ago

Hi folks, sorry to join the conversation late. @zachary-foster alerted me to the issues and now I am included. Instead of selecting journals, could we also cull the literature using a Google Scholar search for 'plant pathogen'?