chendaniely commented 9 years ago

First pass at the general outcomes we want to assess in our learners.

We can prune the list later, but this will be the basis for the specific questions we ask in the post-workshop surveys.

3 categories:

attitudinal changes
skills
declaratory knowledge

chendaniely commented 9 years ago

From the Campbell and Nehm paper: A Critical Analysis of Assessment Quality in Genomics and Bioinformatics Education Research [1]

important to emphasize that, in terms of education testing, "reliability" and "validity" do not refer to the properties of a test, but rather the inferences derived from the scores that the test produces... Tests themselves do not carry the properties of reliability or validity wherever and whenever they are used; rather, the contexts of test use, coupled with inferences about how the test functions to produce scores, are central aspects of validity and reliability.

A foundational source for established perspectives on validity and reliability is Educational Measurement.

Validity

Content
- Does the assessment appropriately represent the specified knowledge domain?
- Delphi Study; textbook analysis; expert survey; Rasch analysis
Substantive
- Are the thinking processes thought to be used to answer the items the ones that were actually used?
- "Think aloud" interviews during problem solving; cognitive task analysis
Internal structure
- Do the items capture one dimension or construct?
- Factor analysis; Rasch analysis
External structure
- Does the construct represented in the assessment align with expected external patterns of association (convergent and/or discriminant)?
- Correlation coefficients
Generalization
- Are the scores derived from an assessment meaningful across populations and learning contexts?
- Analyses of performance across a diversity of contexts (e.g., ethnicity, socioeconomic status, etc.); differential item functioning
Consequence
- In what ways might the scores derived from the assessment lead to positive or negative consequences?
- Studying the types of social consequences produced as a result of using test scores (e.g., passing a class, graduating from a program).
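For reference, the Rasch analysis mentioned under content and internal structure rests on the one-parameter logistic model: the probability that a person with ability θ endorses (or answers correctly) an item with difficulty b is 1 / (1 + exp(-(θ - b))). A minimal Python sketch of just that formula is below; the ability and difficulty values are made-up numbers, and actually estimating θ and b from pilot responses (which is what Rasch software does) is not shown.

```python
import math
from typing import Sequence


def rasch_probability(theta: float, b: float) -> float:
    """P(correct/endorsed) under the Rasch (1PL) model.

    theta: person ability on the logit scale
    b:     item difficulty on the same logit scale
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_score(theta: float, difficulties: Sequence[float]) -> float:
    """Expected number of correct/endorsed items for one person."""
    return sum(rasch_probability(theta, b) for b in difficulties)


if __name__ == "__main__":
    items = [-1.0, 0.0, 1.5]  # hypothetical item difficulties
    for theta in (-1.0, 0.0, 1.0):  # hypothetical person abilities
        print(theta, round(expected_score(theta, items), 3))
```

When θ equals b the probability is exactly 0.5, which is what lets person ability and item difficulty sit on one shared scale.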

Reliability

Stability
- How consistent are scores from one administration of the assessment to another?
- Stability coefficient
Alternate forms
- Are scores comparable when using similar items to assess the same construct?
- Spearman-Brown double length formula: split half
Internal consistency
- To what extent do the items on an assessment correlate with one another?
Reliability of raters
- Is the assessment scored consistently by different raters?
- Cohen's or Fleiss's kappa

[1] http://www.ncbi.nlm.nih.gov/pubmed/24006400
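The reliability methods listed above can be sketched in a few lines of plain Python. This is a sketch, assuming responses are stored as one row per respondent and one numeric column per item: the stability coefficient is computed here as a Pearson correlation between two administrations, split-half reliability correlates the totals of alternating items and steps the result up with the Spearman-Brown formula 2r / (1 + r), and Cohen's kappa compares two raters' codes (Fleiss's kappa, for more than two raters, is not shown). The list gives no method for internal consistency; Cronbach's alpha is used here as one common choice, not necessarily the one the paper recommends.

```python
from collections import Counter
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson r; e.g. a stability coefficient between two administrations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_brown(r_half: float) -> float:
    """Double-length Spearman-Brown: full-test reliability from a half-test r."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(responses: List[List[float]]) -> float:
    """Correlate totals on items 1, 3, 5, ... with items 2, 4, 6, ..., then step up.

    responses: one row per respondent, one numeric column per item.
    """
    first_half = [sum(row[0::2]) for row in responses]
    second_half = [sum(row[1::2]) for row in responses]
    return spearman_brown(pearson(first_half, second_half))


def _sample_variance(values: Sequence[float]) -> float:
    # Sample variance with an n - 1 denominator.
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def cronbach_alpha(responses: List[List[float]]) -> float:
    """Cronbach's alpha, one common internal-consistency statistic."""
    k = len(responses[0])
    item_vars = [_sample_variance([row[j] for row in responses]) for j in range(k)]
    total_var = _sample_variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Two-rater agreement corrected for chance agreement.

    Undefined (division by zero) if both raters use one identical category throughout.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

For real analyses a statistics package would be the safer choice; the sketch is only meant to show what each coefficient measures.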

chendaniely commented 9 years ago

To come up with an assessment, we first have to develop questions for each category we want to assess. We should work through the validity topics in order: mass-brainstorm questions, then debate whether they cover everything we want to ask. We can then run a series of pilots to make sure the questions ask what we intend, and gradually test external validity and generalization.

chendaniely commented 9 years ago

pinging @chakwong87 since he will be helping us with the analysis and assessment creation

uiuc-cse commented 9 years ago

Aargh! So many confounding variables.

What if we offered a small incentive—stickers or something—for completion of a six-month post-workshop survey? That should lie even within our modest resources.

ksen0 commented 9 years ago

Some thoughts, based on (1) experience running survey and interview studies in general, (2) almost a decade of teaching and tutoring intro programming at many levels, and (3) a series of interviews with 15 past SWC participants, primarily in the natural sciences, almost half of them female. I spoke to each of them 3 times over the course of several months after they attended Software Carpentry (about 3-4 months in summer 2014, depending on schedules). The interviews were about their motivations, what they learned, how they incorporated it into their work, and what they felt the benefits of SWC were.

Thought 1: Overall survey design should be iterative and aim for minimalism

Iteration is necessary for effective survey design, and I could help with this part by reaching out to prior interviewees. Iteration means running a few (3-5) pilot surveys at each stage to check that respondents interpret the questions the way they are intended. This is not hard to do, because it does not require many respondents. With the 15 existing participants, I could test a handful of drafts, which ought to be plenty.
Incentives are not nearly as effective as a well-designed survey at getting fully completed responses. Many people are willing to fill out short surveys, but if the survey has multiple pages of huge chunks of very similar Likert-style questions, response rates drop sharply. Asking the same question in several different ways might help with validity, but not if people get so annoyed that they stop reading the questions altogether, which does happen. Downstream analysis benefits more from a larger N anyway. We can figure out which questions to ask through iteration.

Thought 2: It is important not to conflate "using the skill" with "benefiting from the skill"

For example, one of the sample questions for shell scripting asks something like ‘Do you understand how shell scripting can help you automate tasks?’ This is borderline condescending, since the kind of work some people do may not actually benefit from that particular skill. An alternative question might be: “Have you recently tried to improve some part of your scientific or research workflow using skills you learned at SWC?” with answers like “Yes, automation of a repetitive process using shell scripting,” “Yes, doing computation remotely rather than on my machine,” “No, not using anything I learned at SWC, but because of SWC I felt comfortable looking up a reasonable solution for my problem,” or “No, the skills I learned at SWC were not applicable.”
The most consistent interview finding is that the primary benefit of SWC is a sense of efficacy: maybe not the specific technology or skill someone learned, but the process of having learned that skill, which (1) helped them see that looking things up is not scary and (2) gave them the beginnings of a vocabulary for expressing problems as bugs or in abstract terms. So I think it is hugely important to capture this benefit. Almost none of the people I spoke to were still using anything they learned at SWC per se, but all of them felt they got something really valuable out of it. It would be really important not to miss that in an assessment!

ksen0 commented 9 years ago

(Note that I think these are too many and too complicated, but this is my brain-dump first draft.)

What was your reason for attending the Software Carpentry event? (Check all that apply) [see footnote]

To get a sense of technologies or skills that are talked about in my field
To expand the methodological skill set within my group or lab
To explore different options for approaching some specific ongoing project(s)
To gain a specific skill for some specific ongoing project(s)
Other: ___

What was your technical background prior to attending the Software Carpentry event? (Check all that apply) (It would be nice to brainstorm more options here, as well as improve the wording and examples :)

Using data collection or analysis tools with a visual interface that do not require writing code, like SPSS, Excel, GIS, etc.
Writing code within tools like SPSS or Excel to perform additional functions
Using a programming language, like Python or R, to create scripts for data cleaning or analysis tasks
Using MATLAB or the IPython Notebook for modeling or data analysis
Using the terminal or the command line
Working with a structured database, like an SQL database, to store or process data
Using ggplot2, MATLAB, gnuplot or other programmatic means to visually chart and inspect your observational or simulation data
Using Excel or Tableau to create visualizations of your data or findings
Other: ___

After the Software Carpentry event, please check all the activities that you currently engage in as part of your typical workflow: (Same options as above)

Did you use version control prior to SWC? (Check all that apply)

Yes, to keep an organized record of my work
Yes, to work with others in my lab or group
Yes, to share code publicly
No, because it is too much work to get started
No, because the code itself is not that complicated and consists only of one-time-use scripts
No, because the people I write code with and I share a common machine and do not experience problems working together that version control could solve
No, and this is typical in my lab, group, or field
No, but I am frequently told to use version control

After SWC, do you use version control? (Check all that apply) (Same answers as above)

Are there technical skills you learned at the SWC bootcamp that you COULD use if your work required it? Check all the reasons that apply for why you do NOT use those skills.

It is more effective/efficient for me to delegate these tasks to others in my lab or group
I am too busy focusing on other activities (like writing papers or teaching)
The current approach is less elegant than I would like, but it works and making it elegant is not the highest priority
The current approach is sufficiently effective, but for the next project that is similar, I/we will try to apply the new skills
Using the skill would make it difficult to work with other people in my lab/group, or in my field, because it is a different way of doing things
Other: ___

Please check all that apply. (I really dislike most of this wording, especially this “software skills” thing; any ideas? I do want to include the whole range of skills that SWC covers, which is not just programming. IMO a good way to improve it would be to simply ask people who represent the target demographic of this survey for feedback…)

I feel confident that I am able to figure out some way to approach a software challenge in the course of my research
I have recently integrated multiple different tools or software packages to solve a particular challenge
When I have software questions, I know someone in my lab, group, or broader professional environment to ask
When I have software questions, I am happy with the resources (books, forums, search engines, etc.) I have access to for finding an answer on my own
I take pride in my/our elegant technical solution
I feel frustrated with the inelegance of my/our software skills
I feel like I am/we are effectively incorporating software skills into my/our research practice.

Open-ended question: What was the biggest impact, positive or negative, that SWC had on the way you do your work, if any?

Where these questions came from:

Previously-identified areas for assessment: (1) attitudinal, (2) skill, (3) declaratory knowledge.

My revised areas for assessment:

affective (regarding confidence, comfort, sense of efficacy, as well as sense of pride, elegance, beauty; technical resourcefulness; social resourcefulness)
atomic skills (fundamental “atomic” knowledge - control structures, data structures, algorithm complexity, abstraction) and integrative skills (using multiple different data science tools in the workflow; overcoming computational or data friction in creative, context-dependent ways)

Background information:

technical background (both level of formality and language or tool most comfortable with before SWC)
reason for coming

Some half-baked hypotheses off the top of my head:

SWC participants will have poorer performance with integrative skills than with atomic skills
Performance on atomic skills will depend most on technical background
Performance on integrative skills will depend most on reason for coming
Affective variables will be positively correlated with resourcefulness

Footnote: The options are based on Aragon and Williams's categories of collaborative creativity, which have been a useful way to categorize motivations in the interviews so far. I'm not married to them; they are just something to start with. These are the “stages of collaborative creativity”:

Focus – trying to figure out where the group stands/represent
Frame – growing group cohesion, creating a “supportive affective” environment that encourages creativity
Create – open communication for building upon an idea
Complete – the emergent idea is evaluated and, where appropriate, elaborated, cycling back into the create phase for as long as necessary

chendaniely commented 9 years ago

@uiuc-cse yes, adjusting for various confounding variables may or may not be problematic. We definitely need to run the survey at numerous workshops before we have a large enough N to start adjusting for variables. Which ones did you have in mind?

@gvwilson do we need some kind of IRB approval if we ask questions about gender or any personal information?

chendaniely commented 9 years ago

@katerena what are your thoughts regarding Likert questions? Did you find them effective in your previous work?

ksen0 commented 9 years ago

@chendaniely on Likert questions: they are fine, as long as there are not too many. I find that when there are a lot of questions and a lot of things for participants to look at, they get distracted and start answering basically at random. I recall a paper from CHI09 or CHI10 arguing that Likert questions are terrible because some people tend to veer toward extreme answers and others toward mild answers, so the extent of agreement or disagreement reflects individual differences more than a meaningful signal about the question. In practice, I think the fatigue/frustration effect is more pronounced: people drop out of the survey or just answer the later Likert questions with the same value, even if you mix up the questions. Likert questions make sense for really massive surveys and when you have well-tested Likert instruments (surveys that someone designed to have the right amount of redundancy), so that the sheer force of N overwhelms all those individual issues. I tend to think that a survey ought to be interesting and meaningful with a few hundred responses, especially if the respondent demographic is really niche. Note that because I do so many interviews and observations, I am very sensitive to how the quality of human-subjects data depends on how a particular person is feeling and reacting to the question. I have generally found it useful to keep these concerns in mind for surveys as well.

tl;dr version: they are fine if there are maybe 5 or 10 of them, but if we want a survey that covers all the necessary topics while minimizing the time needed to complete it, I think Likert questions are not the best way to get there.
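One way to spot the "same value for every later Likert question" pattern during piloting is to look for long runs of identical answers. A minimal sketch, assuming each respondent's answers are stored as integer codes (e.g. 1 to 5) in the order the questions were shown; the function names, the pilot data, and the run-length threshold of 5 are hypothetical, illustrative choices, not a validated cutoff.

```python
from typing import Dict, List


def longest_identical_run(answers: List[int]) -> int:
    """Length of the longest run of consecutive identical answers."""
    if not answers:
        return 0
    longest = current = 1
    for prev, cur in zip(answers, answers[1:]):
        # Extend the run if this answer repeats the previous one, else restart.
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)
    return longest


def flag_straightliners(responses: Dict[str, List[int]], min_run: int = 5) -> List[str]:
    """IDs of respondents with at least `min_run` identical answers in a row.

    min_run=5 is an arbitrary, illustrative threshold.
    """
    return [rid for rid, answers in responses.items()
            if longest_identical_run(answers) >= min_run]


if __name__ == "__main__":
    pilot = {  # hypothetical pilot data, answers coded 1-5
        "r1": [4, 2, 3, 3, 3, 3, 3, 3],
        "r2": [1, 4, 2, 5, 3, 2, 4, 1],
    }
    print(flag_straightliners(pilot))  # → ['r1']
```

A long run is only a hint, since a respondent may genuinely hold the same opinion on several adjacent items; flagged responses would still need a human look.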

chendaniely commented 9 years ago

We were actually thinking of creating the survey by going through exploratory factor analysis (EFA), confirmatory factor analysis (CFA), and eventually item response theory (IRT), to make sure we end up with a consistent and valid survey with the fewest possible questions. But first we need a large pool of questions to filter down.
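Running EFA, CFA, or IRT properly needs a statistics package, so those are not sketched here. As a much simpler, swapped-in first pass for trimming a large question pool, classical item analysis computes each item's corrected item-total correlation (the item versus the sum of the remaining items) and flags items that barely track the rest. A sketch, assuming numeric answers with one row per respondent; the function names and data are hypothetical, and the 0.3 cutoff is an illustrative rule of thumb, not a recommendation from this thread.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation (undefined if either input has zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def corrected_item_total(responses: List[List[float]]) -> List[float]:
    """Correlate each item with the total of all *other* items."""
    k = len(responses[0])
    result = []
    for j in range(k):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        result.append(pearson(item, rest))
    return result


def weak_items(responses: List[List[float]], cutoff: float = 0.3) -> List[int]:
    """0-based indices of items whose corrected item-total r falls below cutoff."""
    return [j for j, r in enumerate(corrected_item_total(responses)) if r < cutoff]


if __name__ == "__main__":
    data = [[1, 1, 5], [2, 2, 1], [3, 3, 4], [4, 4, 2], [5, 5, 3]]  # hypothetical
    print(weak_items(data))  # → [2]
```

Items flagged this way are candidates for rewording or dropping in the next pilot round, not automatic deletions.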

Data-Science-User commented 9 years ago

Hey everyone,

Sorry for the late response.

I mentioned this to Daniel, but I figured I'd keep everyone else in the loop as well...

For next semester, I don't think I'll be able to participate because of school-related commitments, but I should be able to around Spring 2016. Is it possible to participate then?

Sincerely, Chak

On Mon, Jun 1, 2015 at 6:37 PM, Daniel Chen notifications@github.com wrote:

We were actually thinking of creating a survey where we go through the EFA CFA and eventually IRT to make sure we have a consistent and valid survey with the fewest number of questions. But we first need a large pool of questions to start filtering down first.

— Reply to this email directly or view it on GitHub https://github.com/swcarpentry/assessment/issues/2#issuecomment-107738111 .

ksen0 commented 9 years ago

Lab-related comments are something I heard a lot about as well. The (probably less common? I don't know) flip side to PI pushback is overly vague enthusiasm, like sending all lab members to SWC to learn things that they do not recognize as applicable to their work. In some cases, people were quite surprised to learn useful skills after all.

In any case, I strongly agree that asking a question about the social context / lab aspect is a good idea!

On Mon, Jun 8, 2015 at 7:48 PM, John Pellman notifications@github.com wrote:

As a former participant in a Software Carpentry workshop, I think it would be a good idea to gauge social network effects (I suppose this would fall under the 'attitudinal changes' category), with questions such as "Do you think that other members of your lab would benefit from Software Carpentry?" or "How difficult do you believe it is to convince your supervisor to let more members of your lab attend Software Carpentry?"

My experience was that many people in my lab would have benefited from basic shell scripting skills, but did not attend SWC because (in the fast-paced world of publish or perish) my PI was reluctant to reallocate time for training (as opposed to article writing) and was not adequately aware of the long-term value that increased programming literacy would have for our lab.

Perhaps this is something to keep in mind for outreach as well: it would be nice for SWC participants to bring back some sort of literature they could use to persuade their PI to invest more resources in training.

--Potentially tangential 2 cents


Best, Katie

What outcomes do we want to assess? #2
