revisions to the open science lesson plan

cboettig commented 9 years ago

Revisions follow along the lines of the proposal discussed in https://github.com/swcarpentry/bc/issues/712

Given the potentially more subjective nature of this particular lesson, I found some of the revisions challenging to implement. I have tried to focus the lesson on providing students with the information they would need if they chose to make data or software openly available, rather than trying to convince them of what they should do. Please let me know if this leaves any part inadequately motivated, or if there is anywhere I have been too prescriptive on a subjective topic.

Even though I've avoided the cultural/philosophical discussion, the lesson is still not particularly hands-on. I'm curious whether people feel that students would be better served by a tutorial in which they actually deposit data or share code on a particular repository, license and all. The downside of that approach is that it disproportionately emphasizes one particular repository over the others (which is why I avoid a specific example).

I have tried to keep the lesson to the length of the former version, but not having taught it before, I'm not sure how long it takes. I imagine this is one lesson instructors might use rather flexibly, skimming through the essentials very quickly if running behind schedule. I've tried to be concise and rely on external links for elaboration, but nevertheless some parts could be tightened up.

I've marked a few points with inline HTML comments where specific feedback would be helpful.

gdevenyi commented 9 years ago

You have a bunch of comments embedded in your changes. Could you please remove those comments and describe them in the PR? Show us what you want to do, so we can comment on the content directly.

cboettig commented 9 years ago

@gdevenyi Yes, as I stated above, I have marked a few lines in my edits with HTML comments where specific feedback is needed. These were:

L18 I commented out the opening quote and explained my reason for doing so. Similarly my reason for removing the opening story/vignette
L43 Should I have some disclaimer explaining the focus of the lesson is more on the how than the why?
L154 I briefly mention language-specific archives for distributing software, such as CRAN for R. I'm aware Python has several package management systems, but I'm not sure which, if any, would be appropriate to mention here in the discussion of what researchers use to distribute Python software.
L168 Mention gitlab?
L186 Should I introduce the open data section more explicitly (e.g. again focus is on how, more than why), or just jump in?
L242 Is the section following this too long? should it simply be removed?

I'd used HTML comments because I've found this keeps things more in context (line numbers change during further edits; the line references above refer to the original) and does not require a separate window. My apologies if that is not the preferred style.

arokem commented 9 years ago

Some institutions have local resources, and it might be worth tailoring the lesson accordingly, possibly even doing a bit of a practical data-sharing exercise with the local resources. For example, if I were teaching this lesson at Stanford, I would point out the SDR (sdr.stanford.edu), and maybe even show students (in collaboration with the SDR folks) how to deposit their data (e.g. what metadata they need for the deposit, etc.).


cboettig commented 9 years ago

@arokem thanks for the revisions, this all sounds great! Any thoughts on the other queries I mention in this thread before I start making changes?

Yup, I'd imagine some instructors might want to customize this to be a bit more hands-on and to walk through particular examples of data repositories relevant to the specific audience. I figured that would be a better approach anyway than shoehorning a walkthrough of a very generic repository into the lesson here. Also, given that the original lesson I modified didn't have any tutorial element, I figured it would be better to leave that out here. Happy to add something in if that's the consensus, but I'd find it hard to do so without the resulting lesson taking rather longer to cover than the original.

strasser commented 9 years ago

@cboettig This is such a great lesson! I'm excited to point people to it. A few comments:

Re. licensing data:

Just because they aren't subject to copyright doesn't mean that they aren't subject to intellectual property of some kind. That is, most institutions claim ownership of datasets. This means that (in the case of the UC) you aren't ALLOWED to use a cc-0 waiver on data you produce because technically you don't have any rights (copyright or otherwise) to waive. They belong to the regents of the UC.

I would probably fix this by adding some language about the complicated questions about data governance, and that although people should err on the side of permissive licensing (e.g., CC-BY 4.0 or CC-0), they should be aware that their institution might have clauses about ownership that they are unaware of. Of course, some institutions have a bit of a "don't ask / don't tell" policy when it comes to data licenses these days. It's basically the wild west out there.

Re. the scientific data repositories section:

I would argue that the word "scientific" isn't useful since (for example) figshare doesn't have any requirements on the content being scientific. Similarly institutional repositories take all kinds of data.

Institutional repositories deserve a mention here, too, since they might be taking a larger role in this space due to new funder requirements and projects like SHARE. (@arokem - Stanford librarians would be so happy you mentioned SDR!)

This is a potentially nitpicky point, but the division into "with pubs" and "without pubs" for data continues to encourage people to think about traditional scholarly incentives (papers). What if you divide into disciplinary versus general instead?

A publisher (Nature Scientific Data) shouldn't be the only place we point people to for repository help. I would also suggest searching databib or re3data.

I would suggest changing the URL for DOI lookup to dx.doi.org.

cboettig commented 9 years ago

Carly, thanks, these are all awesome suggestions. Would love to lean on your expertise a bit more if I can ask for some more suggestions:

Are there any resources we can point people to trying to make more sense of the data ownership issue? (Similar problem arises in software where the trickiest part isn't what license to choose but whether it's up to you at all -- I tried to point people to check with their institution; though that feels a bit like a cop-out...)

yeah, good point on 'scientific', I'm usually better about avoiding that. (I guess there's no risk about 'data repository' being confused with non-academic data hosting / dropbox etc, right?)

Also very good point about the division of 'pubs' & 'non-pubs'; subject/content specific and general is probably a better division. Though since there's so many ways to skin the cat, maybe it's better to just highlight the various axes of differentiation without trying to draw specific groups? general/specific, post-publication only / agnostic to publication, and perhaps fee-charging / free? Guess there's always other things to consider too (available metadata, recognition identifier, indexing, clearly that's a rabbit hole we're not ready to go down here though).

Yeah, didn't mean to indicate that publishers should be the definitive source, just trying to be concise. Was originally going to point to the databib list on the DataCite website, but to be honest the embedded Google doc is far less accessible than the more discursive publisher page, and it doesn't seem to have all DataCite DOI repos (zenodo?)

Is dx.doi.org still preferred over doi.org then? Didn't realize -- will fix!

strasser commented 9 years ago

> Are there any resources we can point people to trying to make more sense of the data ownership issue?

Alas, it's about the same as software... in theory people should check with their institutions first and foremost, but in practice I encourage people to go ahead and slap as liberal a license as possible on their data, and follow the "ask forgiveness not permission" adage.

> yeah, good point on 'scientific', I'm usually better about avoiding that. (I guess there's no risk about 'data repository' being confused with non-academic data hosting / dropbox etc, right?)

I would hope that there's no risk of confusion there, but maybe using the word "preservation" somewhere would help distinguish.

> Also very good point about the division of 'pubs' & 'non-pubs'; subject/content specific and general is probably a better division. Though since there's so many ways to skin the cat, maybe it's better to just highlight the various axes of differentiation without trying to draw specific groups? general/specific, post-publication only / agnostic to publication, and perhaps fee-charging / free? Guess there's always other things to consider too (available metadata, recognition identifier, indexing, clearly that's a rabbit hole we're not ready to go down here though).

I think it's a great idea to just highlight the axes. Certainly the pub/non-pub dichotomy is something that gets discussed, so no reason to leave it out. Maybe just include those other axes in your discussion. Axes I think are important:

general / specific
post-publication / agnostic
institution / non-institution (e.g., commercially owned)
free / not free
discipline-specific / agnostic
open to anyone / not

> Yeah, didn't mean to indicate that publishers should be the definitive source, just trying to be concise. Was originally going to point to the databib list on the DataCite website, but to be honest the embedded Google doc is far less accessible than the more discursive publisher page, and it doesn't seem to have all DataCite DOI repos (zenodo?)

Agree - I think the DataCite / databib list isn't quite up to snuff yet. Maybe keep the reference to nature scientific data, but also mention re3data and perhaps checking for institutional repositories?

> Is dx.doi.org still preferred over doi.org then? Didn't realize -- will fix!

dx.doi.org has the advantage of a nice search box where you can plop your DOI and get immediate access to the object. doi.org is more of an informational website.
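
For anyone who would rather script the lookup than use the search box, the usual pattern is to append the DOI to the resolver's base URL. A minimal sketch, assuming Python; `doi_to_url` is a hypothetical helper written for illustration (not part of any library), and `10.1000/182` is only an example DOI:

```python
from urllib.parse import quote


def doi_to_url(doi: str, resolver: str = "https://dx.doi.org") -> str:
    """Turn a bare or 'doi:'-prefixed DOI into a resolver URL (illustrative helper)."""
    doi = doi.strip()
    # Accept the common "doi:" prefix and drop it.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # DOI names start with the directory indicator "10."
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep '/' readable; percent-encode characters that are unsafe in a URL path.
    return f"{resolver}/{quote(doi, safe='/')}"


print(doi_to_url("doi:10.1000/182"))  # https://dx.doi.org/10.1000/182
```

Passing `resolver="https://doi.org"` builds the same kind of link against the other host, so the choice of resolver stays a one-argument change.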
