owid / owid-grapher

A platform for creating interactive data visualizations
https://ourworldindata.org
MIT License

Provide citation guidance in the Downloads tab #1137

Closed larsyencken closed 10 months ago

larsyencken commented 2 years ago

Migrated from Notion: https://www.notion.so/owid/Provide-citation-guidance-in-the-Downloads-tab-50edb8b91cc24a3f9bfd2dc164f659fe

Problem

People download and reuse datasets, then cite us instead of the original providers. This makes the original providers less willing to share data with us and, by extension, with the general public.

Quick fix solution

Put citation guidance into the download tab of every chart, under the download button. In that guidance, we should separate out how to cite the data (it should be the same as Sources in the chart) from how to cite the chart.

Full discussion below.


Background

We have a general worry that data providers do not get enough credit in our work.

That’s bad obviously because they deserve lots of credit. But it’s also a strategic risk for us: In order for us to do our job, we need data providers to be happy and supportive of our work.

One aspect of this general worry is how people cite data when accessing it through us. It’s a common thing that people say ‘Source: OWID’ at the bottom of a chart they’ve made, so that the data provider gets no credit.

We should do what we can to avoid this happening.

A simple first step is to give users clear guidance on how to write the citation when using data obtained from OWID but not produced by OWID.

Where should the citation guidance be given?

❗ We should show it prominently in the place where most people reusing our data get it – the download tab on charts.

[Image: how to cite in download tab-01]

Our general policy on this should maybe also be written up as an FAQ in our About section.

What should the citation guidance be? How should we implement it?

Quick fix proposal for now

Add something like the following to all charts:

How to cite this work
Data should be cited as: ‘source’.
Chart should be cited as: ‘Data from source, Chart from Our World in Data’

Or to make it very explicit:

How to cite this work
If reusing this work, please provide a citation that makes clear the contribution of the data providers:
Data should be cited as: ‘source’.
Chart should be cited as: ‘Data from source, Chart from Our World in Data’

This message should be automatically generated, but with the possibility for manual override (i.e. it’s another field in the Grapher/Bulk-FASTT admin).
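
A minimal sketch of how the generated message with a manual override could work. The names here (`CitationInput`, `sources`, `guidanceOverride`, `buildCitationGuidance`) are hypothetical and are not taken from the Grapher codebase or admin schema; the wording follows the explicit proposal above.

```typescript
// Hypothetical input shape, not the actual Grapher config schema.
interface CitationInput {
    // Source names as they appear in the chart's Sources line.
    sources: string[]
    // Assumed admin field: when non-empty, replaces the generated message.
    guidanceOverride?: string
}

function buildCitationGuidance(input: CitationInput): string {
    // A manual override takes precedence over the generated text.
    const override = input.guidanceOverride
        ? input.guidanceOverride.trim()
        : ""
    if (override) return override

    // Otherwise generate the message from the chart's sources.
    const source = input.sources.join("; ")
    return [
        "How to cite this work",
        "If reusing this work, please provide a citation that makes clear the contribution of the data providers:",
        `Data should be cited as: ‘${source}’.`,
        `Chart should be cited as: ‘Data from ${source}, Chart from Our World in Data’`,
    ].join("\n")
}
```

With this shape, typical charts would need no extra admin input, and less typical cases could be handled by filling in the override field.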

Longer-term solution

There are different cases for which we need to think through what the guidance should be. In the short term we could override less typical cases manually, but in the longer term these cases should mostly be handled automatically.

@JoeHasell @maxroser

larsyencken commented 2 years ago

@JoeHasell Just thinking that we do some small transformations to the data, such as throwing some out, renaming countries, sometimes calculating per-capita versions of metrics, etc. It could also be misleading to purely cite the upstream source, since it's possible for us to introduce errors that are not upstream.

The most accurate citation for data might actually be 'Poore and Nemecek (2018) via Our World in Data'.

I of course understand that it's sensitive for data providers. We can just leave it as 'Poore and Nemecek (2018)' and hope that our changes are minor enough that 95% of the time they do not diverge from the upstream data in any significant way.

Do you have an instinct here on which is better?
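
To make the two options concrete, here is a small sketch. `formatDataCitation` and the `processedByOwid` flag are hypothetical names for illustration only, not existing Grapher fields.

```typescript
// Hypothetical helper comparing the two citation styles under discussion.
// `processedByOwid` is an assumed flag indicating that OWID transformed
// the upstream data (e.g. renamed countries or computed per-capita values).
function formatDataCitation(source: string, processedByOwid: boolean): string {
    return processedByOwid ? `${source} via Our World in Data` : source
}

// Option 1: cite the upstream source only.
const upstreamOnly = formatDataCitation("Poore and Nemecek (2018)", false)

// Option 2: make OWID's processing step visible.
const viaOwid = formatDataCitation("Poore and Nemecek (2018)", true)
```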

CGiattino commented 2 years ago

My two cents: I agree this is very important, but I think many users could still miss the citation information in the Download tab because they never go to that tab—they just take a screenshot. Maybe there's a way to make it even more un-missable.

We see screenshots a fair amount, on social media but also from user interviews and user feedback messages—this includes academic researchers, people at nonprofits and in industry, etc. People who "should know better."

One idea: I think we could make the citation even more un-missable on the chart by putting a button right next to the source that provides the information on a click—see the blue "How to cite" in the source line of the chart.

[Image: how-to-cite]

JoeHasell commented 2 years ago

In reference to Charlie's point:

1) It's a bit of a design question, but my personal view would be to not take up prime real estate within the main chart view for something that will only be relevant to a very small subset of people.
2) But we can then make it VERY prominent in the place where most people are downloading: the download tab.
3) I also think it's a bit redundant on the main chart view: On the assumption that we are writing the source well in our charts, then someone showing a screenshot will already be showing the data source in a way we are happy with anyhow. Those people who take a selective screenshot of only part of the chart, actively cutting out the source, I think are unlikely to then take the trouble of writing their own citation.

larsyencken commented 2 years ago

@mathisonian Was just reviewing this with @danyx23 now, unclear whether it seems worth bolting on to the Downloads part of grapher, or whether we should try to address this problem some other way. Any thoughts?

mathisonian commented 2 years ago

This seems worthwhile to me. It's relatively low cost, and doesn't interfere with the current functionality. If it's relatively straightforward to implement and helps keep the data providers happy, it makes sense to me. I would keep it in the download tab and not put it in the main chart view.

JoeHasell commented 2 years ago

Just to add here that (when we discussed this a long time ago now) there was a strong appetite for this from Max and the authors. We have heard on the grapevine that some data providers are not currently happy with what we do, and that (if true) is obviously a big issue. This was envisaged as a very quick and easy step so that at least we're going in the right direction.

maxroser commented 2 years ago

I agree that this is important – we should make clear who did the hard work of producing the data, and we should avoid giving our readers the impression that it is us at OWID who do this work.

Some comments:

larsyencken commented 2 years ago

@marcelgerber should have capacity to do this specific fix this cycle. Let's take baby steps here, and Matt can keep this concern in mind.

Aside from design changes, we could also write more on this topic, explaining that we are a data republisher, and highlighting how much work goes into building these original datasets, both by institutions and individuals. We could also periodically highlight individual researchers or institutions who have done a mammoth job to fill an important data gap? (maybe we feel we do this a lot already)

eoo-owid commented 2 years ago

My view based on reading user feedback is that

On the second point, we've gotten much better at this with the work from the Data Managers. But there's more we could do there. This actually came up recently, because there are some charts where the label in the source footer and the info on the source tab are not perfectly consistent, which creates ambiguity and confusion for users. More details here: https://github.com/owid/owid-issues/issues/513

marcelgerber commented 2 years ago

Some notes on this after talking to @danyx23 about this issue, which I'm gonna work on next cycle:

marcelgerber commented 1 year ago

Discussed this with the Future of Publishing group today:

danyx23 commented 1 year ago

@JoeHasell we just talked about this with @mathisonian and we think it makes sense to think about this as part of the data pages project (i.e. decide how we want things to be cited and come up with a plan that allows us to surface this information to users at some point this year)

JoeHasell commented 1 year ago

Very much agree!