mozilla / mozfest-program

INACTIVE - http://mzl.la/ghe-archive - Where we're reviewing and scheduling the Mozfest sessions.

Content Mining for Transparency of Drug Research #273

Closed mmmavis closed 2 months ago

mmmavis commented 9 years ago

[ Google Spreadsheet Row Number ] 234

[ Facilitator ] Christopher Kittel

Description

In the field of drug trials, reproducibility of science is a cornerstone of efficient and transparent research. In this session, participants will experience in a practical way the benefits that researchers and the public can reap from applying content mining tools to large corpora of scientific texts.

In a practical example, we will review drug papers with regard to company contributions. The technology we are going to employ can extract named entities from specific parts of a text (e.g. company names in the acknowledgement section of a paper). Each dataset combines the metadata of a publication with the data we extract from its content. Building on an example dataset, participants will be able to create and visualize relations between entities, e.g. authors, drugs, and companies. Participants will learn about practical aspects of content mining and how to visualize data, e.g. as a network graph or a time series.
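To make the idea of relating entities more concrete, here is a minimal sketch in plain Python. It is not the ContentMine tooling or the session's actual pipeline: the records, the `KNOWN_COMPANIES` list, and all function names are hypothetical placeholders, and the simple substring match only stands in for real named-entity extraction. It shows how company names found in an acknowledgement text could be combined with author and drug metadata into weighted edges for a network graph, and into per-year counts for a time series.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records; every name and value below is a placeholder,
# not real data from any publication.
PAPERS = [
    {"year": 2013, "authors": ["Author A", "Author B"], "drugs": ["Drug X"],
     "acknowledgement": "This work was funded by Company One."},
    {"year": 2014, "authors": ["Author A"], "drugs": ["Drug X", "Drug Y"],
     "acknowledgement": "We thank Company One and Company Two for support."},
    {"year": 2014, "authors": ["Author C"], "drugs": ["Drug Y"],
     "acknowledgement": "No external funding was received."},
]

# Assumed lookup list; a real pipeline would use a proper entity extractor.
KNOWN_COMPANIES = ["Company One", "Company Two"]


def find_companies(text, companies=KNOWN_COMPANIES):
    """Return the known company names that occur in the text (case-insensitive)."""
    lowered = text.lower()
    return [name for name in companies if name.lower() in lowered]


def entity_edges(papers):
    """Count how often two entities (author, drug, company) share a paper."""
    edges = Counter()
    for paper in papers:
        entities = set(paper["authors"]) | set(paper["drugs"])
        entities |= set(find_companies(paper["acknowledgement"]))
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1
    return edges


def company_mentions_per_year(papers):
    """Count, per year, the papers that acknowledge at least one known company."""
    counts = Counter()
    for paper in papers:
        if find_companies(paper["acknowledgement"]):
            counts[paper["year"]] += 1
    return dict(sorted(counts.items()))


edges = entity_edges(PAPERS)
per_year = company_mentions_per_year(PAPERS)
```

The `edges` counter can be handed to any graph library as a weighted edge list, and `per_year` can be plotted directly as a time series. Keeping both as plain dictionaries is a deliberate simplification so that the sketch runs without extra dependencies.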

Agenda

The format is an interactive hacking session with ContentMine (http://contentmine.org/) tools. To establish a common understanding, the session will begin with a short presentation (20 min) of the core technology, the data source used, and the key steps of the preprocessing pipeline that produced the session's data set. The main part of the session will be a guided exploration with the help of Jupyter notebooks, which will be prepared in a modular way. Beginners can simply click through the process, while experienced users can modify and adapt the notebooks.

To maximize what participants learn, we will focus on working with results. We prepare the data and tools so that participants can concentrate on asking questions of the data. Participants are encouraged to form groups and explore the data set with the help of the workshop facilitators.

Participants

The first part of the session (a quick presentation of the core technology and data set) does not depend on group size, and the core materials (Jupyter notebooks) can be copied for any number of participants. The interactive, guided-hacking format works best with 5 to 20 participants, since larger groups reduce the amount of time the facilitators can spend with any individual participant. With up to 15 participants, we will encourage teams of two that combine different levels of programming competence. With more than 15, we will encourage teams of three.

Outcome

Participants will have two ways to continue. The first is self-guided learning with the help of online tutorials that we make available on our website. The technology we demonstrate is completely open source, and we will present open access literature as a data source that can be accessed without barriers. The second is to participate in and contribute to our growing communities of scholarly and non-scholarly users. Through an online discussion platform, regular meetings in Cambridge, and online technical support via GitHub, participants of the session will be able to make contact and exchange ideas with peers over the long term.

marcwalsh-zz commented 9 years ago

cc @kaythaney