ucscXena / xena-GDC-ETL

Extract, transform and load GDC data onto UCSC Xena
Apache License 2.0

xena-GDC-ETL

.. image:: https://travis-ci.org/ucscXena/xena-GDC-ETL.svg?branch=master
    :target: https://travis-ci.org/ucscXena/xena-GDC-ETL

Extract, transform and load `GDC data <https://portal.gdc.cancer.gov/>`_ onto `UCSC Xena <https://xenabrowser.net/>`_.

Table of Contents

Dependencies

The specific versions listed below have been tested. Earlier versions may still work, but this is not guaranteed.

  1. Python 2.7, 3.5+

    This pipeline has been tested with Python 2.7, 3.5, 3.6 and 3.7. It may also work with other Python 3 versions, since it was originally designed to be `single-source Python 2/3 compatible <https://docs.python.org/3/howto/pyporting.html#the-short-explanation>`_.

  2. `Requests <http://python-requests.org>`_ v1.2.3

  3. `Numpy <https://www.numpy.org/>`_ v1.15.0

  4. `Pandas <https://pandas.pydata.org/>`_ v0.23.2

  5. `Jinja2 <http://jinja.pocoo.org/>`_ v2.10.1: used for generating metadata JSON.

  6. `lxml <https://lxml.de/>`_ v4.2.0: used for parsing TCGA phenotype data

  7. `xlrd <https://xlrd.readthedocs.io/en/latest/>`_ v1.1.0: used for reading TARGET phenotype data
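
As a quick way to compare a local environment against the tested versions listed above, something like the following sketch may help. It is not part of xena-GDC-ETL: the helper names ``version_tuple`` and ``check_dependencies`` are hypothetical, the distribution names are assumed to match those on PyPI, and it relies on ``importlib.metadata``, so it assumes Python 3.8 or later rather than the full range of Python versions the pipeline supports.

```python
# Hypothetical helper for illustration; not part of xena-GDC-ETL.
import re
from importlib import metadata  # standard library, Python 3.8+

# Tested versions as listed in the dependency list above.
# Keys are assumed to be the PyPI distribution names.
TESTED_VERSIONS = {
    "requests": "1.2.3",
    "numpy": "1.15.0",
    "pandas": "0.23.2",
    "Jinja2": "2.10.1",
    "lxml": "4.2.0",
    "xlrd": "1.1.0",
}


def version_tuple(version):
    """Turn '1.15.0' or '2.0.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions such as '1.2' so that they compare as (1, 2, 0).
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def check_dependencies(tested=TESTED_VERSIONS):
    """Map each package name to a short status string."""
    report = {}
    for name, minimum in tested.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if version_tuple(installed) < version_tuple(minimum):
            report[name] = "older than tested ({} < {})".format(installed, minimum)
        else:
            report[name] = "ok ({})".format(installed)
    return report


if __name__ == "__main__":
    for package, status in sorted(check_dependencies().items()):
        print("{}: {}".format(package, status))
```

A package reported as "older than tested" may still work, as noted above; the check only flags where an environment differs from the tested configuration.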

Installation

Basic usage with command line tools

.. _gdc2xena:

.. _gdc_check_new:

.. _version:

.. _xena-eql:

.. _merge-xena:

Advanced usage with XenaDataset and its subclasses

.. _subclasses:

GDC ETL settings

.. _GDC download settings:

.. _GDC genomic transform settings:

.. _transform phenotype:

Documentation

Documentation for the GDC module and the Xena Dataset module is available `here <https://github.com/ucscXena/xena-GDC-ETL/blob/master/docs/API.rst>`_.