higlass / clodius

Clodius is a tool for breaking up large data sets into smaller tiles that can subsequently be displayed using an appropriate viewer.
MIT License

Getting h5py TypeError exceptions when converting epilogos BED to multivec #25

Closed: alexpreynolds closed this issue 6 years ago

alexpreynolds commented 6 years ago

In preparation for ingesting epilogos tilesets, I tried converting two test epilogos BED files to multivec format using conda (Python v3.6.4) and clodius v0.7.4. The host is running Ubuntu Xenial v16.04.

I get two different types of TypeError exceptions when attempting conversion, which appear to be thrown by h5py.

Here is the first test dataset:

$ wget -qO- https://epilogos.altiusinstitute.org/assets/epilogos/v06_16_2017/hg19/15/group/all.KL.bed.gz > /tmp/all.KL.bed.gz

Here is the row-infos text file I am using:

$ cat > ~/epilogos_hg38_observed_states.txt
Active TSS
Flanking Active TSS
Transcription at gene 5p and 3p
Strong transcription
Weak transcription
Genic enhancers
Enhancers
ZNF genes + repeats
Heterochromatin
Bivalent/Poised TSS
Flanking Bivalent TSS/Enh
Bivalent Enhancer
Repressed PolyComb
Weak Repressed PolyComb
Quiescent/Low

I installed conda via Anaconda:

$ wget https://repo.anaconda.com/archive/Anaconda3-5.1.0-Linux-x86_64.sh
$ bash ./Anaconda3-5.1.0-Linux-x86_64.sh
$ python --version
Python 3.6.4 :: Anaconda, Inc.

I then installed the libz and pybigwig dependencies and installed clodius from the develop branch:

$ sudo apt-get install zlib1g-dev
$ conda update -n base conda
$ conda install -c bioconda pybigwig
$ cd ~/github/hms-dbmi
$ git clone https://github.com/hms-dbmi/clodius.git
$ cd clodius
$ git branch
* develop
$ python setup.py develop
...
$ which clodius
/home/ubuntu/anaconda3/bin/clodius

When I attempt to convert the epilogos file all.KL.bed.gz, here is the error message I get:

$ clodius convert bedfile_to_multivec /tmp/all.KL.bed.gz \
--assembly hg38 \
--starting-resolution 200 \
--row-infos-filename /home/ubuntu/epilogos_hg38_observed_states.txt \
--num-rows 15 \
--format epilogos
temporary dir: /tmp/tmpjrk4ruzt
dumping batch: chr1 100000
dumping batch: chr1 200000
dumping batch: chr1 300000
dumping batch: chr1 400000
dumping batch: chr1 500000
dumping batch: chr1 600000
dumping batch: chr1 700000
dumping batch: chr1 800000
dumping batch: chr1 900000
dumping batch: chr1 1000000
dumping batch: chr1 1100000
dumping batch: chr1 1200000
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/bin/clodius", line 11, in <module>
    load_entry_point('clodius', 'console_scripts', 'clodius')()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/ubuntu/github/hms-dbmi/clodius/clodius/cli/convert.py", line 227, in bedfile_to_multivec
    format, row_infos_filename, tile_size)
  File "/home/ubuntu/github/hms-dbmi/clodius/clodius/cli/convert.py", line 101, in _bedgraph_to_multivec
    starting_resolution, has_header, chunk_size);
  File "/home/ubuntu/github/hms-dbmi/clodius/clodius/multivec.py", line 43, in bedfile_to_multivec
    f_out[prev_chrom][batch_start_index:batch_start_index+len(batch)] = np.array(batch)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/_hl/dataset.py", line 631, in __setitem__
    for fspace in selection.broadcast(mshape):
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/_hl/selections.py", line 299, in broadcast
    raise TypeError("Can't broadcast %s -> %s" % (target_shape, count))
TypeError: Can't broadcast (46253, 15) -> (44782, 15)
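The two shapes in this message can be checked with a little arithmetic. The sketch below is one possible reading, not a confirmed diagnosis: it assumes the standard UCSC chr1 lengths for hg19 and hg38 (neither is stated in this issue) and that the failing batch starts at offset 1200000, the last value printed in the log. Under those assumptions, the batch length matches an hg19-sized chr1 (the download URL contains `hg19`) while the destination matches an hg38-sized chr1 (the command passes `--assembly hg38`).

```python
# Standard UCSC chr1 lengths in base pairs (an assumption; not from the issue).
CHR1_LENGTH = {"hg19": 249_250_621, "hg38": 248_956_422}

RESOLUTION = 200         # --starting-resolution used in the command
BATCH_START = 1_200_000  # last "dumping batch" offset printed before the crash


def bins_after(assembly, offset=BATCH_START, resolution=RESOLUTION):
    """Return the number of whole bins left in chr1 past `offset`."""
    return CHR1_LENGTH[assembly] // resolution - offset


print(bins_after("hg19"))  # 46253, the length of the batch being written
print(bins_after("hg38"))  # 44782, the room left in the destination dataset
```

If this reading is right, the mismatch comes from the input data and the `--assembly` value disagreeing, rather than from the installed h5py version.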

I get the same TypeError exception and error message thrown if I use virtualenv to run clodius, in place of conda.

The second test dataset is derived from the first, where I excise chrX and try to convert just that chromosome to a tileset:

$ conda install -c bioconda bedops
$ gunzip -c /tmp/all.KL.bed.gz > /tmp/all.KL.bed
$ bedextract chrX /tmp/all.KL.bed > /tmp/all.KL.chrX.bed
$ gzip -c /tmp/all.KL.chrX.bed > /tmp/all.KL.chrX.bed.gz
$ clodius convert bedfile_to_multivec /tmp/all.KL.chrX.bed.gz \
--assembly hg38 \
--starting-resolution 200 \
--row-infos-filename /home/ubuntu/epilogos_hg38_observed_states.txt \
--num-rows 15 \
--format epilogos
temporary dir: /tmp/tmpiw7ebd50
dumping batch: chrX 100000
dumping batch: chrX 200000
dumping batch: chrX 300000
dumping batch: chrX 400000
dumping batch: chrX 500000
dumping batch: chrX 600000
dumping batch: chrX 700000
output_file: /tmp/all.KL.chrX.bed.multires.mv5
creating new dataset
array_data.shape (1244782, 15)
copy start: 0 100000
copy start: 100000 100000
copy start: 200000 100000
copy start: 300000 100000
copy start: 400000 100000
copy start: 500000 100000
copy start: 600000 100000
copy start: 700000 100000
copy start: 800000 100000
copy start: 900000 100000
copy start: 1000000 100000
copy start: 1100000 100000
copy start: 1200000 100000
…
creating new dataset
array_data.shape (4, 15)
copy start: 0 4
start: 0
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/bin/clodius", line 11, in <module>
    load_entry_point('clodius', 'console_scripts', 'clodius')()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/ubuntu/github/hms-dbmi/clodius/clodius/cli/convert.py", line 227, in bedfile_to_multivec
    format, row_infos_filename, tile_size)
  File "/home/ubuntu/github/hms-dbmi/clodius/clodius/cli/convert.py", line 124, in _bedgraph_to_multivec
    row_infos=row_infos)
  File "/home/ubuntu/github/hms-dbmi/clodius/clodius/multivec.py", line 257, in create_multivec_multires
    f['resolutions'][str(curr_resolution)]['values'][chrom][start/2:start/2+chunk_size/2] = new_data
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/_hl/dataset.py", line 609, in __setitem__
    selection = sel.select(self.shape, args, dsid=self.id)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/_hl/selections.py", line 94, in select
    sel[args]
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/_hl/selections.py", line 261, in __getitem__
    start, count, step, scalar = _handle_simple(self.shape,args)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/_hl/selections.py", line 447, in _handle_simple
    x,y,z = _translate_slice(arg, length)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/_hl/selections.py", line 480, in _translate_slice
    start, stop, step = exp.indices(length)
TypeError: slice indices must be integers or None or have an __index__ method
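The slice expression in the traceback uses `start/2` and `chunk_size/2`. In Python 3 the `/` operator always returns a float, and floats are not valid slice indices, which produces this exact message. The sketch below reproduces the behavior with a plain list instead of an h5py dataset; the variable names and values are illustrative only, and the floor-division variant is shown as one way to keep the indices integral, not as the change actually made in clodius.

```python
values = list(range(16))
start, chunk_size = 8, 4  # illustrative values, not taken from the issue

# True division yields floats (4.0, 6.0), which cannot be used as slice bounds.
try:
    values[start / 2:start / 2 + chunk_size / 2]
except TypeError as err:
    float_slice_error = str(err)

# Floor division keeps the bounds as integers (4, 6).
int_slice = values[start // 2:start // 2 + chunk_size // 2]

print(float_slice_error)
print(int_slice)  # [4, 5]
```

The same expression would run without error under Python 2, where `/` on two integers performs integer division.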

In both cases, it appears the exception is thrown from the h5py library. I have v2.7.1 of this library installed.

I was wondering whether there is a specific version of this library I should use, or other changes I should make to my Python environment, to get these conversions to work.

alexpreynolds commented 6 years ago

I apologize. This was due to a typo and appears to be resolved.