Getting h5py TypeError exceptions when converting epilogos BED to multivec #25

alexpreynolds commented 6 years ago

In preparation for ingesting epilogos tilesets, I tried converting two test epilogos BED files to multivec format using conda (Python v3.6.4) and clodius v0.7.4. The host is running Ubuntu Xenial v16.04.

I get two different TypeError exceptions during conversion; both appear to be raised from within h5py.

Here is the first test dataset:

$ wget -qO- https://epilogos.altiusinstitute.org/assets/epilogos/v06_16_2017/hg19/15/group/all.KL.bed.gz > /tmp/all.KL.bed.gz

Here is the row-infos text file I am using:

$ cat > ~/epilogos_hg38_observed_states.txt
Active TSS
Flanking Active TSS
Transcription at gene 5p and 3p
Strong transcription
Weak transcription
Genic enhancers
ZNF genes + repeats
Bivalent/Poised TSS
Flanking Bivalent TSS/Enh
Bivalent Enhancer
Repressed PolyComb
Weak Repressed PolyComb
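The listing above has 12 labels, but the conversion commands below pass `--num-rows 15`. Presumably the row-infos file should provide one label per row. Whether clodius checks or requires this is an assumption and is not shown in this thread. The following is a minimal sketch of a consistency check. The helper name `count_row_labels` is hypothetical, and an in-memory copy of the listed labels stands in for the real file:

```python
# Sketch: compare the number of labels in a row-infos file with --num-rows.
# Assumes one label per non-empty line; `count_row_labels` is hypothetical.
import io


def count_row_labels(lines):
    """Count non-empty lines, i.e. the labels a row-infos file provides."""
    return sum(1 for line in lines if line.strip())


# The labels as listed in the issue (an in-memory stand-in for the file).
row_infos = io.StringIO("""Active TSS
Flanking Active TSS
Transcription at gene 5p and 3p
Strong transcription
Weak transcription
Genic enhancers
ZNF genes + repeats
Bivalent/Poised TSS
Flanking Bivalent TSS/Enh
Bivalent Enhancer
Repressed PolyComb
Weak Repressed PolyComb
""")

num_rows = 15  # value passed via --num-rows
n_labels = count_row_labels(row_infos)
if n_labels != num_rows:
    print(f"row-infos has {n_labels} labels but --num-rows is {num_rows}")
```

To check the actual file, pass an open handle instead, for example `with open("/home/ubuntu/epilogos_hg38_observed_states.txt") as fh: count_row_labels(fh)`.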

I installed conda via Anaconda:

$ wget https://repo.anaconda.com/archive/Anaconda3-5.1.0-Linux-x86_64.sh
$ bash ./Anaconda3-5.1.0-Linux-x86_64.sh
$ python --version
Python 3.6.4 :: Anaconda, Inc.

I then installed the zlib and pyBigWig dependencies, followed by clodius from the develop branch:

$ sudo apt-get install zlib1g-dev
$ conda update -n base conda
$ conda install -c bioconda pybigwig
$ cd ~/github/hms-dbmi
$ git clone https://github.com/hms-dbmi/clodius.git
$ cd clodius
$ git branch
* develop
$ python setup.py develop
$ which clodius

When I try to convert the epilogos file all.KL.bed.gz, I get the following error:

$ clodius convert bedfile_to_multivec /tmp/all.KL.bed.gz \
--assembly hg38 \
--starting-resolution 200 \
--row-infos-filename /home/ubuntu/epilogos_hg38_observed_states.txt \
--num-rows 15 \
--format epilogos
temporary dir: /tmp/tmpjrk4ruzt
dumping batch: chr1 100000
dumping batch: chr1 200000
dumping batch: chr1 300000
dumping batch: chr1 400000
dumping batch: chr1 500000
dumping batch: chr1 600000
dumping batch: chr1 700000
dumping batch: chr1 800000
dumping batch: chr1 900000
dumping batch: chr1 1000000
dumping batch: chr1 1100000
dumping batch: chr1 1200000
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/bin/clodius", line 11, in <module>
    load_entry_point('clodius', 'console_scripts', 'clodius')()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/ubuntu/github/hms-dbmi/clodius/clodius/cli/convert.py", line 227, in bedfile_to_multivec
    format, row_infos_filename, tile_size)
  File "/home/ubuntu/github/hms-dbmi/clodius/clodius/cli/convert.py", line 101, in _bedgraph_to_multivec
    starting_resolution, has_header, chunk_size);
  File "/home/ubuntu/github/hms-dbmi/clodius/clodius/multivec.py", line 43, in bedfile_to_multivec
    f_out[prev_chrom][batch_start_index:batch_start_index+len(batch)] = np.array(batch)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/_hl/dataset.py", line 631, in __setitem__
    for fspace in selection.broadcast(mshape):
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/_hl/selections.py", line 299, in broadcast
    raise TypeError("Can't broadcast %s -> %s" % (target_shape, count))
TypeError: Can't broadcast (46253, 15) -> (44782, 15)

I get the same TypeError and error message when I run clodius under virtualenv instead of conda.
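The shapes in `Can't broadcast (46253, 15) -> (44782, 15)` can be checked with a little arithmetic. This is a minimal sketch under stated assumptions:

- clodius sizes each chromosome's dataset as floor(chromosome length / starting resolution).
- The epilogos file has that many 200 bp rows for chr1.
- Rows are written in batches of 100,000, as the `dumping batch` log lines suggest.

The chr1 sizes are the standard UCSC values for hg19 and hg38. Under these assumptions, the numbers match hg19 data being written into datasets sized for `--assembly hg38`; the download URL above contains `hg19`. Whether that is the typo mentioned at the end of this thread is not stated.

```python
# Sketch: reproduce the shapes in "Can't broadcast (46253, 15) -> (44782, 15)".
# Assumptions (not confirmed from the clodius source): datasets hold
# floor(chrom_length / resolution) rows, and rows are written in batches
# of 100,000, as the "dumping batch" log lines suggest.

CHR1_LENGTH = {
    "hg19": 249_250_621,  # UCSC hg19 (GRCh37) chr1 size in bp
    "hg38": 248_956_422,  # UCSC hg38 (GRCh38) chr1 size in bp
}
RESOLUTION = 200      # --starting-resolution
BATCH_SIZE = 100_000  # rows per dumped batch


def num_bins(length, resolution=RESOLUTION):
    """Number of whole bins of `resolution` bp in a chromosome."""
    return length // resolution


def last_batch_rows(total_rows, batch_size=BATCH_SIZE):
    """Rows left over for the final, partial batch."""
    return total_rows % batch_size


rows_in_file = num_bins(CHR1_LENGTH["hg19"])     # 1246253
rows_in_dataset = num_bins(CHR1_LENGTH["hg38"])  # 1244782

# Rows the last batch tries to write vs. rows left in the hg38-sized dataset.
print(last_batch_rows(rows_in_file), last_batch_rows(rows_in_dataset))
# 46253 44782
```

The hg38 figure of 1,244,782 rows also matches the `array_data.shape (1244782, 15)` line in the chrX log below.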

The second test dataset is derived from the first: I extract chrX and try to convert just that chromosome to a tileset:

$ conda install -c bioconda bedops
$ gunzip -c /tmp/all.KL.bed.gz > /tmp/all.KL.bed
$ bedextract chrX /tmp/all.KL.bed > /tmp/all.KL.chrX.bed
$ gzip -c /tmp/all.KL.chrX.bed > /tmp/all.KL.chrX.bed.gz
$ clodius convert bedfile_to_multivec /tmp/all.KL.chrX.bed.gz \
--assembly hg38 \
--starting-resolution 200 \
--row-infos-filename /home/ubuntu/epilogos_hg38_observed_states.txt \
--num-rows 15 \
--format epilogos
temporary dir: /tmp/tmpiw7ebd50
dumping batch: chrX 100000
dumping batch: chrX 200000
dumping batch: chrX 300000
dumping batch: chrX 400000
dumping batch: chrX 500000
dumping batch: chrX 600000
dumping batch: chrX 700000
output_file: /tmp/all.KL.chrX.bed.multires.mv5
creating new dataset
array_data.shape (1244782, 15)
copy start: 0 100000
copy start: 100000 100000
copy start: 200000 100000
copy start: 300000 100000
copy start: 400000 100000
copy start: 500000 100000
copy start: 600000 100000
copy start: 700000 100000
copy start: 800000 100000
copy start: 900000 100000
copy start: 1000000 100000
copy start: 1100000 100000
copy start: 1200000 100000
creating new dataset
array_data.shape (4, 15)
copy start: 0 4
start: 0
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/bin/clodius", line 11, in <module>
    load_entry_point('clodius', 'console_scripts', 'clodius')()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/ubuntu/github/hms-dbmi/clodius/clodius/cli/convert.py", line 227, in bedfile_to_multivec
    format, row_infos_filename, tile_size)
  File "/home/ubuntu/github/hms-dbmi/clodius/clodius/cli/convert.py", line 124, in _bedgraph_to_multivec
  File "/home/ubuntu/github/hms-dbmi/clodius/clodius/multivec.py", line 257, in create_multivec_multires
    f['resolutions'][str(curr_resolution)]['values'][chrom][start/2:start/2+chunk_size/2] = new_data
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/_hl/dataset.py", line 609, in __setitem__
    selection = sel.select(self.shape, args, dsid=self.id)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/_hl/selections.py", line 94, in select
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/_hl/selections.py", line 261, in __getitem__
    start, count, step, scalar = _handle_simple(self.shape,args)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/_hl/selections.py", line 447, in _handle_simple
    x,y,z = _translate_slice(arg, length)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/_hl/selections.py", line 480, in _translate_slice
    start, stop, step = exp.indices(length)
TypeError: slice indices must be integers or None or have an __index__ method
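The second error looks like a Python 3 issue rather than an h5py one. In `start/2:start/2+chunk_size/2` (multivec.py, line 257), `/` is true division, so the slice bounds are floats. `slice.indices()` then rejects them with exactly the message shown in the traceback above. The minimal sketch below uses a plain `slice` object, so no h5py is needed. The helper `coarse_slice` is hypothetical. Switching to floor division `//` is one plausible fix, not a confirmed patch to clodius.

```python
# Sketch: true division (/) produces float slice bounds under Python 3,
# which slice.indices() rejects; floor division (//) keeps them integers.
# h5py 2.7.1 calls slice.indices() internally (see the traceback above).


def coarse_slice(start, chunk_size, use_floor_division=True):
    """Hypothetical helper mirroring [start/2:start/2+chunk_size/2]."""
    if use_floor_division:
        return slice(start // 2, start // 2 + chunk_size // 2)
    return slice(start / 2, start / 2 + chunk_size / 2)


dataset_length = 2        # illustrative length of the coarser dataset
start, chunk_size = 0, 4  # cf. "copy start: 0 4" in the log

try:
    coarse_slice(start, chunk_size, use_floor_division=False).indices(dataset_length)
    float_error = None
except TypeError as exc:
    float_error = str(exc)

print(float_error)
# slice indices must be integers or None or have an __index__ method

int_bounds = coarse_slice(start, chunk_size).indices(dataset_length)
print(int_bounds)  # (0, 2, 1)
```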

In both cases, the exception appears to be raised from within the h5py library. I have h5py v2.7.1 installed.

Is there a specific version of h5py I should use, or other changes I should make to my Python environment, to get these conversion tests working?
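When asking about version requirements, it can help to list the exact versions in the active environment. The following is a minimal sketch. The helper `report_versions` is hypothetical. It relies on each package exposing a `__version__` attribute, which is conventional but not guaranteed. Packages that cannot be imported are reported as `None`.

```python
# Sketch: collect interpreter and package versions for a bug report.
# Relies on the conventional __version__ attribute; runs on Python 3.6.
import importlib
import sys


def report_versions(package_names):
    """Map each package name to its __version__, or None if not importable."""
    versions = {"python": sys.version.split()[0]}
    for name in package_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in report_versions(["h5py", "numpy", "clodius"]).items():
        print(f"{name}: {version}")
```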

alexpreynolds commented 6 years ago

I apologize. This was due to a typo and appears to be resolved.