Closed J0bbie closed 2 years ago
Dear @J0bbie,
Thanks for reporting. At first glance, I don't see anything obviously wrong with your calls. The error message suggests that the counts have the wrong dimensions. I will try to reproduce this on my end and get back to you once I know more.
As a first check, could you send me the dimensions of the count arrays? You can get them with:
h5ls -r <output>/spladder/genes_graph_conf3.merge_graphs.count.hdf5
Thanks, Andre
Dear Andre,
Thanks for the quick response! Here's the output:
h5ls -r spladder/genes_graph_conf3.merge_graphs.count.hdf5
/ Group
/edge_idx Dataset {398841}
/edges Dataset {398841, 26/Inf}
/gene_ids_edges Dataset {398841, 1}
/gene_ids_segs Dataset {661774, 1}
/gene_names Dataset {58870, 1}
/seg_len Dataset {661774, 1}
/seg_pos Dataset {661774, 1}
/segments Dataset {661774, 1}
/strains Dataset {26/Inf}
Thanks,
Job
Ah, this is very helpful. It looks like the collection step did not work as expected. For instance, the line
/segments Dataset {661774, 1}
should be
/segments Dataset {661774, 26}
This gives me something to search for. Will post here what I find.
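The comparison described here (the second dimension of /segments should equal the number of strains) can be checked mechanically from the h5ls text. The following is a minimal sketch, not part of SplAdder: `parse_h5ls` and `mismatched_count_datasets` are hypothetical helper names, and the sketch assumes the sample list is stored under /strains as in the listing above (a later listing in this thread uses /samples instead).

```python
import re

# Matches lines such as "/segments Dataset {661774, 26/Inf}".
DATASET_RE = re.compile(r"^(/\S+)\s+Dataset\s+\{([^}]*)\}")


def parse_h5ls(text):
    """Map dataset name -> tuple of current dimension sizes."""
    shapes = {}
    for line in text.splitlines():
        match = DATASET_RE.match(line.strip())
        if not match:
            continue  # skips "/ Group" and anything that is not a dataset
        name, dims = match.groups()
        # "26/Inf" means current size 26 with an unlimited maximum; keep the 26.
        shapes[name] = tuple(
            int(d.split("/")[0]) for d in dims.split(",") if d.strip()
        )
    return shapes


def mismatched_count_datasets(shapes, sample_key="/strains",
                              names=("/edges", "/segments")):
    """Return the datasets whose second dimension differs from the sample count."""
    n_samples = shapes[sample_key][0]
    return [n for n in names if len(shapes[n]) < 2 or shapes[n][1] != n_samples]


# Excerpt of the listing posted above.
listing = """\
/ Group
/edges Dataset {398841, 26/Inf}
/segments Dataset {661774, 1}
/strains Dataset {26/Inf}
"""
print(mismatched_count_datasets(parse_h5ls(listing)))  # ['/segments']
```

Passing `sample_key="/samples"` adapts the check to count files that name the sample list that way.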
Seeing the same thing here:
$ spladder build -o spladder_out.chr9 -a gencode.v19.annotation.pc.chr9.gtf -b $allbams --parallel 2 -c 0 --set-mm-tag nM --readlen 150
confidence 0 / sample 0
Loading gene structure from spladder_out.chr9/spladder/genes_graph_conf0.merge_graphs.pickle ...
... done.
spladder_out.chr9/merge_graphs_intron_retention_C0.pickle already exists
spladder_out.chr9/merge_graphs_exon_skip_C0.pickle already exists
spladder_out.chr9/merge_graphs_alt_5prime_C0.pickle and spladder_out.chr9/merge_graphs_alt_3prime_C0.pickle already exists
spladder_out.chr9/merge_graphs_mult_exon_skip_C0.pickle already exists
spladder_out.chr9/merge_graphs_mutex_exons_C0.pickle already exists
spladder_out.chr9/merge_graphs_intron_retention_C0.pickle already exists
spladder_out.chr9/merge_graphs_exon_skip_C0.pickle already exists
spladder_out.chr9/merge_graphs_mult_exon_skip_C0.pickle already exists
spladder_out.chr9/merge_graphs_alt_5prime_C0.pickle already exists
spladder_out.chr9/merge_graphs_alt_3prime_C0.pickle already exists
spladder_out.chr9/merge_graphs_mutex_exons_C0.pickle already exists
analyzing events with confidence 0
.Traceback (most recent call last):
File "/home/ubuntu/spladder/spladder-venv/bin/spladder", line 10, in <module>
sys.exit(main())
File "/home/ubuntu/spladder/spladder-venv/lib/python3.5/site-packages/spladder/spladder.py", line 192, in main
options.func(options)
File "/home/ubuntu/spladder/spladder-venv/lib/python3.5/site-packages/spladder/spladder_build.py", line 253, in spladder
analyze_events(options, options.event_types[e_idx])
File "/home/ubuntu/spladder/spladder-venv/lib/python3.5/site-packages/spladder/alt_splice/analyze.py", line 104, in analyze_events
(events_all, counts) = verify_all_events(events_all, sp.arange(len(options.strains)), options.bam_fnames, event_type, options)
File "/home/ubuntu/spladder/spladder-venv/lib/python3.5/site-packages/spladder/alt_splice/verify.py", line 493, in verify_all_events
segments = sp.atleast_2d(IN['segments'][gr_idx_segs, :])[:, strain_idx]
IndexError: index 1 is out of bounds for axis 1 with size 1
$ h5ls -r spladder_out.chr9/spladder/genes_graph_conf0.merge_graphs.count.hdf5
/ Group
/edge_idx Dataset {35428}
/edges Dataset {35428, 32/Inf}
/gene_ids_edges Dataset {35428, 1}
/gene_ids_segs Dataset {38150, 1}
/gene_names Dataset {808, 1}
/seg_len Dataset {38150, 1}
/seg_pos Dataset {38150, 1}
/segments Dataset {38150, 1}
/strains Dataset {32/Inf}
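The IndexError in this traceback is what NumPy raises when a single-column array is indexed with one column index per strain. A small reproduction with toy arrays, assuming only NumPy (the array sizes and the helper `select_strains` are made up for illustration and are not SplAdder code):

```python
import numpy as np

n_segments, n_strains = 5, 3

# Stand-in for a /segments dataset that was written with a single column.
segments_bad = np.zeros((n_segments, 1))
# Stand-in for the expected layout: one column per strain.
segments_ok = np.zeros((n_segments, n_strains))

strain_idx = np.arange(n_strains)  # mirrors sp.arange(len(options.strains))


def select_strains(segments, strain_idx):
    """Same indexing pattern as the failing line in the traceback."""
    return np.atleast_2d(segments)[:, strain_idx]


print(select_strains(segments_ok, strain_idx).shape)  # (5, 3)

try:
    select_strains(segments_bad, strain_idx)
except IndexError as err:
    print("IndexError:", err)
```

The toy case fails for the same reason as the 38150 x 1 dataset above: the count file has one column, while the caller asks for one column per strain.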
This should be fixed both in master and development. Will add it to the next release.
Hi Andre, I cloned and built the repo (and I'm seeing the 2.2.3 tag), and now I'm getting:
$ cat chr6.eventCalling.err
Traceback (most recent call last):
File "/home/ubuntu/spladder/spladder-2.2.3-venv/bin/spladder", line 11, in <module>
load_entry_point('spladder==2.2.3', 'console_scripts', 'spladder')()
File "/home/ubuntu/spladder/spladder-2.2.3-venv/lib/python3.5/site-packages/pkg_resources/__init__.py", line 542, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/home/ubuntu/spladder/spladder-2.2.3-venv/lib/python3.5/site-packages/pkg_resources/__init__.py", line 2569, in load_entry_point
return ep.load()
File "/home/ubuntu/spladder/spladder-2.2.3-venv/lib/python3.5/site-packages/pkg_resources/__init__.py", line 2229, in load
return self.resolve()
File "/home/ubuntu/spladder/spladder-2.2.3-venv/lib/python3.5/site-packages/pkg_resources/__init__.py", line 2235, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/home/ubuntu/spladder/spladder-2.2.3-venv/lib/python3.5/site-packages/spladder-2.2.3-py3.5.egg/spladder/spladder.py", line 10, in <module>
from .spladder_test import spladder_test
File "/home/ubuntu/spladder/spladder-2.2.3-venv/lib/python3.5/site-packages/spladder-2.2.3-py3.5.egg/spladder/spladder_test.py", line 3, in <module>
import statsmodels.api as sm
File "/home/ubuntu/spladder/spladder-2.2.3-venv/lib/python3.5/site-packages/statsmodels-0.10.1-py3.5-linux-x86_64.egg/statsmodels/api.py", line 3, in <module>
from . import iolib
File "/home/ubuntu/spladder/spladder-2.2.3-venv/lib/python3.5/site-packages/statsmodels-0.10.1-py3.5-linux-x86_64.egg/statsmodels/iolib/__init__.py", line 1, in <module>
from .foreign import StataReader, genfromdta, savetxt
File "/home/ubuntu/spladder/spladder-2.2.3-venv/lib/python3.5/site-packages/statsmodels-0.10.1-py3.5-linux-x86_64.egg/statsmodels/iolib/foreign.py", line 14, in <module>
from statsmodels.compat.python import (zip, lzip, lmap, lrange, string_types, long, lfilter,
File "/home/ubuntu/spladder/spladder-2.2.3-venv/lib/python3.5/site-packages/statsmodels-0.10.1-py3.5-linux-x86_64.egg/statsmodels/compat/__init__.py", line 1, in <module>
from statsmodels.tools._testing import PytestTester
File "/home/ubuntu/spladder/spladder-2.2.3-venv/lib/python3.5/site-packages/statsmodels-0.10.1-py3.5-linux-x86_64.egg/statsmodels/tools/__init__.py", line 1, in <module>
from .tools import add_constant, categorical
File "/home/ubuntu/spladder/spladder-2.2.3-venv/lib/python3.5/site-packages/statsmodels-0.10.1-py3.5-linux-x86_64.egg/statsmodels/tools/tools.py", line 7, in <module>
import pandas as pd
File "/home/ubuntu/spladder/spladder-2.2.3-venv/lib/python3.5/site-packages/pandas-0.25.1-py3.5-linux-x86_64.egg/pandas/__init__.py", line 55, in <module>
from pandas.core.api import (
File "/home/ubuntu/spladder/spladder-2.2.3-venv/lib/python3.5/site-packages/pandas-0.25.1-py3.5-linux-x86_64.egg/pandas/core/api.py", line 5, in <module>
from pandas.core.arrays.integer import (
File "/home/ubuntu/spladder/spladder-2.2.3-venv/lib/python3.5/site-packages/pandas-0.25.1-py3.5-linux-x86_64.egg/pandas/core/arrays/__init__.py", line 1, in <module>
from .array_ import array # noqa: F401
File "/home/ubuntu/spladder/spladder-2.2.3-venv/lib/python3.5/site-packages/pandas-0.25.1-py3.5-linux-x86_64.egg/pandas/core/arrays/array_.py", line 7, in <module>
from pandas.core.dtypes.common import (
File "/home/ubuntu/spladder/spladder-2.2.3-venv/lib/python3.5/site-packages/pandas-0.25.1-py3.5-linux-x86_64.egg/pandas/core/dtypes/common.py", line 11, in <module>
from pandas.core.dtypes.dtypes import (
File "/home/ubuntu/spladder/spladder-2.2.3-venv/lib/python3.5/site-packages/pandas-0.25.1-py3.5-linux-x86_64.egg/pandas/core/dtypes/dtypes.py", line 53, in <module>
class Registry:
File "/home/ubuntu/spladder/spladder-2.2.3-venv/lib/python3.5/site-packages/pandas-0.25.1-py3.5-linux-x86_64.egg/pandas/core/dtypes/dtypes.py", line 84, in Registry
self, dtype: Union[Type[ExtensionDtype], str]
File "/usr/lib/python3.5/typing.py", line 552, in __getitem__
dict(self.__dict__), parameters, _root=True)
File "/usr/lib/python3.5/typing.py", line 512, in __new__
for t2 in all_params - {t1} if not isinstance(t2, TypeVar)):
File "/usr/lib/python3.5/typing.py", line 512, in <genexpr>
for t2 in all_params - {t1} if not isinstance(t2, TypeVar)):
File "/usr/lib/python3.5/typing.py", line 1077, in __subclasscheck__
if super().__subclasscheck__(cls):
File "/home/ubuntu/spladder/spladder-2.2.3-venv/lib/python3.5/abc.py", line 225, in __subclasscheck__
for scls in cls.__subclasses__():
TypeError: descriptor '__subclasses__' of 'type' object needs an argument
From the last step:
cat chrList.txt | while read chr; do
spladder build -o spladder_out.${chr} -a gencode.v19.annotation.pc.${chr}.gtf \
-b $allbams --parallel 2 -c 0 --set-mm-tag nM --readlen 150 \
--event-types exon_skip,intron_retention,alt_3prime,alt_5prime,mutex_exons,mult_exon_skips \
--verbose 2>${chr}.eventCalling.err 1>${chr}.eventCalling.out &
done
Any thoughts?
~Joe
Hi Joe,
this seems unrelated to the previous issue. Would you mind opening a new one? Otherwise it is hard for me to track what has been worked on and what is still open.
Does this happen on a fresh installation? It looks to me as if the environment has some issues when loading statsmodels.
Cheers,
Andre
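One way to test the environment independently of SplAdder is to import the packages named in the traceback on their own. The sketch below uses only the standard library; `try_import` is a hypothetical helper. It also prints the interpreter's patch version, since pandas 0.25 documents Python 3.5.3 as its minimum and the traceback alone does not show which 3.5.x is installed.

```python
import importlib
import sys


def try_import(module_name):
    """Import a module in isolation; return (ok, error message or None)."""
    try:
        importlib.import_module(module_name)
    except Exception as err:  # not only ImportError: the traceback ends in a TypeError
        return False, "{}: {}".format(type(err).__name__, err)
    return True, None


print("Python", ".".join(str(part) for part in sys.version_info[:3]))
for name in ("pandas", "statsmodels.api"):
    ok, error = try_import(name)
    print(name, "OK" if ok else "FAILED ({})".format(error))
```

If pandas already fails here, the problem sits below SplAdder in the dependency stack, which would support handling it as a separate issue.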
Hey Andre,
Not sure if this is the same error, but the traceback is similar. The initial build works on most chromosomes, but fails on a handful of (hg19) chromosomes such as chr3 and chr5. Successful initial build steps with "--sparse-bam" generate both an "alignments.conf_3.filt.hdf5" and an "alignments.hdf5" file; when the step errors out, I see only the "alignments.conf_3.filt.hdf5" file. The traceback is:
Traceback (most recent call last):
File "/home/ubuntu/SplAdder/spladder-venv/bin/spladder", line 10, in <module>
sys.exit(main())
File "/home/ubuntu/SplAdder/spladder-venv/lib/python3.5/site-packages/spladder/spladder.py", line 188, in main
options.func(options)
File "/home/ubuntu/SplAdder/spladder-venv/lib/python3.5/site-packages/spladder/spladder_build.py", line 152, in spladder
spladder_core(options)
File "/home/ubuntu/SplAdder/spladder-venv/lib/python3.5/site-packages/spladder/core/spladdercore.py", line 21, in spladder_core
genes = gen_graphs(genes, options)
File "/home/ubuntu/SplAdder/spladder-venv/lib/python3.5/site-packages/spladder/core/gen_graphs.py", line 118, in gen_graphs
genes, inserted_ = insert_intron_retentions(genes, options)
File "/home/ubuntu/SplAdder/spladder-venv/lib/python3.5/site-packages/spladder/editgraph.py", line 324, in insert_intron_retentions
exon_coverage[k] = sp.median(sp.sum(tracks[:, idx], axis=0).astype('float')) # median coverage for exon k
IndexError: index 70609 is out of bounds for axis 1 with size 16138
For the failing cases, here's the "alignments.conf_3.filt.hdf5" file listing:
$ h5ls -r ~/data/dedups/1610213826.dedup.conf_3.filt.hdf5
/ Group
/chr20_introns_m Dataset {0, 3}
/chr20_introns_p Dataset {25, 3}
/chr20_reads_col Dataset {1556008}
/chr20_reads_dat Dataset {1556008}
/chr20_reads_row Dataset {1556008}
/chr20_reads_shp Dataset {2}
... for the successful cases, here's the "alignments.conf_3.filt.hdf5" file:
$ h5ls -r ~/data/dedups/spladder.sparsebams.chr2/1610213826.dedup.conf_3.filt.hdf5
/ Group
/chr2_introns_m Dataset {0, 3}
/chr2_introns_p Dataset {72, 3}
/chr2_reads_col Dataset {4335159}
/chr2_reads_dat Dataset {4335159}
/chr2_reads_row Dataset {4335159}
/chr2_reads_shp Dataset {2}
... and the "alignments.hdf5" files:
$ h5ls -r ~/data/dedups/spladder.sparsebams.chr2/1610213826.dedup.hdf5
/ Group
/chr2_introns_m Dataset {0, 3}
/chr2_introns_p Dataset {16240, 3}
/chr2_reads_col Dataset {6306090}
/chr2_reads_dat Dataset {6306090}
/chr2_reads_row Dataset {6306090}
/chr2_reads_shp Dataset {2}
~Joe
This looks like some problem with creating the sparse bams. I am re-opening the issue to keep track. Will try to have a look soon.
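To see which samples are affected, the presence of both sparse files can be checked per BAM. This is a rough sketch under a stated assumption: the file names are derived from the BAM name the way the listings above suggest (`<name>.hdf5` and `<name>.conf_3.filt.hdf5` next to `<name>.bam`), which is inferred from this thread rather than taken from the SplAdder documentation. `sparse_file_status` and `incomplete` are hypothetical helpers.

```python
from pathlib import Path


def sparse_file_status(bam_paths, confidence=3):
    """Report, per BAM, whether both sparse HDF5 files exist next to it."""
    status = {}
    for bam in map(Path, bam_paths):
        stem = bam.with_suffix("")  # /data/sample.dedup.bam -> /data/sample.dedup
        full = Path(str(stem) + ".hdf5")
        filtered = Path("{}.conf_{}.filt.hdf5".format(stem, confidence))
        status[str(bam)] = {"full": full.exists(), "filtered": filtered.exists()}
    return status


def incomplete(status):
    """BAMs for which at least one of the two sparse files is missing."""
    return sorted(
        bam for bam, found in status.items()
        if not (found["full"] and found["filtered"])
    )


# Placeholder path; replace with the BAMs passed to "spladder build".
print(incomplete(sparse_file_status(["sample1.dedup.bam"])))
```

A BAM that shows up in the result matches the pattern described above, where only the filtered file was written.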
Awesome- thanks Andre ... lmk if you could use any example files / hdf5 dumps, etc.
Hi @akahles,
I might have a similar error that appears unresolved in version 2.4.2.
I ran all the steps as outlined in your large cohorts guide for 128 samples. Everything seems to work until the event calling step. Event calling is failing.
I noticed that in my genes_graph_conf2.merge_graphs.count.hdf5 file, the /segments dataset size is formatted differently from what you show above: it ends with /Inf.
I get similar errors for other event types or if I run without the parallel option.
Thanks for any insight you can share! Really appreciate your help.
-Dan
merge_graphs.count.hdf5 overview
h5ls -r genes_graph_conf2.merge_graphs.count.hdf5
/ Group
/edge_idx Dataset {906328}
/edges Dataset {906328, 128/Inf}
/gene_ids_edges Dataset {906328, 1}
/gene_ids_segs Dataset {1093862, 1}
/gene_names Dataset {57820, 1}
/seg_len Dataset {1093862, 1}
/seg_pos Dataset {1093862, 128/Inf}
/segments Dataset {1093862, 128/Inf}
/strains Dataset {128/Inf}
Command
spladder build \
--outdir spladder_out \
--parallel 2 \
--annotation ${annotation_spladder} \
--bams ${bamfiles} \
--verbose \
--confidence ${confidence} \
--event-types exon_skip
Traceback
Loading gene structure from spladder_out/spladder/genes_graph_conf2.merge_graphs.pickle ...
... done.
spladder_out/merge_graphs_exon_skip_C2.pickle already exists
spladder_out/merge_graphs_exon_skip_C2.pickle already exists
analyzing events with confidence 2
./usr/local/lib/python3.5/dist-packages/spladder/alt_splice/verify.py:201: RuntimeWarning: invalid value encountered in double_scalars
info[2] = sp.sum(counts_segments[seg_exon_pre] * seg_lens[seg_exon_pre]) /sp.sum(seg_lens[seg_exon_pre])
/usr/local/lib/python3.5/dist-packages/spladder/alt_splice/verify.py:203: RuntimeWarning: invalid value encountered in double_scalars
info[3] = sp.sum(counts_segments[seg_exon_aft] * seg_lens[seg_exon_aft]) /sp.sum(seg_lens[seg_exon_aft])
/usr/local/lib/python3.5/dist-packages/spladder/alt_splice/verify.py:205: RuntimeWarning: invalid value encountered in double_scalars
info[1] = sp.sum(counts_segments[seg_exon] * seg_lens[seg_exon]) /sp.sum(seg_lens[seg_exon])
Traceback (most recent call last):
File "/usr/local/bin/spladder", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python3.5/dist-packages/spladder/spladder.py", line 190, in main
options.func(options)
File "/usr/local/lib/python3.5/dist-packages/spladder/spladder_build.py", line 259, in spladder
analyze_events(options, options.event_types[e_idx])
File "/usr/local/lib/python3.5/dist-packages/spladder/alt_splice/analyze.py", line 104, in analyze_events
(events_all, counts) = verify_all_events(events_all, sp.arange(len(options.strains)), options.bam_fnames, event_type, options)
File "/usr/local/lib/python3.5/dist-packages/spladder/alt_splice/verify.py", line 504, in verify_all_events
ver, info = verify_exon_skip(ev[i], genes[g_idx], segments[:, s_idx].T, sp.c_[curr_edge_idx, edges[:, s_idx]], options)
File "/usr/local/lib/python3.5/dist-packages/spladder/alt_splice/verify.py", line 214, in verify_exon_skip
idx = sp.where(counts_edges[:, 0] == sp.ravel_multi_index([seg_exon_pre[-1], seg_exon[0]], segs.seg_edges.shape))[0]
IndexError: index -1 is out of bounds for axis 0 with size 0
versions
root@091238ad2e40:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.1 LTS
Release: 16.04
Codename: xenial
root@091238ad2e40:~# python3 --version
Python 3.5.2
root@091238ad2e40:~# pip3 list | grep spladder
spladder 2.4.2
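Both symptoms in this log, the RuntimeWarning and the final IndexError, are what NumPy produces when an index array is empty. The sketch below reproduces them with toy data. It assumes, without confirmation from the thread, that `seg_exon_pre` was empty for the failing event; the numbers and the helper `length_weighted_mean` are invented for illustration.

```python
import warnings

import numpy as np

counts_segments = np.array([4.0, 7.0, 2.0])
seg_lens = np.array([100.0, 50.0, 25.0])

# Hypothetical situation: no segment was assigned to the upstream exon.
seg_exon_pre = np.array([], dtype=int)


def length_weighted_mean(counts, lens, idx):
    """Same arithmetic as the verify.py lines quoted in the warnings above."""
    return np.sum(counts[idx] * lens[idx]) / np.sum(lens[idx])


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = length_weighted_mean(counts_segments, seg_lens, seg_exon_pre)

print(value)  # nan, because the expression reduces to 0.0 / 0.0
print([str(w.message) for w in caught])

try:
    seg_exon_pre[-1]  # the lookup that verify_exon_skip performs
except IndexError as err:
    print("IndexError:", err)
```

If that assumption holds, the warnings and the crash share one cause: an event whose exon has no segments in the count data.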
@tarjand I'm experiencing the same issue. Were you ever able to resolve it?
This should be resolved with the latest release. Please re-open if still an issue.
Hello Andre, I am using spladder==3.0.4 with python=3.8.15 and am having an issue similar to the one discussed here. All steps up to event calling went fine, but the event calling step produced an error. So far, every event type I tried generated the same error.
Here is the command (exon_skip here used as an example):
spladder build -v --parallel 2 -o /path/to/hg38_gencode_v30/genome.gtf -b /path/to/spladder/alignments.txt --event-types exon_skip
Here is the content of the log file. I am including just the bottom part of it.
[################################################# ] 7912 / 7913 (100%) - took 13 sec (ETA: 0 sec)
[##################################################] 7913 / 7913 (100%) - took 13 sec (ETA: 0 sec)
[##################################################] 7914 / 7913 (100%) - took 13 sec (ETA: 0 sec)
Remove 0-length intron events
Make exon_skip events unique by event
. . . . . . . . . . 10000
. . . . . . . . . . 20000
. . . . . . . . . . 30000
. . . . . . . . . . 40000
. . . . . . . . . . 50000
. . . . . . . . . . 60000
. . . . . . . . . . 70000
. . . . . . . . . . 80000
. . . . . . . . . . 90000
. . . . . . . . . . 100000
. . . . . . . . . . 110000
. . . . . . . . . . 120000
. . . . . . . . . . 130000
. . . . . . . . . . 140000
. . . . . . . . . . 150000
. . . . . . . . . . 160000
. . . . . . . . . . 170000
. . . . . . . . . . 180000
. . . . . . . . . . 190000
. . . . . . . . . . 200000
. . . . . . . . . . 210000
events dropped: 144309
saving exon skips to /sc/arion/projects/sealfs01a/german/projects/ECHO/JH-LymeRNAseq_206_157_1_SR1/spladder/merge_graphs_exon_skip_C3.pickle
analyzing events with confidence 3
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/sc/arion/projects/sealfs01a/stas/conda/local_envs/snakemake3_mamba6a/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/sc/arion/projects/sealfs01a/stas/conda/local_envs/snakemake3_mamba6a/lib/python3.8/site-packages/spladder/alt_splice/verify.py", line 504, in verify_wrapper
segments = np.atleast_2d(IN['segments'][gr_idx_segs, :])[:, sample_idx]
IndexError: index 10 is out of bounds for axis 1 with size 10
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/sc/arion/projects/sealfs01a/stas/conda/local_envs/snakemake3_mamba6a/bin/spladder", line 8, in <module>
sys.exit(main())
File "/sc/arion/projects/sealfs01a/stas/conda/local_envs/snakemake3_mamba6a/lib/python3.8/site-packages/spladder/spladder.py", line 229, in main
options.func(options)
File "/sc/arion/projects/sealfs01a/stas/conda/local_envs/snakemake3_mamba6a/lib/python3.8/site-packages/spladder/spladder_build.py", line 163, in spladder
analyze_events(event_type, options.bam_fnames, options)
File "/sc/arion/projects/sealfs01a/stas/conda/local_envs/snakemake3_mamba6a/lib/python3.8/site-packages/spladder/alt_splice/analyze.py", line 95, in analyze_events
(events_all, counts, verified) = verify_all_events(events_all, np.arange(len(options.samples)), bam_fnames, event_type, options)
File "/sc/arion/projects/sealfs01a/stas/conda/local_envs/snakemake3_mamba6a/lib/python3.8/site-packages/spladder/alt_splice/verify.py", line 590, in verify_all_events
tmp = result.pop(0).get()
File "/sc/arion/projects/sealfs01a/stas/conda/local_envs/snakemake3_mamba6a/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
IndexError: index 10 is out of bounds for axis 1 with size 10
Also, as requested in some of the previous posts, here is the output of h5ls -r <output>/spladder/genes_graph_conf3.merge_graphs.count.hdf5:
/edge_idx Dataset {442044}
/edges Dataset {442044, 10/Inf}
/gene_ids_edges Dataset {442044, 1}
/gene_ids_segs Dataset {691749, 1}
/gene_names Dataset {58929, 1}
/samples Dataset {10/Inf}
/seg_len Dataset {691749, 1}
/seg_pos Dataset {691749, 10/Inf}
/segments Dataset {691749, 10/Inf}
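"index 10 is out of bounds for axis 1 with size 10" means an eleventh sample column was requested from a count file that holds ten. The thread does not establish where the extra index comes from. One inexpensive sanity check, sketched below with hypothetical helper names, is to compare the number of BAMs in the alignment list with the sample dimension reported by h5ls.

```python
def read_alignment_list(text):
    """Return the BAM paths from an alignment list, ignoring blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def check_sample_count(alignment_list_text, n_count_columns):
    """Compare the number of listed BAMs with the sample dimension of the count file."""
    bams = read_alignment_list(alignment_list_text)
    return {
        "n_bams": len(bams),
        "n_count_columns": n_count_columns,
        "consistent": len(bams) == n_count_columns,
        "duplicates": sorted({b for b in bams if bams.count(b) > 1}),
    }


# Made-up list with 11 entries, checked against the 10 columns shown by h5ls.
alignments_txt = "\n".join(
    "/path/to/sample{}.bam".format(i) for i in range(11)
) + "\n\n"
print(check_sample_count(alignments_txt, n_count_columns=10))
```

In practice the text would come from reading the file passed to -b. A mismatch would point at the collection step, while matching counts would suggest looking at how the sample indices are built.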
Description
Dear Andre,
Whilst following the step-wise procedure for use on large cohorts, SplAdder returns an error on the last step (4. Event Calling).
All previous steps ran just fine and prior to the final error, it creates the individual pickles for the given event-types: merge_graphs_intron_retention_C3.pickle, merge_graphs_intron_retention_C3.pickle, merge_graphs_alt_5prime_C3.pickle, etc.
Any idea what could be the issue? If I need to provide some more information, feel free to ask! Thanks!
What I Did
Traceback