Closed gtollefson closed 2 years ago
@akahles I would like to follow up on my issue post above.
I noticed that I had accidentally set --set-mm-tag Nm and that it should have been --set-mm-tag nM. Rerunning without --ignore-mismatches produced more AS events in the build step; however, when I run test I receive the error:
RuntimeWarning: invalid value encountered in double_scalars
I've pasted the complete output below. I also looked inside genes_graph_conf3.merge_graphs.count.hdf5 and have pasted its contents below. It looks like several datasets report 1/1 as their second dimension.
genes_graph_conf3.merge_graphs.count.hdf5 contents:
findiv13vhb733:spladder George$ h5ls -v genes_graph_conf3.merge_graphs.count.hdf5
Opened "genes_graph_conf3.merge_graphs.count.hdf5" with sec2 driver.
edge_idx Dataset {55/55}
    Location: 1:23168
    Links: 1
    Storage: 440 logical bytes, 440 allocated bytes, 100.00% utilization
    Type: native double
edges Dataset {55/55, 12/12}
    Location: 1:21904
    Links: 1
    Storage: 5280 logical bytes, 5280 allocated bytes, 100.00% utilization
    Type: native double
gene_ids_edges Dataset {55/55, 1/1}
    Location: 1:22176
    Links: 1
    Storage: 440 logical bytes, 440 allocated bytes, 100.00% utilization
    Type: native long
gene_ids_segs Dataset {100/100, 1/1}
    Location: 1:21632
    Links: 1
    Storage: 800 logical bytes, 800 allocated bytes, 100.00% utilization
    Type: native long
gene_names Dataset {1/1, 1/1}
    Location: 1:22896
    Links: 1
    Storage: 15 logical bytes, 15 allocated bytes, 100.00% utilization
    Type: 15-byte null-padded ASCII string
seg_len Dataset {100/100, 1/1}
    Location: 1:22624
    Links: 1
    Storage: 800 logical bytes, 800 allocated bytes, 100.00% utilization
    Type: native long
seg_pos Dataset {100/100, 12/12}
    Location: 1:1672
    Links: 1
    Storage: 9600 logical bytes, 9600 allocated bytes, 100.00% utilization
    Type: native double
segments Dataset {100/100, 12/12}
    Location: 1:1400
    Links: 1
    Storage: 9600 logical bytes, 9600 allocated bytes, 100.00% utilization
    Type: native double
strains Dataset {12/12}
    Location: 1:800
    Links: 1
    Storage: 384 logical bytes, 384 allocated bytes, 100.00% utilization
    Type: 32-byte null-padded ASCII string
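For reading the listing above: h5ls prints each dimension as current/maximum size, so {55/55, 1/1} describes a 55 x 1 dataset (a column vector) and does not by itself indicate a problem. The sketch below is a hypothetical helper, not part of SplAdder or HDF5, that pulls dataset names and current shapes out of pasted h5ls -v text. The function name and the sample string are assumptions for illustration.

```python
import re

# Matches "<name> Dataset {<dims>}" as printed by `h5ls -v`.
DATASET_RE = re.compile(r"(\w+)\s+Dataset \{([^}]*)\}")


def parse_h5ls_shapes(text):
    """Map each dataset name to a tuple of its current dimension sizes."""
    shapes = {}
    for name, dims in DATASET_RE.findall(text):
        if dims.strip() == "SCALAR":
            shapes[name] = ()  # scalar datasets have no dimensions
            continue
        # Each dimension is printed as "current/max"; keep the current size.
        shapes[name] = tuple(int(d.split("/")[0]) for d in dims.split(","))
    return shapes


# A few entries copied from the listing in this post.
listing = (
    "edge_idx Dataset {55/55} "
    "edges Dataset {55/55, 12/12} "
    "gene_ids_edges Dataset {55/55, 1/1} "
    "strains Dataset {12/12}"
)

for name, shape in parse_h5ls_shapes(listing).items():
    print(name, shape)
```

Read this way, the file holds 55 edges and 12 strains, and the datasets ending in 1/1 are column vectors.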
Error:
/gtollefs/gene_x_splicing_project/spladder/venv/lib/python3.6/site-packages/statsmodels/genmod/generalized_linear_model.py:798: RuntimeWarning: invalid value encountered in double_scalars
  return np.sum(resid / self.family.variance(mu)) / self.df_resid
Traceback (most recent call last):
  File "/gtollefs/gene_x_splicing_project/spladder/venv/bin/spladder", line 11, in <module>
    sys.exit(main())
  File "/gtollefs/gene_x_splicing_project/spladder/venv/lib/python3.6/site-packages/spladder/spladder.py", line 190, in main
    options.func(options)
  File "/gtollefs/gene_x_splicing_project/spladder/venv/lib/python3.6/site-packages/spladder/spladder_test.py", line 777, in spladder_test
    (pvals, cov_used, disp_raw_used, disp_adj_used) = run_testing(cov, dmatrix0, dmatrix1, sf, options, event_type, test_idx)
  File "/gtollefs/gene_x_splicing_project/spladder/venv/lib/python3.6/site-packages/spladder/spladder_test.py", line 526, in run_testing
    (disp_fitted, Lambda, disp_idx) = fit_dispersion(cov, disp_raw, (disp_raw_conv[:, 0] & test_idx)[:, np.newaxis], sf, options, dmatrix1, event_type)
  File "/gtollefs/gene_x_splicing_project/spladder/venv/lib/python3.6/site-packages/spladder/spladder_test.py", line 245, in fit_dispersion
    res = modGamma.fit()
  File "/gtollefs/gene_x_splicing_project/spladder/venv/lib/python3.6/site-packages/statsmodels/genmod/generalized_linear_model.py", line 1065, in fit
    cov_kwds=cov_kwds, use_t=use_t, **kwargs)
  File "/gtollefs/gene_x_splicing_project/spladder/venv/lib/python3.6/site-packages/statsmodels/genmod/generalized_linear_model.py", line 1179, in _fit_irls
    raise ValueError("The first guess on the deviance function "
ValueError: The first guess on the deviance function returned a nan. This could be a boundary problem and should be reported.
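The traceback shows the nan arising inside a Gamma model fit (modGamma.fit()). One candidate explanation, which is an assumption and not confirmed anywhere in this thread, is that the values handed to that fit contain nan, zero, or negative entries, since a Gamma model expects strictly positive responses. The sketch below is a hypothetical pre-check. The function name and the sample values are made up for illustration and are not taken from SplAdder.

```python
import math


def find_invalid_for_gamma(values):
    """Return (index, value) pairs that are non-finite or not strictly positive."""
    return [
        (i, v)
        for i, v in enumerate(values)
        if not math.isfinite(v) or v <= 0.0
    ]


# Hypothetical raw dispersion estimates; real values would come from the test run.
raw_dispersions = [0.12, float("nan"), 0.0, 0.31]

for index, value in find_invalid_for_gamma(raw_dispersions):
    print(f"entry {index} is unsuitable for a Gamma fit: {value}")
```

If such a check came back empty, the cause would have to lie elsewhere, for example in too few events surviving filtering, which is again only a guess.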
Dear @gtollefson ,
sorry for the late reply. Could you please run spladder test in verbose mode (-v) and post the log up to the error message?
Thanks,
Andre
Dear @gtollefson ,
I am closing this for now. Please re-open if the issue persists and you can provide further information.
Best,
Andre
Description
I've run the build and test commands for wildtype and mutant samples and receive an error during the test command with several Runtime warnings resulting in ValueError: zero-size array to reduction operation maximum which has no identity.
I've run the build steps for 3 wildtype and 3 mutant samples in a single command without error, using a subset GTF file containing annotations for a single gene. (Side note: I also tried this workflow by running the build step separately for the wildtype and mutant alignment files and received the same end result.) My command is as follows:
spladder build -o data/gtollefs/gene_x_splicing_project/spladder/builds -b data/gtollefs/tpp1_splicing_project/star_alignment/SA23_analysis/H9-1_Aligned.out.sorted.bam,data/gtollefs/gene_x_splicing_project/star_alignment/SA23_analysis/H9-2_Aligned.out.sorted.bam,data/gtollefs/gene_x_splicing_project/star_alignment/SA23_analysis/H9-3_Aligned.out.sorted.bam,data/gtollefs/gene_x_splicing_project/star_alignment/SA23_analysis/SA23-1_Aligned.out.sorted.bam,data/gtollefs/gene_x_splicing_project/star_alignment/SA23_analysis/SA23-2_Aligned.out.sorted.bam,data/gtollefs/gene_x_splicing_project/star_alignment/SA23_analysis/SA23-3_Aligned.out.sorted.bam -a data/gtollefs/genomes/hg38/star_reference/hg38.gene_x.refGene.gtf --set-mm-tag Nm --ignore-mismatches
I used the following custom parameter: the --set-mm-tag Nm option, since my STAR-aligned BAM file contains tags in the Nm:i:1 format. Despite this, the build command output still gave a warning and suggested using the --ignore-mismatches flag, which I added. I examined the BAM file by eye and did not observe any lines missing the Nm tag, and I hope the reads with the tag aren't being ignored for some reason.
Once I ran the build command, I ran the test command as follows:
spladder test --conditionA data/gtollefs/gene_x_splicing_project/star_alignment/SA23_analysis/H9-1_Aligned.out.sorted.bam,data/gtollefs/gene_x_splicing_project/star_alignment/SA23_analysis/H9-2_Aligned.out.sorted.bam,data/gtollefs/gene_x_splicing_project/star_alignment/SA23_analysis/H9-3_Aligned.out.sorted.bam --conditionB data/gtollefs/gene_x_splicing_project/star_alignment/SA23_analysis/SA23-1_Aligned.out.sorted.bam,data/gtollefs/gene_x_splicing_project/star_alignment/SA23_analysis/SA23-2_Aligned.out.sorted.bam,data/gtollefs/gene_x_splicing_project/star_alignment/SA23_analysis/SA23-3_Aligned.out.sorted.bam -a data/gtollefs/genomes/hg38/star_reference/hg38.gene_x.refGene.gtf --outdir data/gtollefs/gene_x_splicing_project/spladder/builds
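Since SAM tag names are case-sensitive, NM, nM and Nm are three different tags, and checking by eye is easy to get wrong. A minimal sketch for counting which mismatch tag the alignment records actually carry is shown here. It assumes the records are available as SAM text lines (for example from samtools view), and both the function name and the two sample records are hypothetical.

```python
from collections import Counter


def count_mismatch_tags(sam_lines, candidates=("NM", "nM", "Nm")):
    """Count how many alignment records carry each candidate tag.

    Returns (number_of_records, {tag: records_with_tag}).
    """
    counts = Counter()
    total = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        total += 1
        fields = line.rstrip("\n").split("\t")
        # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
        tags = {field.split(":", 1)[0] for field in fields[11:]}
        for tag in candidates:
            if tag in tags:
                counts[tag] += 1
    return total, dict(counts)


# Two made-up records in the layout STAR typically writes (NH HI AS nM).
records = [
    "@HD\tVN:1.4\tSO:coordinate",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\tHI:i:1\tAS:i:98\tnM:i:1",
    "r2\t16\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\tHI:i:1\tAS:i:98\tnM:i:0",
]

total, per_tag = count_mismatch_tags(records)
print(total, per_tag)
```

To run this on real data, the output of samtools view for one of the BAM files could be read from sys.stdin and passed in place of records. A tag name that never appears in the counts is probably not the right value for --set-mm-tag.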
The error message I received was:
Can you help me to troubleshoot? Thank you in advance.
In case it's useful, I've pasted the entire output of the build command below: