mortazavilab / lapa

Alternative polyadenylation detection from diverse data sources such as 3'-seq, long-read and short-reads.
https://www.biorxiv.org/content/10.1101/2022.11.08.515683v1

Value error need at least one array to concatenate #16

Open Priyankanator opened 1 year ago

Priyankanator commented 1 year ago

Hi, I am running LAPA with the following command:

lapa --alignment /data/salomonis-archive/FASTQs/Grimes/RNA/scRNASeq/10X-Genomics/LGCHMC53-17GEX/PacbioPBMC/PacbioPBMC/outs/possorted_genome_bam.bam --fasta hg38.fa --annotation hg38.gtf --chrom_sizes hg38.chrom_sizes --output_dir pbmc_pacbio_1

I am getting this error:

a3-2020/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/anaconda3-2020/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/main.py", line 112, in cli_lapa
    lapa(alignment, fasta, annotation, chrom_sizes, output_dir,
  File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/lapa.py", line 497, in lapa
    _lapa(alignment)
  File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/lapa.py", line 288, in __call__
    df_all_count, sample_counts = self.counting(alignment)
  File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/lapa.py", line 142, in counting
    df_all_count, sample_counts = counter.to_df()
  File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/count.py", line 583, in to_df
    df = pd.concat([
  File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/count.py", line 584, in <listcomp>
    self.build_counter(row['path'])
  File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/count.py", line 142, in to_df
    return self.to_gr().df.astype({'Chromosome': 'str', 'Strand': 'str'})
  File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/count.py", line 136, in to_gr
    return pr.PyRanges(df).count_overlaps(
  File "/usr/local/anaconda3-2020/lib/python3.8/site-packages/pyranges/pyranges.py", line 1322, in count_overlaps
    counts = pyrange_apply(_number_overlapping, self, other, **kwargs)
  File "/usr/local/anaconda3-2020/lib/python3.8/site-packages/pyranges/multithreaded.py", line 236, in pyrange_apply
    result = call_f(function, nparams, df, odf, kwargs)
  File "/usr/local/anaconda3-2020/lib/python3.8/site-packages/pyranges/multithreaded.py", line 23, in call_f
    return f.remote(df, odf, **kwargs)
  File "/usr/local/anaconda3-2020/lib/python3.8/site-packages/pyranges/methods/coverage.py", line 27, in _number_overlapping
    _self_indexes, _other_indexes = oncls.all_overlaps_both(
  File "ncls/src/ncls32.pyx", line 76, in ncls.src.ncls32.NCLS32.all_overlaps_both
  File "ncls/src/ncls32.pyx", line 122, in ncls.src.ncls32.NCLS32.all_overlaps_both
  File "<__array_function__ internals>", line 5, in resize
  File "/usr/local/anaconda3-2020/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 1417, in resize
    a = concatenate((a,) * n_copies)
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: need at least one array to concatenate

Could you please advise what might be causing this error?

Thanks
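
When reading a long traceback like the one above, it can help to list the frames and find the innermost one that still belongs to the package being debugged. The following is only a minimal sketch using the standard library; the helper names `parse_frames` and `last_frame_in_package` are hypothetical, and the input is a short excerpt of the traceback from this thread joined into a single line.

```python
import re

# Excerpt of the traceback from this thread, joined into a single line
# (only three frames are kept to keep the example short).
TRACEBACK = (
    'File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/count.py", line 136, in to_gr '
    'return pr.PyRanges(df).count_overlaps( '
    'File "/usr/local/anaconda3-2020/lib/python3.8/site-packages/pyranges/pyranges.py", line 1322, in count_overlaps '
    'File "ncls/src/ncls32.pyx", line 76, in ncls.src.ncls32.NCLS32.all_overlaps_both'
)

# Matches the standard frame header: File "<path>", line <n>, in <function>
FRAME_RE = re.compile(r'File "([^"]+)", line (\d+), in (\S+)')


def parse_frames(text):
    """Return a list of (path, line_number, function) tuples, outermost first."""
    return [(path, int(line), func) for path, line, func in FRAME_RE.findall(text)]


def last_frame_in_package(frames, package):
    """Return the innermost frame whose path contains /<package>/, or None."""
    marker = f"/{package}/"
    matches = [frame for frame in frames if marker in frame[0]]
    return matches[-1] if matches else None


frames = parse_frames(TRACEBACK)
innermost_lapa = last_frame_in_package(frames, "lapa")

for path, line, func in frames:
    # Print only the file name to keep the listing compact.
    print(f"{path.rsplit('/', 1)[-1]}:{line} in {func}")
```

In this excerpt the innermost LAPA frame is `to_gr` in `count.py`, and the frames below it belong to pyranges and ncls.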

MuhammedHasan commented 1 year ago

Can you share a section of your BAM file using the following command?

samtools view -q 10 ${bam_dir}/${bamfile} | head

Priyankanator commented 1 year ago

Hi, sure. Here is the output of the command:

samtools view -q 10 possorted_genome_bam.bam | head

A00587:867:HHGNLDSX5:2:2207:32859:8484 16 chr1 10193 255 1S98M2S 0 0 AAACCTCAACCCTAACCCTAACCCTAACCATAACCCTACCCCTAAACCTAAACCCTAAACCCTAAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA F,,,F,,FF,,:FF,F,,,,F::FF:,,:,F,,:F,::,,,:::F,,F,,FF,,,:FFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:80 nM:i:8 RG:Z:PacbioPBMC:0:1:HHGNLDSX5:2 RE:A:I xf:i:0 CR:Z:CTAACCCTAACCCTAA CY:Z:FF,FFFFFFFFFFFFF CB:Z:CTAACCCCAACCCTAA-1 UR:Z:CCCTAACCCTAA UY:Z:FFF:FFFFFFFF UB:Z:CCCTAACCCTAA
A00587:867:HHGNLDSX5:4:2630:1298:36213 16 chr1 14434 255 55M14842N17M29S 0 0 CCGTTTTCTCTGGAAGCCTCTTAAGAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCTCCCGGAAGCTCCCCCCCCATGTACTCTGCGTTGATACCACTG FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:61 nM:i:0 ts:i:27 RG:Z:PacbioPBMC:0:1:HHGNLDSX5:4 RE:A:I xf:i:0 CR:Z:GAGTGTTTCCATTCAT CY:Z:FFFFFFFFFF:FFFFF CB:Z:GAGTGTTTCCATTCAT-1 UR:Z:CACACTCTCGGT UY:Z:FFFFFFFFFFFF UB:Z:CACACTCTCGGT
A00587:867:HHGNLDSX5:2:2658:16459:32409 16 chr1 14434 255 55M14842N17M29S 0 0 CCGTTTTCTCTGGAAGCCTCTTAAGAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCTCCCGGAAGCTCCCCCCCCATGTACTCTGCGTTGATACCACTG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:61 nM:i:0 ts:i:27 RG:Z:PacbioPBMC:0:1:HHGNLDSX5:2 RE:A:I xf:i:0 CR:Z:GAGTGTTTCCATTCAT CY:Z:FFFFFFFFFFFFFFFF CB:Z:GAGTGTTTCCATTCAT-1 UR:Z:CACACTCTCGGT UY:Z:FFFFFFFFFFFF UB:Z:CACACTCTCGGT
A00587:867:HHGNLDSX5:2:1108:18647:15530 16 chr1 14435 255 54M14842N17M30S 0 0 CGTTTTCTCTGGAAGCCTCTTAAGAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCTCCCGGAAGCTCCCCCCCATGTACTCTGCGTTGATACCACTGCT FFFFFFFFF:FFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:60 nM:i:0 ts:i:29 RG:Z:PacbioPBMC:0:1:HHGNLDSX5:2 RE:A:I xf:i:0 CR:Z:GAGTGTTTCCATTCAT CY:Z:FFFFF:FFFF,FFFFF CB:Z:GAGTGTTTCCATTCAT-1 UR:Z:CACACTCTCGGT UY:Z:FFFFFFFFFFFF UB:Z:CACACTCTCGGT
A00587:867:HHGNLDSX5:4:2649:23213:5869 16 chr1 14436 255 53M14842N17M31S 0 0 GTTTCCTCTGGAAGCCTCTTAAGAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCTCCCGGAAGCTCCCCCCCCATGTACTCTGCGTTGATACCACTGCT FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:57 nM:i:1 ts:i:29 RG:Z:PacbioPBMC:0:1:HHGNLDSX5:4 RE:A:I xf:i:0 CR:Z:GAGTGTTTCCATTCAT CY:Z:FFFFFFFF:F,FFFFF CB:Z:GAGTGTTTCCATTCAT-1 UR:Z:CACACTCTCGGT UY:Z:FFFFFFFFFFFF UB:Z:CACACTCTCGGT
A00587:867:HHGNLDSX5:2:1663:27697:4523 16 chr1 14533 255 55M170418N25M21S 0 0 GTGGCCTCAAGCCAGCCTTCCGCTCCTTGAAGCTGGTCTCCACACAGTGCTGGTTCCGTCCCCCCATGGAGCACAGCCAGGACAAGCTGCTCAGACCTACT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:66 nM:i:1 RG:Z:PacbioPBMC:0:1:HHGNLDSX5:2 RE:A:I xf:i:0 CR:Z:AACAGGGTCGGAATGG CY:Z:FFFFFFFFFFFFFFFF CB:Z:AACAGGGTCGGAATGG-1 UR:Z:ACGTGTAGTCTT UY:Z:FFFFFFFFFFFF UB:Z:ACGTGTAGTCTT
A00587:867:HHGNLDSX5:4:2369:26115:23985 16 chr1 14539 255 57M14735N20M24S 0 0 TCAAGCCAGCCTTCCACTCCTTGAAGCTGGTCTCCACACAGTGCTGGTTCCGTCACCCCCTCCCGGAAGCTCCCGCCCCCCATGTACTCTGCGTTGATACC FFF:FFFFFFFFFFFFFFFFF:FFFFFFFFFF:FFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF, NH:i:1 HI:i:1 AS:i:64 nM:i:1 ts:i:23 RG:Z:PacbioPBMC:0:1:HHGNLDSX5:4 RE:A:I xf:i:0 CR:Z:TGTCCACAGCTACTAC CY:Z:FFFFFFFFFFFFFFFF CB:Z:TGTCCACAGCTACTAC-1 UR:Z:CGGCCCGCACGG UY:Z:FFFFFFFFFFFF UB:Z:CGGCCCGCACGG
A00587:867:HHGNLDSX5:1:2319:20103:14826 16 chr1 14541 255 55M14735N20M26S 0 0 AAGCCAGCCTTCCGCTCCTTGAAGCTGGTCTCCACACAGTGCTGGTTCCGTCACCCCCTCCCGGAAGCTCCCGCCCCCCATGTACTCTGCGTTGATACCAC FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:64 nM:i:0 ts:i:25 RG:Z:PacbioPBMC:0:1:HHGNLDSX5:1 RE:A:I xf:i:0 CR:Z:TAACACGAGCAATTAG CY:Z:FFFFFFFFFFFFFFFF CB:Z:TAACACGAGCAATTAG-1 UR:Z:TCGTATTTCACT UY:Z:FFFFFFFFFFFF UB:Z:TCGTATTTCACT
A00587:867:HHGNLDSX5:3:1620:8467:17409 16 chr1 14541 255 55M14735N20M26S 0 0 AAGCCAGCCTTCCACTCCTTGAAGCTGGTCTCCACACAGTGCTGGTTCCGTCACCCCCTCCCGGAAGCTCCCGCCCCCCATGTACTCTGCGTTGATACCAC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:62 nM:i:1 ts:i:25 RG:Z:PacbioPBMC:0:1:HHGNLDSX5:3 RE:A:I xf:i:0 CR:Z:TGTCCACAGCTACTAC CY:Z:FFFFFFFFFFFFFFFF CB:Z:TGTCCACAGCTACTAC-1 UR:Z:CGGCCCGCACGG UY:Z:FFFFFFFF:FFF UB:Z:CGGCCCGCACGG
A00587:867:HHGNLDSX5:2:2618:16993:5932 16 chr1 14541 255 55M14735N20M26S 0 0 AAGCCAGCCTTCCGCTCCTTGAAGCTGGTCTCCACACAGTGCTGGTTCCGTCACCCCCTCCCGGAAGCTCCCGCCCCCCATGTACTCTGCGTTGATACCAC FFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:64 nM:i:0 ts:i:25 RG:Z:PacbioPBMC:0:1:HHGNLDSX5:2 RE:A:I xf:i:0 CR:Z:GTAATGCTCAAACTGC CY:Z:FFFFFFFFFFFFFFFF CB:Z:GTAATGCTCAAACTGC-1 UR:Z:AGAATCAAAAAT UY:Z:FFFFFFFFFFFF UB:Z:AGAATCAAAAAT

MuhammedHasan commented 1 year ago

Dear @Priyankanator,

It looks like the reads in your BAM file are not passing some quality filter, which results in 0 counts.

Could you help me understand what is going on by running the following code?

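import pyranges as pr
from lapa.count import ThreePrimeCounter

# Path to the BAM file passed to `lapa --alignment` (adjust to your location)
bam_file = 'possorted_genome_bam.bam'

# Read the alignments, applying a mapping-quality threshold of 10
gr_bam = pr.read_bam(bam_file, mapq=10)
# Keep only reads whose SAM flag is 0 or 16
gr_bam = gr_bam[gr_bam.Flag.isin({0, 16})]
gr_bam  # is this data frame empty?

counter = ThreePrimeCounter(bam_file)
counts = counter.count()
counts  # or is this dictionary empty?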
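
The same two conditions can also be checked by hand on the records already shared in this thread. The sketch below is only an illustration with the standard library: it assumes the threshold is inclusive (MAPQ of at least 10) and uses a hypothetical helper, `passes_filter`, applied to the leading fields of two records from the `samtools view` output above.

```python
# Values taken from the snippet above: mapq=10 and Flag in {0, 16}.
# Treating the MAPQ threshold as inclusive is an assumption.
MIN_MAPQ = 10
ALLOWED_FLAGS = {0, 16}


def passes_filter(sam_line, min_mapq=MIN_MAPQ, allowed_flags=ALLOWED_FLAGS):
    """Return True if a SAM record meets the MAPQ threshold and has an allowed flag."""
    fields = sam_line.split()
    flag = int(fields[1])  # FLAG is the 2nd SAM column
    mapq = int(fields[4])  # MAPQ is the 5th SAM column
    return mapq >= min_mapq and flag in allowed_flags


# Leading fields (QNAME FLAG RNAME POS MAPQ CIGAR) of two records from this thread.
records = [
    "A00587:867:HHGNLDSX5:2:2207:32859:8484 16 chr1 10193 255 1S98M2S",
    "A00587:867:HHGNLDSX5:4:2630:1298:36213 16 chr1 14434 255 55M14842N17M29S",
]

results = [passes_filter(record) for record in records]
print(results)
```

All ten records shown earlier have flag 16 and MAPQ 255, so they would pass a filter like this one. That only covers the first few reads of the file and says nothing about the rest of it.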

Priyankanator commented 1 year ago

Hi, please find the results of this code here: file:///Volumes/salomonis2/LabFiles/Priyanka/LAPA/LAPA_troubleshooting.html

MuhammedHasan commented 1 year ago

I cannot access this URL.