mortazavilab / lapa

Alternative polyadenylation detection from diverse data sources such as 3'-seq, long-read and short-reads.
https://www.biorxiv.org/content/10.1101/2022.11.08.515683v1

error: "Only a column name can be used for the key in a dtype mappings argument" #19

Open emaxortiz opened 1 year ago

emaxortiz commented 1 year ago

Hi Muhammed, I'm trying to test lapa with short-read RNA-seq data. I'm using hisat2 for the mapping (I built the hg38 index with transcripts using the files suggested in the lapa tutorial), and my Python version is 3.9.

After fixing the GTF file and putting all the inputs in the right format, lapa failed while processing the BAM for the first sample with the following error:

$ lapa --alignment samples.csv --fasta genome.fa --annotation genome_utr.gtf --chrom_sizes chrom_sizes --output_dir lapa_test
Traceback (most recent call last):
  File "/home/eortiz/.local/bin/lapa", line 8, in <module>
    sys.exit(cli_lapa())
  File "/zfs/gcl/software/gbf/anaconda3/2021.11/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/zfs/gcl/software/gbf/anaconda3/2021.11/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/zfs/gcl/software/gbf/anaconda3/2021.11/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/zfs/gcl/software/gbf/anaconda3/2021.11/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/eortiz/.local/lib/python3.9/site-packages/lapa/main.py", line 112, in cli_lapa
    lapa(alignment, fasta, annotation, chrom_sizes, output_dir,
  File "/home/eortiz/.local/lib/python3.9/site-packages/lapa/lapa.py", line 497, in lapa
    _lapa(alignment)
  File "/home/eortiz/.local/lib/python3.9/site-packages/lapa/lapa.py", line 288, in __call__
    df_all_count, sample_counts = self.counting(alignment)
  File "/home/eortiz/.local/lib/python3.9/site-packages/lapa/lapa.py", line 142, in counting
    df_all_count, sample_counts = counter.to_df()
  File "/home/eortiz/.local/lib/python3.9/site-packages/lapa/count.py", line 583, in to_df
    df = pd.concat([
  File "/home/eortiz/.local/lib/python3.9/site-packages/lapa/count.py", line 584, in <listcomp>
    self.build_counter(row['path'])
  File "/home/eortiz/.local/lib/python3.9/site-packages/lapa/count.py", line 142, in to_df
    return self.to_gr().df.astype({'Chromosome': 'str', 'Strand': 'str'})
  File "/zfs/gcl/software/gbf/anaconda3/2021.11/lib/python3.9/site-packages/pandas/core/generic.py", line 5791, in astype
    raise KeyError(
KeyError: 'Only a column name can be used for the key in a dtype mappings argument.'

I understand this error is raised when the column names don't match exactly, but I'm not sure how to fix it. Any suggestions are welcome.
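For reference, the pandas behaviour behind the last frame of the traceback can be reproduced on its own. The sketch below is not lapa code: it only shows that `DataFrame.astype` raises this `KeyError` when a key in the dtype mapping is not a column of the frame. The empty DataFrame is an assumption standing in for whatever `self.to_gr().df` returned for my first sample (I have not confirmed it was empty), and `missing_dtype_keys` is a hypothetical helper written for this illustration.

```python
import pandas as pd


def missing_dtype_keys(df, dtypes):
    """Return the keys of a dtype mapping that are not columns of df.

    Hypothetical helper for illustration; it is not part of lapa or pandas.
    """
    return [col for col in dtypes if col not in df.columns]


# The same mapping that appears in the traceback (count.py, line 142).
dtypes = {"Chromosome": "str", "Strand": "str"}

# Assumption: a frame with no columns at all, e.g. because no intervals
# were produced for a sample. The real frame in lapa may differ.
empty = pd.DataFrame()

try:
    empty.astype(dtypes)
    error_message = None
except KeyError as exc:
    # pandas refuses mapping keys that are not existing column names.
    error_message = str(exc)

print("Missing columns:", missing_dtype_keys(empty, dtypes))
print("Error:", error_message)

# With the expected columns present, the same call succeeds.
ok = pd.DataFrame({"Chromosome": ["chr1"], "Strand": ["+"], "Start": [100]})
converted = ok.astype(dtypes)
print(list(converted.columns))
```

If that is what is happening here, the frame built from my first BAM is missing `Chromosome` and/or `Strand` altogether, rather than having a slightly different spelling.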

Thanks.