mortazavilab / lapa

Alternative polyadenylation detection from diverse data sources such as 3'-seq, long-read and short-reads.
https://www.biorxiv.org/content/10.1101/2022.11.08.515683v1

latest version issue: RuntimeError: The entries you tried to add are out of order, precede already added entries, or otherwise use illegal values. Please correct this and try again. #11

Closed peterthorpe5 closed 2 years ago

peterthorpe5 commented 2 years ago

Hi Muhammed,

Well done with the documentation updates! This is great. I have upgraded to the latest, as suggested. However, I have come across an issue: (full error at the bottom)

lapa command: lapa --alignment samples.csv --fasta GRCh38.primary_assembly.genome.fa --annotation hg39.utr_fixed.gtf --chrom_sizes chrom_sizes --output_dir lapa_c_vs_t

These are the same input files that worked with the previous version, except that I fixed the UTRs as described in the docs:

gencode_utr_fix --input_gtf mm10.gtf --output_gtf mm10.utr_fixed.gtf

wget -O - https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_40/gencode.v40.annotation.gtf.gz | gunzip -c > hg38.gtf

gencode_utr_fix --input_gtf hg38.gtf --output_gtf hg39.utr_fixed.gtf

gencode_utr_fix --input_gtf gencode.v39.primary_assembly.annotation.gtf --output_gtf hg39.utr_fixed.gtf

Both of these fail with the main lapa command:

..... .....
[E::idx_find_and_load] Could not retrieve index file for '/home/pthorpe/scratch/mustafa/lapa/reads_bams/R6_Trt_LONG.fastq.gz.temp.mapped.bam'
Traceback (most recent call last):
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/bin/lapa", line 8, in <module>
    sys.exit(cli_lapa())
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/main.py", line 112, in cli_lapa
    lapa(alignment, fasta, annotation, chrom_sizes, output_dir,
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/lapa.py", line 497, in lapa
    _lapa(alignment)
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/lapa.py", line 288, in __call__
    df_all_count, sample_counts = self.counting(alignment)
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/lapa.py", line 143, in counting
    counter._to_bigwig(df_all_count, sample_counts, self.chrom_sizes,
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/count.py", line 561, in _to_bigwig
    save_count_bw(df_all, output_dir, chromsizes, f'all{prefix}')
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/count.py", line 197, in save_count_bw
    BaseCounter._to_bigwig(df, chrom_sizes, output_dir, prefix)
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/count.py", line 153, in _to_bigwig
    bw_from_pyranges(
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/utils/io.py", line 153, in bw_from_pyranges
    gr['-'].to_bigwig(bw_neg_file, chromosome_sizes=chrom_sizes,
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/pyranges/pyranges.py", line 5339, in to_bigwig
    result = _to_bigwig(self, path, chromosome_sizes, rpm, divide, value_col, dryrun)
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/pyranges/out.py", line 203, in _to_bigwig
    bw.addEntries(chromosomes, starts, ends=ends, values=values)
RuntimeError: The entries you tried to add are out of order, precede already added entries, or otherwise use illegal values. Please correct this and try again.

Would you be able to help?

regards,

Pete

MuhammedHasan commented 2 years ago

Hi Pete, I am trying to reproduce the error. What is your python version, and can you share the content of the chrom_sizes file?

peterthorpe5 commented 2 years ago

Hi Muhammed, I have a theory that it was because the BAMs were not sorted. I am testing this now.

Back in touch soon.
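
One way to check the sorting theory without extra tools is to read the sort order that each BAM header declares. The sketch below is a minimal illustration using only the Python standard library. `bam_sort_order` is a hypothetical helper name, not part of lapa, and the `SO` tag only records what the file claims, so it does not prove the records are actually in order.

```python
import gzip
import struct


def bam_sort_order(path):
    """Return the SO (sort order) value declared in a BAM header, or None.

    Hypothetical helper for illustration; it reads only the header text.
    """
    # BAM files are BGZF-compressed, which the gzip module can decompress.
    with gzip.open(path, "rb") as handle:
        if handle.read(4) != b"BAM\x01":
            raise ValueError(f"{path} does not look like a BAM file")
        # The magic bytes are followed by a little-endian 32-bit header length.
        (text_length,) = struct.unpack("<i", handle.read(4))
        header_text = handle.read(text_length).decode("ascii", errors="replace")

    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    # Typical values: coordinate, queryname, unsorted, unknown.
                    return field[3:].rstrip("\x00")
    return None
```

`samtools view -H` prints the same header line. The `Could not retrieve index file` message in the log above may also indicate that the BAM has no index, which is worth checking separately.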

MuhammedHasan commented 2 years ago

I think the issue occurs when the chromosome names in the BAM files and the chrom_sizes file are not exactly the same.
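
A quick way to test for this kind of mismatch is to compare the two sets of names directly. The sketch below is a minimal illustration, not lapa code: `read_chrom_sizes` and `compare_chrom_names` are hypothetical helpers, and the input values are made up for the example.

```python
def read_chrom_sizes(lines):
    """Parse 'name<TAB>size' lines into a dict, skipping blank or short lines."""
    sizes = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            sizes[fields[0]] = int(fields[1])
    return sizes


def compare_chrom_names(bam_names, chrom_sizes):
    """Return (only_in_bam, only_in_chrom_sizes) as sorted lists of names."""
    bam_set = set(bam_names)
    size_set = set(chrom_sizes)
    return sorted(bam_set - size_set), sorted(size_set - bam_set)


# Hypothetical inputs for illustration. In practice the BAM names could come
# from `samtools idxstats` or from pysam's AlignmentFile.references, and the
# lines from open("chrom_sizes").
chrom_sizes = read_chrom_sizes(["chr1\t248956422", "chr2\t242193529"])
only_bam, only_sizes = compare_chrom_names(
    ["chr1", "chr2", "KI270728.1"], chrom_sizes
)
print("only in BAM:", only_bam)
print("only in chrom_sizes:", only_sizes)
```

Any name reported as "only in BAM" is a chromosome whose read counts would be filtered out before the bigWig file is written.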

The line below filters out read counts on chromosomes that are not listed in the chrom_sizes file before saving the counts as a bigWig file. However, filtering a pyranges object by chromosome and then saving it as bigWig leads to the error you just reported.

https://github.com/mortazavilab/lapa/blob/45ac11babb05c3de4af0ff9cec9d777dd4242bdb/lapa/utils/io.py#L145
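
The constraint named in the error message (entries in order, with legal values) can be illustrated on plain tuples. This is only a sketch of the general idea under stated assumptions, not the fix applied on the branch: `prepare_bigwig_entries` and `first_out_of_order` are hypothetical helpers, and the order of `chrom_sizes` is assumed to match the chromosome order in the bigWig header.

```python
def prepare_bigwig_entries(entries, chrom_sizes):
    """Filter and order (chrom, start, end, value) tuples for a bigWig writer.

    chrom_sizes is a dict; its insertion order is assumed to be the header order.
    Entries on unknown chromosomes or outside chromosome bounds are dropped.
    """
    chrom_rank = {name: rank for rank, name in enumerate(chrom_sizes)}
    kept = [
        (chrom, start, end, value)
        for chrom, start, end, value in entries
        if chrom in chrom_rank and 0 <= start < end <= chrom_sizes[chrom]
    ]
    # Sort by header order of the chromosome, then by start position.
    kept.sort(key=lambda entry: (chrom_rank[entry[0]], entry[1]))
    return kept


def first_out_of_order(entries, chrom_sizes):
    """Return the index of the first entry that precedes or overlaps its
    predecessor, or None. Expects every chromosome to be in chrom_sizes."""
    chrom_rank = {name: rank for rank, name in enumerate(chrom_sizes)}
    for i in range(1, len(entries)):
        prev, cur = entries[i - 1], entries[i]
        if (chrom_rank[cur[0]], cur[1]) < (chrom_rank[prev[0]], prev[2]):
            return i
    return None


# Made-up example data: one unknown chromosome, one interval past the end.
sizes = {"chr1": 1000, "chr2": 800}
raw = [
    ("chr2", 10, 11, 3.0),
    ("chr1", 500, 501, 1.0),
    ("chrUn", 5, 6, 2.0),
    ("chr1", 100, 101, 4.0),
    ("chr1", 990, 1010, 1.0),
]
clean = prepare_bigwig_entries(raw, sizes)
print(clean)
print(first_out_of_order(clean, sizes))
```

Running `first_out_of_order` on the entries just before they are written is a cheap way to locate the offending record when the writer raises this error.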

Obviously, it is a bug caused by an edge case.

Thanks for reporting.

peterthorpe5 commented 2 years ago

Thank you for helping with this. For reference, I am using Python 3.8. I will test the update now.

peterthorpe5 commented 2 years ago

SUCCESS!! Thank you very much! The fix on the branch worked.