mortazavilab / lapa

Alternative polyadenylation detection from diverse data sources such as 3'-seq, long-read and short-reads.
https://www.biorxiv.org/content/10.1101/2022.11.08.515683v1

ValueError: new categories must not include old categories #25

Open nemitheasura opened 4 months ago

nemitheasura commented 4 months ago

Hi, I am using lapa on DRS and cDNA ONT data. It runs smoothly on the DRS data, but on the cDNA reads it throws an error at the clustering stage.

I used the following command:

lapa --alignment alignment.csv \
    --fasta /references/reference/ucsc/rn7.fa \
    --annotation /references/reference/ucsc/lapa_utrs_ncbiRefSeq.gtf \
    --chrom_sizes /references/reference/ucsc/chrom_sizes.txt \
    --output_dir /ANALYSES/rat/cDNA/LAPA

And here is the traceback:

Traceback (most recent call last):
  File "/usr/local/software/lapa/eb16fee/bin/lapa", line 11, in <module>
    load_entry_point('lapa==0.0.5', 'console_scripts', 'lapa')()
  File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/lapa/main.py", line 122, in cli_lapa
    non_replicates_read_threhold=non_replicates_read_threhold)
  File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/lapa/lapa.py", line 497, in lapa
    _lapa(alignment)
  File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/lapa/lapa.py", line 297, in __call__
    df_cluster = self.annotate_cluster(df_cluster)
  File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/lapa/lapa.py", line 155, in annotate_cluster
    df = self.create_genomic_regions().annotate(gr)
  File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/lapa/genomic_regions.py", line 67, in annotate
    gr_gtf, strandedness='same', how='left') \
  File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/pyranges/pyranges.py", line 2257, in join
    dfs = pyrange_apply(_write_both, self, other, **kwargs)
  File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/pyranges/multithreaded.py", line 236, in pyrange_apply
    result = call_f(function, nparams, df, odf, kwargs)
  File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/pyranges/multithreaded.py", line 23, in call_f
    return f.remote(df, odf, **kwargs)
  File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/pyranges/methods/join.py", line 129, in _write_both
    scdf, ocdf = _both_dfs(scdf, ocdf, how=how)
  File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/pyranges/methods/join.py", line 83, in _both_dfs
    oh = null_types(ocdf.head(1))
  File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/pyranges/methods/join.py", line 67, in null_types
    tmp_cat = tmp_cat.cat.add_categories("-1")
  File "/usr/local/software/python/3.6.11/lib/python3.6/site-packages/pandas/core/accessor.py", line 89, in f
    return self._delegate_method(name, *args, **kwargs)
  File "/usr/local/software/python/3.6.11/lib/python3.6/site-packages/pandas/core/arrays/categorical.py", line 2403, in _delegate_method
    res = method(*args, **kwargs)
  File "/usr/local/software/python/3.6.11/lib/python3.6/site-packages/pandas/core/arrays/categorical.py", line 1023, in add_categories
    raise ValueError(msg.format(already_included=already_included))
ValueError: new categories must not include old categories: {'-1'}

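In case it helps with triage, the last frames of the traceback can be reproduced in isolation with pandas alone. The sketch below is only an illustration, not lapa or pyranges code. It assumes that one of the categorical columns entering the join already has "-1" among its categories, which is what the `{'-1'}` in the error message suggests. The column values and the helper `add_null_category` are hypothetical.

```python
import pandas as pd


def add_null_category(series, null_value="-1"):
    """Hypothetical guard: add null_value as a category only if it is missing."""
    if null_value in series.cat.categories:
        return series
    return series.cat.add_categories(null_value)


# Made-up categorical column that already contains the placeholder "-1".
col = pd.Series(["geneA", "-1"], dtype="category")

# Calling add_categories directly reproduces the ValueError from the traceback.
try:
    col.cat.add_categories("-1")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# The guarded version leaves the existing categories untouched.
guarded = add_null_category(col)
print(list(guarded.cat.categories))

# A column without "-1" still gets the new category added.
clean = pd.Series(["geneA", "geneB"], dtype="category")
extended = add_null_category(clean)
print(list(extended.cat.categories))
```

If this is indeed what happens, a possible workaround on my side would be to check which column of the cDNA input carries "-1" before running lapa, but I have not confirmed that yet.
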
I would be grateful for any help resolving this issue.