liampshaw / mobile-gene-regions

Analysing the genomic context of mobile genes

The step of plot_linear_blocks_altair_deduplicated failed! #16

Closed Dx-wmc closed 4 weeks ago

Dx-wmc commented 1 month ago

Hi, I'm glad we meet again. This time the pipeline fails to generate the corresponding html file. Strangely, it works fine with a set of kpc test genomes, but with my own data it reports this error:

Error in rule plot_linear_blocks_altair_deduplicated:
    jobid: 16
    input: /home/data/usr/dx/workdir/test/run/new/pangraph/mgr2/exoU/pangraph/exoU_pangraph.json.blocks.csv
    output: /home/data/usr/dx/workdir/test/run/new/pangraph/mgr2/exoU/plots/exoU_linear_blocks_deduplicated.html
    shell:
        python scripts/plot_linear_blocks_altair.py --flanking_width 10000 --unique --block_csv /home/data/usr/dx/workdir/test/run/new/pangraph/mgr2/exoU/pangraph/exoU_pangraph.json.blocks.csv --gene_name exoU --output /home/data/usr/dx/workdir/test/run/new/pangraph/mgr2/exoU/plots/exoU_linear_blocks_deduplicated.html --gff_file input/gffs/exoU_annotations.gff
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

I also ran the script on its own, and it reports an error too.

The files generated by the pipeline and my gff3 file are attached for you to view: t.gff3.txt, exoU_pangraph.json.blocks.csv

command: python scripts/plot_linear_blocks_altair.py --flanking_width 10000 --unique --block_csv exoU_pangraph.json.blocks.csv --gene_name exoU --output

Error:


Traceback (most recent call last):
  File "/home/data/usr/dx/miniconda3/envs/bact_genome/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3653, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'new_start'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "scripts/plot_linear_blocks_altair.py", line 318, in <module>
    main()
  File "scripts/plot_linear_blocks_altair.py", line 201, in main
    annotation_hits['new_start'] = annotation_hits['new_start']+args.flanking_width
  File "/home/data/usr/dx/miniconda3/envs/bact_genome/lib/python3.8/site-packages/pandas/core/frame.py", line 3761, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/data/usr/dx/miniconda3/envs/bact_genome/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3655, in get_loc
    raise KeyError(key) from err
KeyError: 'new_start'

liampshaw commented 1 month ago

I'm sorry for this error (and for my slow reply!).

Thanks for providing these files. I tried running

python scripts/plot_linear_blocks_altair.py --flanking_width 10000 --unique --block_csv exoU_pangraph.json.blocks.csv --gene_name exoU --output test.html

with your files and it worked, see here for a screenshot of the html it generated:

[Screenshot 2024-06-03 at 16 18 21]

From your error messages it looks like something went wrong when creating a dataframe from the gff (the 'new_start' column seems not to have been created). I wonder if the fact that 'exoU' occurs multiple times in the gff has something to do with it. Do you have the actual dataset so I can try to reproduce this properly? I hope I can get to the bottom of it for you.

Dx-wmc commented 4 weeks ago

I am glad to see your reply. I have found the cause of the problem: the name passed to --gene_name did not match the product field in the gff, because your script matches uppercase letters by default!

liampshaw commented 4 weeks ago

Thanks for investigating yourself. This is a good catch and apologies for the default!
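
For anyone hitting the same KeyError, here is a minimal sketch of how to check for the case mismatch described above before running the pipeline. It is not the code in scripts/plot_linear_blocks_altair.py. The helper names (`parse_gff_attributes`, `find_gene_hits`), the `ignore_case` flag, the use of exact equality on the `product` attribute, and the example records are all assumptions made for illustration. With a case-sensitive comparison the gene gets no hits, which is consistent with the annotation table ending up without the expected columns.

```python
def parse_gff_attributes(attribute_field):
    """Parse the ninth GFF3 column ('key=value;key=value') into a dict."""
    attributes = {}
    for pair in attribute_field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip()
    return attributes


def find_gene_hits(gff_lines, gene_name, ignore_case=True):
    """Return (seqid, start, end, product) for features whose product equals gene_name.

    Hypothetical helper: exact equality on 'product' is an assumption,
    not necessarily how the real script selects annotations.
    """
    target = gene_name.lower() if ignore_case else gene_name
    hits = []
    for line in gff_lines:
        # Skip directives, comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        product = parse_gff_attributes(fields[8]).get("product", "")
        candidate = product.lower() if ignore_case else product
        if candidate == target:
            hits.append((fields[0], int(fields[3]), int(fields[4]), product))
    return hits


# Made-up example records (not taken from the attached t.gff3.txt).
example_gff = [
    "##gff-version 3",
    "contig_1\tprediction\tCDS\t1200\t3260\t.\t+\t0\tID=cds_1;product=ExoU",
    "contig_1\tprediction\tCDS\t4000\t4900\t.\t-\t0\tID=cds_2;product=hypothetical protein",
]

exact_hits = find_gene_hits(example_gff, "exoU", ignore_case=False)
relaxed_hits = find_gene_hits(example_gff, "exoU", ignore_case=True)
print(len(exact_hits), len(relaxed_hits))
```

If the exact comparison returns nothing while the relaxed one finds the gene, either pass --gene_name with the same capitalisation as the gff or normalise the case in the gff before running.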