waterlandlab / CluBCpG

Cluster-based analysis of CpG methylation
https://clubcpg.readthedocs.io/
MIT License

clubcpg-coverage error #11

Closed kjdudley closed 3 years ago

kjdudley commented 3 years ago

Describe the bug
There appears to be an issue with how the chromosome entry is handled. I am trying to calculate coverage for an organism whose genome is assembled only to scaffold level.

To Reproduce
clubcpg-coverage -a P_bismark_bt2_sorted.deduplicated.bam -o ${PWD} --bin_size 100 -chr NW_018395390.1 --no_overlap False

Error message
Log file: /media/kevlab/projects/helicoverpa_epigenetics/exp/wgbs/analysis/20200817/barcode_analysis/CompleteBins.P_bismark_bt2_sorted.deduplicated.bam.NW_018395390.1.log
Traceback (most recent call last):
  File "/media/kevlab/projects/helicoverpa_epigenetics/exp/wgbs/analysis/20200817/barcode_analysis/clubcpg/bin/clubcpg-coverage", line 98, in <module>
    output_file = calc.analyze_bins(chrom_of_interest)
  File "/media/kevlab/projects/helicoverpa_epigenetics/exp/wgbs/analysis/20200817/barcode_analysis/clubcpg/lib/python3.6/site-packages/clubcpg/CalculateBinCoverage.py", line 148, in analyze_bins
    new[individual_chrom] = chromosome_lengths[individual_chrom]
KeyError: 'NW_018395390.1'
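The last frame of the traceback is a plain dictionary lookup, so the KeyError means the scaffold name was not among the keys of chromosome_lengths. Below is a minimal sketch of that failure mode. It assumes a hypothetical chromosome_lengths dict that holds only chr-style names with made-up lengths; how CluBCpG actually builds this dict is not visible in the traceback, and select_lengths is an illustrative helper, not a CluBCpG function.

```python
# Hypothetical stand-in for the dict named in the traceback. The keys and
# lengths are invented for illustration; the real contents are not known.
chromosome_lengths = {"chr1": 1000, "chr2": 2000}


def select_lengths(chroms_of_interest, lengths):
    """Mimic the failing line: copy the length of each requested contig."""
    new = {}
    for individual_chrom in chroms_of_interest:
        # Raises KeyError when the contig name is not a key of `lengths`.
        new[individual_chrom] = lengths[individual_chrom]
    return new


try:
    select_lengths(["NW_018395390.1"], chromosome_lengths)
except KeyError as err:
    message = f"contig not in chromosome_lengths: {err}"
```

A requested name that is a key, such as "chr1" in this sketch, passes through the same loop without error.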

Expected behavior
An output file containing coverage estimates across the specified bins for the chromosome/scaffold of interest.


canthonyscott commented 3 years ago

This is less a bug than a feature that is not yet implemented. The software currently does not operate on scaffolds with names such as "NW_018395390.1". The best course of action is to filter those out and keep only full chromosomes such as chr1, chr2, etc.
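The filtering suggested above could be sketched as follows. This is only an illustration: the regular expression encodes an assumed naming convention for full chromosomes (chr1, chr2, chrX, ...) and is not taken from the CluBCpG source, and keep_full_chromosomes is a hypothetical helper name.

```python
import re

# Assumed naming convention for full chromosomes; adjust it to the
# reference genome actually in use.
FULL_CHROM = re.compile(r"chr(\d+|[XYM])")


def keep_full_chromosomes(contig_names):
    """Return only the names that look like full chromosomes."""
    return [name for name in contig_names if FULL_CHROM.fullmatch(name)]


contigs = ["chr1", "chr2", "NW_018395390.1", "chrX", "chrUn_random"]
kept = keep_full_chromosomes(contigs)
```

Each name left in kept could then be passed to -chr, one at a time, as in the command from the original report.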