vibansal / crisp

Code for multi-sample variant calling from sequence data of pooled or unpooled DNA samples
MIT License
19 stars 8 forks source link

Issues with --bed option #27

Closed Arynio closed 1 year ago

Arynio commented 2 years ago

Hi,

I'm trying to run CRISP for a subset of known polymorphic sites using the --bed option, and while the programme seems to accept all input files without issues, for some reason it's not being able to call any variant at all.

These are the first four lines of the tab-separated bed-file: 2L 73868 73869 2L 86449 86450 2L 462386 462387 2L 466451 466452

And these are the first few lines of the .log:


CRISP options: QVoffset 33 min_base_quality 13 min_mapping_score 20 max_permutations 20000 poolsize 160 CT-pvalue-thresh -3.5 QV-pvalue-thresh -5.0

reading fasta index file /share/rdata/ramon.pouso/reference/indexed_reference/dmel-all-chromosome-r6.14.fasta.fai ... fasta file /share/rdata/ramon.pouso/reference/indexed_reference/dmel-all-chromosome-r6.14.fasta has 1870 chromosomes/contigs

read 8040 intervals from bedfile /share/rdata/ramon.pouso/counts/sites_missing_from_gen140_fixed_in_gen140.autosomes.bed

processing 44 bamfiles: /share/rdata/ramon.pouso/POOLS/gen0-40/gen0-40/4separate/multi_autosomic/BT20_new_gen20_bwa_iso_rmdup_mq30_rg_multirealigned_autosomic.bam ..... /share/rdata/ramon.pouso/POOLS/gen140/gen140/5processed/autosomic_isolated/sample19_gen140_bwa_iso_rmdup_mq30_rg_multirealigned_autosomic.bam

poolsize for each sample is 160 finished reading bam file indexes reading chromosome 2L offset 189 # targeted bases on chrom is 2448/23513712 .....processed 1000000 reads QSIZE:7774 2L:39863:39964 variants called 0 .....processed 2000000 reads QSIZE:7522 2L:77681:77782 variants called 0 .....processed 3000000 reads QSIZE:7067 2L:110616:110717 variants called 0 .....processed 4000000 reads QSIZE:6988 2L:144446:144547 variants called 0 .....processed 5000000 reads QSIZE:7783 2L:178149:178250 variants called 0 .....processed 6000000 reads QSIZE:6303 2L:214691:214792 variants called 0 .....processed 7000000 reads QSIZE:7322 2L:249779:249880 variants called 0 .....processed 8000000 reads QSIZE:6634 2L:285603:285704 variants called 0 .....processed 9000000 reads QSIZE:5725 2L:320753:320854 variants called 0 .....processed 10000000 reads QSIZE:7135 2L:368829:368930 variants called 0


Why is it not working? The target sites have been confirmed to be SNPs using CRISP in a previous whole-genome run with fewer samples.

I've tried several things (such as removing the optional columns from the .bed file, using larger intervals, I've even added .bai files for all .bam files just in case), but nothing seems to help.

Thank you very much in advance.

Best,

Daniel.

vibansal commented 2 years ago

The code included a filter to not output variants in low coverage regions when using bed files. This has been disabled in the latest commit. Please let me know if the problem is not resolved.