heathsc / gemBS-rs

A re-write of the gemBS pipeline framework in Rust
BSD 3-Clause "New" or "Revised" License
4 stars 1 forks source link

`make_snps` parameter generates empty output #13

Open bounlu opened 2 years ago

bounlu commented 2 years ago

I have set make_snps = True at the extract step in the config file as below to get SNP output. Although it runs the process and generates the specified *_snps.txt.gz file, it is empty and there is no error code. It seems there is a bug with writing the output to the file.

base = "/home"

index_dir = ${base}/reference
sequence_dir = ${base}/fastq
bam_dir = ${base}/mapping
bcf_dir = ${base}/calling
extract_dir = ${base}/extract
report_dir = ${base}/report

# hg38 genome index prepared in advance
reference = ${base}/reference/genome.fasta
index = ${base}/reference/genome.gemBS.gem

jobs = 8
cores = 32
threads = 64
memory = 256G

[dbsnp]

# dbsnp index prepared in advance
dbsnp_files = ${base}/reference/dbsnp_146.hg38.vcf.gz
dbsnp_index = ${base}/reference/dbsnp_146.hg38.gemBS.vcf.idx
dbsnp_type = "VCF"

[mapping]

non_stranded = False
remove_individual_bams = True

[calling]

jobs = 8
mapq_threshold = 10
qual_threshold = 13
reference_bias = 2
left_trim = 5
right_trim = 0
keep_improper_pairs = False
keep_duplicates = False
haploid = False
conversion = 0.01,0.05
remove_individual_bcfs = True
contig_pool_limit = 25000000

[extract]

jobs = 8
strand_specific = True
bigwig_strand_specific = True
phred_threshold = 10
make_cpg = True
make_non_cpg = True
make_bedmethyl = True
make_bigwig = True
make_snps = True
MareikeJaniak commented 10 months ago

Hi @bounlu, Did you ever find a solution to this issue? I'm experiencing the same thing - gemBS extract seems to run without error and generates the _snps.txt.gz output file, index, and md5, but the snps file is always empty.

Thanks! Best, Mareike

bounlu commented 10 months ago

Hi Mareike,

Unfortunately not, it seems the tool is not maintained anymore either. So I switched using BISCUIT.

MareikeJaniak commented 10 months ago

Thanks for the quick reply!

heathsc commented 10 months ago

The reply is not correct. The tool is maintained but this issue was missed. The SNP extraction works correctly (we use it in house regularly) if the SNP indexes are correctly set up; if there is a bug it is probably with insufficient notification of errors in the index generation phase. Was the dbSNP index file generated?

Simon

On Tue, Oct 17, 2023 at 3:56 PM MareikeJaniak @.***> wrote:

Thanks for the quick reply!

— Reply to this email directly, view it on GitHub https://github.com/heathsc/gemBS-rs/issues/13#issuecomment-1766473555, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAY465ZW3TKUQEEBPQ5CHITX72FAVAVCNFSM55R2QMBKU5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCNZWGY2DOMZVGU2Q . You are receiving this because you are subscribed to this thread.Message ID: @.***>

MareikeJaniak commented 10 months ago

Hi Simon,

Thanks for confirming that the tool is still maintained. We're currently building it into our methylation analysis pipeline, so glad to hear that! All other steps are working well, we're just running into the issue with the snp file.

The dbSNP index file is generated and is not empty (~4.2G) but the output file includes this line, which suggests that maybe something is going wrong?

        /lb/project/mugqic/analyste_dev/software/gemBS-rs/bin/dbsnp_index --loglevel info --threads 1 --output /lb/project/mugqic/projects/mjaniak/test/methylseq/alignment/index/dbSNP_gemBS.idx /cvmfs/soft.mugqic/CentOS6/genomes/species/Homo_sapiens.GRCh38/annotations/Homo_sapiens.GRCh38.dbSNP156.vcf.gz
n_snps 890444194, n_selected_snps 0

Is this expected or are 0 snps actually being selected?

Best, Mareike

heathsc commented 10 months ago

You can use the --maf-limit option to dbsnp_index to select SNPs with MAF above the threshold to be selected. Alternatively you can use the --selected option to dbsnp_index to supply a file with a list of SNPs to be selected (one name per line).

If you are invoking dbsnp_index from within gemBS-rs then I think only the --selected option is exposed as --dbsnp-selected (i.e., gemBS index --dbsnp-selected )

Another area that can cause problems is if the contig name in the reference file do not match the contig names in the genome reference (i.e., if one used chr1, chr2 and the other uses 1, 2 etc.). In this case the --dbsnp-chrom-alias option to gemBS index can be used to translate the contig names.

Simon

On Tue, Oct 17, 2023 at 4:17 PM MareikeJaniak @.***> wrote:

Hi Simon,

Thanks for confirming that the tool is still maintained. We're currently building it into our methylation analysis pipeline, so glad to hear that! All other steps are working well, we're just running into the issue with the snp file.

The dbSNP index file is generated and is not empty (~4.2G) but the output file includes this line, which suggests that maybe something is going wrong?

    /lb/project/mugqic/analyste_dev/software/gemBS-rs/bin/dbsnp_index --loglevel info --threads 1 --output /lb/project/mugqic/projects/mjaniak/test/methylseq/alignment/index/dbSNP_gemBS.idx /cvmfs/soft.mugqic/CentOS6/genomes/species/Homo_sapiens.GRCh38/annotations/Homo_sapiens.GRCh38.dbSNP156.vcf.gz

n_snps 890444194, n_selected_snps 0

Is this expected or are 0 snps actually being selected?

Best, Mareike

— Reply to this email directly, view it on GitHub https://github.com/heathsc/gemBS-rs/issues/13#issuecomment-1766515129, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAY465ZK7F3NE4BETSJS673X72HQHAVCNFSM55R2QMBKU5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCNZWGY2TCNJRGI4Q . You are receiving this because you commented.Message ID: @.***>

bounlu commented 10 months ago

Hi Simon,

Good to hear that it's still maintained. I have 2 more issues pending reply, could you please help me with those as well?

I will later check your suggestions on this issue. Also it would greatly help if you can share your in-house config file.

Thanks.