Open lucy924 opened 2 months ago
Hello @lucy924,
Could you send me the whole panic message?
Hi @ArtRand Here it is:
- performing DMR on methylated CpG pileups, single site
> reading reference FASTA at "/home/dejlu879/refs/hg38_chromsonly_ebv_lambda_bcg/chr15.fa"
> running single-site analysis
> using default prior, Beta(α: 0.55, β: 0.55)
> estimating max coverages from data
> sampled 12 a records and 1502 b records, calculating max coverages for 95th percentile
thread 'main' panicked at src/thresholds.rs:39:22:
index out of bounds: the len is 12 but the index is 12
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Hello @lucy924, thanks for sending the traceback.
I'll have a build to you with the fix asap.
Hello @lucy924,
What you're doing looks reasonable to me. With respect to this error:
> sampled 46 a records and 0 b records, calculating max coverages for 95th percentile
> Error! not enough data points (got 0) to calculate percentile
If for some reason bedMethyl records from the 'b' sample are systematically failing you cold see zero records being sampled and these errors will be in the log. Looks like you have --log-filepath "${dmr_log}"
set, could you tell me if there are error messages in there? They would look like "batch failed during sampling.." or "failed to get coverages...". You can work-around this by passing --max-coverages
option, I would pick a number that's roughly what coverage you expect. If you have less coverage than the max, that's fine. Otherwise I'm a little curious how you're getting zero records for one of the samples, is it possible that your tabix index has the contig listed, but there are in fact zero records for that contig? Maybe checking tabix -l "${test_bed}"
and see if it makes sense.
Hi @ArtRand
I went through a bunch of the logs and none of them give any actual error messages, they just stop with the last line sampled 46 a records and 0 b records, calculating max coverages for 95th percentile
.
I'm not sure if I've missed an error log output somewhere?
The output of tabix -l "${test_bed}"
is as expected, with only one contig (I've been aligning each chromosome separately for reasons), and I checked that the bed files do actually contain records, so it's not that they are empty!
I have attached a few log files if they are of any use, all of one pair of samples (B6) on chromosome 14 that had a some dmr runs work and some that didn't.
File | Status | Description |
---|---|---|
B6_pair.chr14.6mA_dmr.v1ss.bed.log | DMR failed | DMR using pileups looking for the 6mA modification |
B6_pair.chr14.trad_dmr.v1ss.bed.log | DMR successful | DMR using pileups made with the traditional preset |
B6_Control.chr14.6mA_pileup.log | Fine | Associated DMR found 0 records (see above) |
B6_Control.chr14.mCG_pileup.log | Fine | Associated DMR may have found 0 records |
B6_Test.chr14.mCG_pileup.log | Fine | Associated DMR may have found 0 records |
B6_Control.chr14.trad_pileup.log | Fine | Associated DMR was successful (see above) |
Unfortunately the mCG DMR logs were deleted somewhere along the way, but if you need them I can rerun it at some point.
Hello @lucy924,
Thanks for sending the logs over. Do other chromosomes 6mA DMR seem to work and just chr14 is having problems? One thing I noticed is that you use --motif A 0 --motif T 0
I assume to capture the complement T
? You don't actually need to do this --motif A 0
will capture (+)-strand As and (-)-strand T positions. If you short-circuit this estimation step by passing --max-coverages
to the dmr pair
for chr14 6mA, does it complete successfully?
Sorry for the somewhat cryptic asks, I have a theory for what's going on, but I need a little more information.
Thanks.
Hi @ArtRand
Thanks for the clarification on the A/T - I wasn't entirely sure from the docs so put T down just in case, figuring it wouldn't do any harm.
Other than chrY, I had the same problem with one sample, chr11, mCG, DMR
, but not with 6mA DMR. Chr14 was having problems with multiple samples on mCG, and one of those samples on 6mA.
I set --max-coverages
to 100 and reran all samples/all chroms, and chr14 didn't error, but it didn't work either:
On mCG
> finished, processed 0 sites successfully, 0 failed
6mA is behaving weirdly on all samples this time, and I don't think I've done anything different. Unfortunately I don't have time this week to delve into it more, I'll let you know if I figure out what happened.
I've attached example outputs for both mCG and 6mA, hopefully it helps ...
I managed to get good outputs using the below (both B and C sets):
modkit dmr pair \
-a "${control_bed_C3}" \
-a "${control_bed_C5}" \
-a "${control_bed_C6}" \
-b "${test_bed_C3}" \
-b "${test_bed_C5}" \
-b "${test_bed_C6}" \
-o "${dmr_result}" \
--header \
--ref "${reference}" \
--segment "${dmr_segment}" \
--base C \
--base A \
--threads 20 \
--log-filepath "${dmr_log}" \
-f \
--min-valid-coverage 20 \
--delta 0.1
And I'm running with this data for now, do you think it likely that issues happened with the same samples on chr14 and the problems are just hidden in this case?
Thanks, Lucy
Hello @lucy924,
I've made a refactor to how the tabix indices are handled in dmr
. Please find a build attached. Hopefully with this build the outputs from the various runs become consistent. Thanks,
Hi @ArtRand Thanks for that, I'll test it out when I have the chance. Much appreciated!
Hi, I've had this issue cropping up in some of my dmr's but not consistently. I'm looping through each of my paired samples and each chromosome reference (human T2T), with different pileups (one using the
-traditional
preset and another using-cpg
. The samples are of differently treated cancer cell lines that have been whole genome sequenced.experiment names:
B3, C3, B4, B5, C5, B6, C6
Problem pileup, pair, chromosome so far:
Do you think rerunning these specific pileups will help?
Pileup code
DMR code
Maybe related issue I'm not sure if this next issue related but I'm also getting a different error with other pairs as well:
This is the list of experiments with not enough datapoints
I initially didn't think this was a problem, (especially as the B experiments are a female derived cell line) but then I started getting the same error on autosomal chromosomes too, in which the alignment depth might not be great but I didn't think it was this bad.
Appreciate any insights!