Open shintarok111 opened 1 month ago
I found some of the causes of this issue, which seem related to the DHS peak-calling process.
These reads are excluded during the pre-selection step of footprint-tools. I tried bwa versions 0.7.12 and 0.7.17; in the end, I switched to Bowtie2 to avoid the issue.
Although this workflow made footprint calling work for many samples, there are still samples for which I cannot build dispersion (DM) models.
Are there any possible reasons for this? Any help would be appreciated.
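For what it is worth, the `TypeError: 'NoneType' object is not iterable` reported later in this thread is the classic symptom of a fit routine returning `None` on a sparse sample and a caller then iterating over that result. The sketch below is purely illustrative (the function names `fit_dispersion` and `learn_model` and the threshold are my own, not footprint-tools internals); it just shows the failure mode and an explicit guard:

```python
# Hypothetical sketch of the "cannot build DM model" failure mode:
# a fitting step that returns None when the cleavage histogram is too
# sparse, and a caller that would otherwise iterate over None.
# All names and the threshold are illustrative assumptions.

def fit_dispersion(histogram, min_observations=100):
    """Return fitted (mean, dispersion) pairs, or None if data are too sparse."""
    total = sum(histogram)
    if total < min_observations:
        return None  # sparse sample -> no model can be fit
    # Placeholder "fit": one (mean, dispersion) pair per histogram bin.
    mean = total / len(histogram)
    return [(mean, 1.0) for _ in histogram]

def learn_model(histogram):
    params = fit_dispersion(histogram)
    # Without this guard, iterating over None raises:
    #   TypeError: 'NoneType' object is not iterable
    if params is None:
        raise ValueError("too few observed cleavages to fit a dispersion model")
    return [p for p in params]
```

If something like this is happening inside `learn_dispersion_model`, a low-depth sample would fail exactly as shown in the traceback below.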
Hi,
Can you send me an example of your BAM file? I will look into it for you.
jvierstra (at) altius (dot) org
Hi Jvierstra,
Thank you very much for your reply!
I have sent an email with the BAM file and related files attached. It would be very helpful if you could check whether there is anything wrong with them. I aligned the reads largely following the ENCODE pipeline, though there may be some differences because I don't have an environment in which to run the official pipeline.
By the way, I noticed that many of the BAM files for which footprint calling failed are single-end and have relatively low read depth. Are there any recommended read-depth cutoffs for the DHS files?
Thank you for your help.
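As a rough way to compare samples against one that did work, one could compute the mean number of cleavages per base pair across the DHS regions. The numbers below come from the log later in this thread (13,954,276 cleavages over 93,047 regions of 150 bp); the idea that about one cleavage per bp is a healthy level is my own back-of-the-envelope guess, not an official footprint-tools recommendation:

```python
# Back-of-the-envelope depth check using the numbers from the ftd log
# in this thread: 13,954,276 cleavages across 93,047 DHS regions of
# 150 bp each. Interpreting ~1 cleavage/bp as "sufficient" is an
# assumption, not a documented cutoff.

def mean_cleavages_per_bp(total_cleavages, n_regions, region_width):
    return total_cleavages / (n_regions * region_width)

rate = mean_cleavages_per_bp(13_954_276, 93_047, 150)
print(f"{rate:.2f} cleavages/bp")  # roughly 1.0 for this sample
```

A sample falling far below the rate of a sample that fits successfully would be a plausible candidate for the model-building failure.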
Hi. I have an issue with learn_dispersion_model.
After running ftd learn_dm, the analysis stops with the error below; the module seems to fail to build the dispersion (DM) model. The samples are ENCODE DHS data. I did complete the analysis successfully on some of the input data, but the plots showing the overall model parameters and the fit for a predicted cleavage rate look problematic, with too few observed cleavage counts.
The input commands and error messages are as follows:
Outputs:
05-20 11:52:27 INFO footprint_tools.cli.learn_dm:Validating input files
05-20 11:52:27 INFO footprint_tools.cli.learn_dm:Loading bias model from file /.../data/vierstra_et_al.6mer-model.txt
05-20 11:52:27 INFO footprint_tools.cli.learn_dm:BED file contains 93,047 regions
05-20 11:52:27 INFO genome_tools.data.loaders:Processing data with batch_size = 100 resulting in 931 batches
05-20 11:52:27 INFO genome_tools.data.loaders:Using 'genome_tools.data.utils.numpy_collate_concat' to collate batch chunks
05-20 11:52:27 INFO genome_tools.data.loaders:Using 4 threads to process data
100%|████████████████████████| 931/931 [00:44<00:00, 20.69it/s]
05-20 11:53:12 INFO footprint_tools.cli.learn_dm:Learning dispersion model
05-20 11:53:12 INFO footprint_tools.modeling.dispersion:Analyzing 13,954,276 cleavages
Traceback (most recent call last):
  File "/.../footprint/bin/ftd", line 8, in <module>
    sys.exit(main())
  File "/.../footprint/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/.../footprint/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/.../footprint/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/.../footprint/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/.../footprint/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/.../footprint/lib/python3.10/site-packages/footprint_tools/cli/learn_dm.py", line 200, in run
    model = dispersion.learn_dispersion_model(hist)
  File "footprint_tools/modeling/dispersion.pyx", line 462, in footprint_tools.modeling.dispersion.learn_dispersion_model
TypeError: 'NoneType' object is not iterable
Traceback (most recent call last):
  File "/.../analyse_footprint.py", line 141, in <module>
    main(args.url_dict_path, args.array_job_index, args.n_threads)
  File "/.../analyse_footprint.py", line 68, in main
    raise ValueError('failed to run ftd learn_dm')
ValueError: failed to run ftd learn_dm
Inputs:
ftd learn_dm \
  --bias_model_file /.../data/vierstra_et_al.6mer-model.txt \
  --n_threads 4 \
  /.../ENCSR805EGZ/DHS/peaks.bed \
  /.../ENCSR805EGZ/DHS/pe_filtered.bam \
  /home/.../ref/hg19.fa
head peaks.bed:
chr1    10175   10325   .   0   .   116   -1  -1  75
chr1    10415   10565   .   0   .   102   -1  -1  75
chr1    16175   16325   .   0   .   45    -1  -1  75
chr1    234282  234432  .   0   .   17    -1  -1  75
chr1    662540  662690  .   0   .   12    -1  -1  75
chr1    714055  714205  .   0   .   641   -1  -1  75
chr1    714460  714610  .   0   .   43.0  -1  -1  75
...
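The ten columns above look like the ENCODE narrowPeak (BED6+4) layout: chrom, start, end, name, score, strand, signalValue, pValue, qValue, peak, where -1 marks an unavailable p/q value and the last column is the summit offset from start. That interpretation is my assumption from the column shapes; a minimal parser for these rows would be:

```python
# Minimal narrowPeak (BED6+4) row parser, assuming the peaks.bed shown
# in this thread follows that layout (an assumption, not confirmed).
from typing import NamedTuple

class NarrowPeak(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    score: int
    strand: str
    signal: float   # signalValue; may print as int or float ("641", "43.0")
    p_value: float  # -1 means not available
    q_value: float  # -1 means not available
    peak: int       # summit offset from start

def parse_narrowpeak(line: str) -> NarrowPeak:
    f = line.split()
    return NarrowPeak(f[0], int(f[1]), int(f[2]), f[3], int(float(f[4])),
                      f[5], float(f[6]), float(f[7]), float(f[8]), int(f[9]))

p = parse_narrowpeak("chr1 714460 714610 . 0 . 43.0 -1 -1 75")
print(p.chrom, p.end - p.start, p.signal)  # chr1 150 43.0
```

Note that every region here is exactly 150 bp wide, consistent with the fixed-width peaks used as input to ftd learn_dm.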
samtools view pe_filtered.bam | head:
K00314:149:H7Y5FBBXY:5:1124:9526:24507 81 chr1 9996 0 76M chr7 10150 0 CCTAACAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC JJJJFJFJJJJFFAJJJJJJJJJJJFJJJJJJJJFJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFAAA X0:i:2 X1:i:3 XA:Z:chr19,+59118914,76M,4;chr4,+191044027,22M1D54M,2;chrY,+59363343,68M2I6M,2;chrX,+155260337,68M2I6M,2; MC:Z:75M MD:Z:0T1C0G1T70 XG:i:0 AM:i:0 NM:i:4 SM:i:0 XM:i:4 XN:i:5 XO:i:0 MQ:i:37 XT:A:R
K00314:149:H7Y5FBBXY:5:1125:30239:42917 121 chr1 9996 37 75M = 9996 0 TCCGATCACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC J<F7-77-JF-<--JJJFJFJF<--7JJJFA-FAAAF-F-FJF77-<JFJA7FA7-JJJF<FJJFJJJJFFFAAA X0:i:1 X1:i:0 MD:Z:6A68 XG:i:0 AM:i:0 NM:i:1 SM:i:37 XM:i:1 XN:i:5 XO:i:0 XT:A:U
...
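Decoding the FLAG fields of the two records above may hint at why some reads are dropped during pre-selection: flag 81 is paired but not marked as a proper pair (its mate maps to chr7), and flag 121 has its mate unmapped. The bit meanings below follow the SAM specification; the link to footprint-tools' filtering is my interpretation, not confirmed:

```python
# Decode the SAM FLAG values from the two records above (81 and 121).
# Bit definitions are from the SAMv1 specification.
SAM_FLAGS = {
    0x1:  "paired",
    0x2:  "proper_pair",
    0x4:  "unmapped",
    0x8:  "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}

def decode_flag(flag: int) -> list[str]:
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]

print(decode_flag(81))   # ['paired', 'reverse', 'first_in_pair']
print(decode_flag(121))  # ['paired', 'mate_unmapped', 'reverse',
                         #  'mate_reverse', 'first_in_pair']
```

Neither read carries the proper-pair bit (0x2), so if the pre-selection keeps only properly paired reads, both records would be excluded.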
Do you have any ideas or suggestions on what might be causing the problem? Thank you in advance for your kind help.