vierstralab / footprint-tools

A toolset to analyze genomic footprinting data
GNU General Public License v3.0
19 stars 3 forks source link

learn_dispersion_model returns NoneType object #24

Open shintarok111 opened 1 month ago

shintarok111 commented 1 month ago

Hi. I have issues about learn_dispersion_model.

After running ftd learn_dm, the analysis stops due to the following error. It seems the module fails to build the DM model. The samples are ENCODE DHS data. I successfully performed the analysis on some of the input data, but the plots showing overall model parameters and fit for a predicted cleavage rate are problematic, with too few observed cleavage counts.

The input commands and error messages are as follows:

Outputs: 05-20 11:52:27 INFO footprint_tools.cli.learn_dm:Validating input files 05-20 11:52:27 INFO footprint_tools.cli.learn_dm:Loading bias model from file /.../data/vierstra_et_al.6mer-model.txt 05-20 11:52:27 INFO footprint_tools.cli.learn_dm:BED file contains 93,047 regions 05-20 11:52:27 INFO genome_tools.data.loaders:Processing data with batch_size = 100 resulting in 931 batches 05-20 11:52:27 INFO genome_tools.data.loaders:Using 'genome_tools.data.utils.numpy_collate_concat' to collate batch chunks 05-20 11:52:27 INFO genome_tools.data.loaders:Using 4 threads to process data 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 931/931 [00:44<00:00, 20.69it/s] 05-20 11:53:12 INFO footprint_tools.cli.learn_dm:Learning dispersion model 05-20 11:53:12 INFO footprint_tools.modeling.dispersion:Analyzing 13,954,276 cleavages

Traceback (most recent call last): File "/.../footprint/bin/ftd", line 8, in sys.exit(main()) File "/.../footprint/lib/python3.10/site-packages/click/core.py", line 1157, in call return self.main(args, kwargs) File "/.../footprint/lib/python3.10/site-packages/click/core.py", line 1078, in main rv = self.invoke(ctx) File "/.../footprint/lib/python3.10/site-packages/click/core.py", line 1688, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/.../footprint/lib/python3.10/site-packages/click/core.py", line 1434, in invoke return ctx.invoke(self.callback, ctx.params) File "/.../footprint/lib/python3.10/site-packages/click/core.py", line 783, in invoke return __callback(args, **kwargs) File "/.../footprint/lib/python3.10/site-packages/footprint_tools/cli/learn_dm.py", line 200, in run model = dispersion.learn_dispersion_model(hist) File "footprint_tools/modeling/dispersion.pyx", line 462, in footprint_tools.modeling.dispersion.learn_dispersion_model TypeError: 'NoneType' object is not iterable Traceback (most recent call last): File "/.../analyse_footprint.py", line 141, in main(args.url_dict_path, args.array_job_index, args.n_threads) File "/.../analyse_footprint.py", line 68, in main raise ValueError('failed to run ftd learn_dm') ValueError: failed to run ftd learn_dm

Inputs: ftd learn_dm \ --bias_model_file /.../data/vierstra_et_al.6mer-model.txt \ --n_threads 4 \ /.../ENCSR805EGZ/DHS/peaks.bed \ /.../ENCSR805EGZ/DHS/pe_filtered.bam \ /home/.../ref/hg19.fa

head peaks.bed: chr1 10175 10325 . 0 . 116 -1 -1 75 chr1 10415 10565 . 0 . 102 -1 -1 75 chr1 16175 16325 . 0 . 45 -1 -1 75 chr1 234282 234432 . 0 . 17 -1 -1 75 chr1 662540 662690 . 0 . 12 -1 -1 75 chr1 714055 714205 . 0 . 641 -1 -1 75 chr1 714460 714610 . 0 . 43.0 -1 -1 75 ...

samtools view pe_filtered.bam | head: K00314:149:H7Y5FBBXY:5:1124:9526:24507 81 chr1 9996 0 76M chr7 10150 0 CCTAACAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC JJJJFJFJJJJFFAJJJJJJJJJJJFJJJJJJJJFJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFAAA X0:i:2 X1:i:3 XA:Z:chr19,+59118914,76M,4;chr4,+191044027,22M1D54M,2;chrY,+59363343,68M2I6M,2;chrX,+155260337,68M2I6M,2; MC:Z:75M MD:Z:0T1C0G1T70 XG:i:0 AM:i:0 NM:i:4 SM:i:0 XM:i:4 XN:i:5 XO:i:0 MQ:i:37 XT:A:R K00314:149:H7Y5FBBXY:5:1125:30239:42917 121 chr1 9996 37 75M = 9996 0 TCCGATCACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC J<F7-77-JF-<--JJJFJFJF<--7JJJFA-FAAAF-F-FJF77-<JFJA7FA7-JJJF<FJJFJJJJFFFAAA X0:i:1 X1:i:0 MD:Z:6A68 XG:i:0 AM:i:0 NM:i:1 SM:i:37 XM:i:1 XN:i:5 XO:i:0 XT:A:U ...

Do you have any ideas or suggestions on what might be causing the problem? Thank you in advance for your kind help.

shintarok111 commented 1 month ago

I found some of the reasons for this issue, which seems related to the DHS peak calling process.

  1. When using BWA for peak calling, identical paired-end reads mapped to exactly the same sequence were flagged as not properly matched reads.

These reads are excluded in the pre-selection process of footprint-tools. I have tried bwa version 0.7.12 and 0.7.17. In the end, I used Bowtie2 to avoid this issue.

  1. Footprint-tools will not work if we filter BAM reads using the 1024 (PCR or optical duplicate) flag.

Although footprint calling worked for many samples by applying these process, there are still samples for which I cannot make DM models.

Are there any possible reasons for this? Any help would be appreciated.

jvierstra commented 3 weeks ago

Hi,

Can you send me an example of you BAM file? I will look into it for you.

jvierstra (at) altius (dot) org

shintarok111 commented 2 weeks ago

Hi Jvierstra,

Thank you very much for your reply!

I have sent an email with the attached BAM file and related files. It would be very helpful if you could check if there is anything wrong with them. I aligned the reads basically based on the ENCODE pipeline, but there would be some differences because I don't have the environment to run the pipelines.

By the way, I noticed that many of the BAM files that failed the footprint call are single-ended and have relatively low read depth. Are there any recommended cutoffs for the read depth of the DHS files?

Thank you for your help.