hasindu2008 / f5c

Ultra-fast methylation calling and event alignment tool for nanopore sequencing data (supports CUDA acceleration)
https://hasindu2008.github.io/f5c/docs/overview
MIT License

no alignment in most reads #124

Closed Arkadiy-Garber closed 1 year ago

Arkadiy-Garber commented 1 year ago

I was having this issue with nanopolish and was told that it does not support R10 flow cells. I then tried f5c. The output here is a little more verbose, but I still end up with the same error:

(base) MAB@Axceleron-WKS:~/2748_NP_methylation/fast5_pass$ f5c index -d /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12 /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12.guppy2/pass/fast5s.al.guppy.fastq
[parse_index_options::INFO] Consider using --slow5 option for fast indexing, methylation calling and eventalignment. See f5c section under https://hasindu2008.github.io/slow5tools/workflows.html for an example.
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/26
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/14
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/9
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/3
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/4
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/19
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/13
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/21
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/25
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/17
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/0
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/7
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/10
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/22
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/12
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/18
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/20
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/5
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/2
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/8
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/27
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/15
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/11
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/23
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/6
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/1
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/24
[readdb] indexing /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12/16
[readdb] num reads: 110340, num reads with path to fast5: 110340
[main] Version: 1.1
[main] CMD: f5c index -d /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12 /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12.guppy2/pass/fast5s.al.guppy.fastq
[main] Real time: 65.252 sec; CPU time: 170.488 sec; Peak RAM: 0.039 GB

(base) MAB@Axceleron-WKS:~/2748_NP_methylation/fast5_pass$ f5c call-methylation -r /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12.guppy2/pass/fast5s.al.guppy.fastq -b /data/MAB/2748_NP_methylation/fastq_pass/barcode12/assembly/assembly.sorted.bam -g /data/MAB/2748_NP_methylation/fastq_pass/barcode12/assembly/assembly.fasta -t 16
[meth_main::INFO] Default methylation tsv output format is changed from f5c v0.7 onwards to match latest nanopolish output. Set --meth-out-version=1 to fall back to the old format.
chromosome  strand  start   end read_name   log_lik_ratio   log_lik_methylated  log_lik_unmethylated    num_calling_strands num_motifs  sequence
[meth_main::1.481*1.01] 512 Entries (4.6M bases) loaded
[meth_main::2.330*6.74] 512 Entries (3.6M bases) loaded
[pthread_processor::4.100*9.91] 512 Entries (4.6M bases) processed
[meth_main::4.869*10.98] 512 Entries (3.7M bases) loaded
[pthread_processor::6.029*11.54] 512 Entries (3.6M bases) processed
[meth_main::6.785*12.14] 512 Entries (3.7M bases) loaded
[pthread_processor::7.983*12.08] 512 Entries (3.7M bases) processed
[meth_main::8.701*12.48] 512 Entries (3.5M bases) loaded
[pthread_processor::9.756*12.64] 512 Entries (3.7M bases) processed
[meth_main::10.513*12.95] 512 Entries (3.8M bases) loaded
[pthread_processor::11.372*12.96] 512 Entries (3.5M bases) processed
[meth_main::12.248*13.23] 512 Entries (4.0M bases) loaded
[pthread_processor::13.327*13.36] 512 Entries (3.8M bases) processed
[meth_main::14.399*13.62] 512 Entries (3.7M bases) loaded
[pthread_processor::15.588*13.47] 512 Entries (4.0M bases) processed
[meth_main::16.605*13.69] 512 Entries (3.7M bases) loaded
[pthread_processor::17.493*13.66] 512 Entries (3.7M bases) processed
[meth_main::18.597*13.84] 512 Entries (3.9M bases) loaded
[pthread_processor::19.537*13.80] 512 Entries (3.7M bases) processed
[meth_main::20.533*13.95] 512 Entries (3.5M bases) loaded
[pthread_processor::21.773*13.85] 512 Entries (3.9M bases) processed
[meth_main::22.655*13.97] 512 Entries (3.9M bases) loaded
[pthread_processor::23.654*13.92] 512 Entries (3.5M bases) processed
[meth_main::24.739*14.05] 512 Entries (3.8M bases) loaded
[pthread_processor::26.242*13.74] 512 Entries (3.9M bases) processed
[meth_main::27.276*13.87] 512 Entries (4.0M bases) loaded
[pthread_processor::28.338*13.80] 512 Entries (3.8M bases) processed
[meth_main::29.404*13.91] 512 Entries (4.0M bases) loaded
[pthread_processor::30.707*13.80] 512 Entries (4.0M bases) processed
[meth_main::31.776*13.91] 512 Entries (4.4M bases) loaded
[pthread_processor::32.923*13.90] 512 Entries (4.0M bases) processed
[meth_main::33.875*13.98] 512 Entries (3.8M bases) loaded
[pthread_processor::35.374*13.95] 512 Entries (4.4M bases) processed
[meth_main::36.220*14.02] 512 Entries (3.5M bases) loaded
[pthread_processor::37.325*14.02] 512 Entries (3.8M bases) processed
[meth_main::38.227*14.08] 512 Entries (3.7M bases) loaded
[pthread_processor::39.241*14.04] 512 Entries (3.5M bases) processed
[meth_main::40.072*14.09] 512 Entries (3.3M bases) loaded
[pthread_processor::41.224*14.06] 512 Entries (3.7M bases) processed
[meth_main::42.093*14.12] 512 Entries (3.9M bases) loaded
[pthread_processor::42.887*14.13] 512 Entries (3.3M bases) processed
[meth_main::43.932*14.19] 512 Entries (3.6M bases) loaded
[pthread_processor::45.034*14.13] 512 Entries (3.9M bases) processed
[meth_main::46.035*14.19] 512 Entries (3.7M bases) loaded
[pthread_processor::46.797*14.21] 512 Entries (3.6M bases) processed
[meth_main::47.810*14.26] 512 Entries (3.5M bases) loaded
[pthread_processor::48.715*14.25] 512 Entries (3.7M bases) processed
[meth_main::49.809*14.31] 512 Entries (3.8M bases) loaded
[pthread_processor::50.563*14.27] 512 Entries (3.5M bases) processed
[meth_main::51.658*14.33] 512 Entries (4.2M bases) loaded
[pthread_processor::52.600*14.28] 512 Entries (3.8M bases) processed
[meth_main::53.737*14.34] 512 Entries (4.3M bases) loaded
[pthread_processor::54.831*14.30] 512 Entries (4.2M bases) processed
[meth_main::55.704*14.34] 512 Entries (3.9M bases) loaded
[pthread_processor::57.015*14.32] 512 Entries (4.3M bases) processed
[meth_main::58.067*14.37] 512 Entries (3.8M bases) loaded
[pthread_processor::59.070*14.35] 512 Entries (3.9M bases) processed
[meth_main::60.127*14.39] 512 Entries (3.9M bases) loaded
[pthread_processor::61.029*14.37] 512 Entries (3.8M bases) processed
[meth_main::62.032*14.41] 512 Entries (3.6M bases) loaded
[pthread_processor::63.748*14.23] 512 Entries (3.9M bases) processed
[meth_main::64.855*14.28] 512 Entries (3.9M bases) loaded
[pthread_processor::65.619*14.26] 512 Entries (3.6M bases) processed
[meth_main::66.472*14.29] 512 Entries (3.8M bases) loaded
[pthread_processor::67.863*14.23] 512 Entries (3.9M bases) processed
[meth_main::68.720*14.26] 512 Entries (3.7M bases) loaded
[pthread_processor::70.104*14.20] 512 Entries (3.8M bases) processed
[meth_main::70.991*14.23] 512 Entries (4.0M bases) loaded
[pthread_processor::72.461*14.13] 512 Entries (3.7M bases) processed
[meth_main::73.305*14.17] 512 Entries (3.7M bases) loaded
[pthread_processor::74.866*14.11] 512 Entries (4.0M bases) processed
[meth_main::75.743*14.14] 512 Entries (3.7M bases) loaded
[pthread_processor::76.682*14.14] 512 Entries (3.7M bases) processed
[meth_main::77.464*14.16] 512 Entries (3.3M bases) loaded
[pthread_processor::78.616*14.13] 512 Entries (3.7M bases) processed
[meth_main::79.471*14.16] 512 Entries (3.8M bases) loaded
[pthread_processor::80.421*14.13] 512 Entries (3.3M bases) processed
[meth_main::81.270*14.16] 512 Entries (3.8M bases) loaded
[pthread_processor::82.710*14.09] 512 Entries (3.8M bases) processed
[meth_main::83.555*14.12] 512 Entries (3.6M bases) loaded
[pthread_processor::84.625*14.11] 512 Entries (3.8M bases) processed
[meth_main::85.444*14.14] 512 Entries (3.7M bases) loaded
[pthread_processor::86.285*14.15] 512 Entries (3.6M bases) processed
[meth_main::87.120*14.17] 512 Entries (3.8M bases) loaded
[pthread_processor::88.113*14.17] 512 Entries (3.7M bases) processed
[meth_main::89.015*14.20] 512 Entries (4.1M bases) loaded
[pthread_processor::90.027*14.20] 512 Entries (3.8M bases) processed
[meth_main::90.905*14.23] 512 Entries (3.8M bases) loaded
[pthread_processor::92.491*14.16] 512 Entries (4.1M bases) processed
[meth_main::93.371*14.19] 512 Entries (3.3M bases) loaded
[pthread_processor::94.544*14.17] 512 Entries (3.8M bases) processed
[meth_main::95.430*14.19] 512 Entries (3.5M bases) loaded
[pthread_processor::96.199*14.19] 512 Entries (3.3M bases) processed
[meth_main::97.081*14.21] 512 Entries (3.7M bases) loaded
[pthread_processor::98.098*14.20] 512 Entries (3.5M bases) processed
[meth_main::99.016*14.23] 512 Entries (4.1M bases) loaded
[pthread_processor::100.124*14.20] 512 Entries (3.7M bases) processed
[meth_main::101.010*14.23] 512 Entries (4.2M bases) loaded
[pthread_processor::102.146*14.23] 512 Entries (4.1M bases) processed
[meth_main::103.231*14.25] 512 Entries (3.8M bases) loaded
[pthread_processor::104.368*14.23] 512 Entries (4.2M bases) processed
[meth_main::105.416*14.26] 512 Entries (3.9M bases) loaded
[pthread_processor::106.486*14.23] 512 Entries (3.8M bases) processed
[meth_main::107.532*14.25] 512 Entries (3.7M bases) loaded
[pthread_processor::108.394*14.25] 512 Entries (3.9M bases) processed
[meth_main::109.452*14.28] 512 Entries (4.0M bases) loaded
[pthread_processor::110.277*14.26] 512 Entries (3.7M bases) processed
[meth_main::111.305*14.28] 512 Entries (3.6M bases) loaded
[pthread_processor::112.295*14.27] 512 Entries (4.0M bases) processed
[meth_main::113.396*14.30] 512 Entries (4.0M bases) loaded
[pthread_processor::114.185*14.28] 512 Entries (3.6M bases) processed
[meth_main::115.108*14.30] 512 Entries (4.2M bases) loaded
[pthread_processor::116.197*14.30] 512 Entries (4.0M bases) processed
[meth_main::117.148*14.32] 512 Entries (4.3M bases) loaded
[pthread_processor::118.435*14.30] 512 Entries (4.2M bases) processed
[meth_main::119.447*14.33] 512 Entries (3.7M bases) loaded
[pthread_processor::120.553*14.33] 512 Entries (4.3M bases) processed
[meth_main::121.657*14.35] 512 Entries (3.8M bases) loaded
[pthread_processor::122.674*14.32] 512 Entries (3.7M bases) processed
[meth_main::123.687*14.34] 512 Entries (3.7M bases) loaded
[pthread_processor::124.749*14.31] 512 Entries (3.8M bases) processed
[meth_main::125.847*14.34] 512 Entries (4.0M bases) loaded
[pthread_processor::126.629*14.32] 512 Entries (3.7M bases) processed
[meth_main::127.641*14.35] 512 Entries (3.6M bases) loaded
[pthread_processor::128.697*14.33] 512 Entries (4.0M bases) processed
[meth_main::129.536*14.35] 512 Entries (3.9M bases) loaded
[pthread_processor::130.811*14.31] 512 Entries (3.6M bases) processed
[meth_main::131.861*14.34] 512 Entries (3.5M bases) loaded
[pthread_processor::132.762*14.33] 512 Entries (3.9M bases) processed
[meth_main::133.889*14.35] 512 Entries (4.1M bases) loaded
[pthread_processor::134.849*14.32] 512 Entries (3.5M bases) processed
[meth_main::135.944*14.34] 512 Entries (4.1M bases) loaded
[pthread_processor::136.994*14.33] 512 Entries (4.1M bases) processed
[meth_main::138.021*14.35] 512 Entries (3.8M bases) loaded
[pthread_processor::138.962*14.35] 512 Entries (4.1M bases) processed
[meth_main::140.055*14.37] 512 Entries (3.9M bases) loaded
[pthread_processor::140.822*14.36] 512 Entries (3.8M bases) processed
[meth_main::141.870*14.38] 512 Entries (3.8M bases) loaded
[pthread_processor::143.064*14.35] 512 Entries (3.9M bases) processed
[meth_main::144.172*14.37] 512 Entries (4.5M bases) loaded
[pthread_processor::145.169*14.35] 512 Entries (3.8M bases) processed
[meth_main::146.210*14.37] 512 Entries (3.9M bases) loaded
[pthread_processor::147.513*14.36] 512 Entries (4.5M bases) processed
[meth_main::148.590*14.38] 512 Entries (3.8M bases) loaded
[pthread_processor::149.577*14.36] 512 Entries (3.9M bases) processed
[meth_main::150.686*14.38] 512 Entries (3.7M bases) loaded
[pthread_processor::151.462*14.38] 512 Entries (3.8M bases) processed
[meth_main::152.481*14.40] 512 Entries (3.6M bases) loaded
[pthread_processor::154.019*14.34] 512 Entries (3.7M bases) processed
[meth_main::155.104*14.35] 512 Entries (3.6M bases) loaded
[pthread_processor::155.928*14.34] 512 Entries (3.6M bases) processed
[meth_main::157.067*14.36] 512 Entries (4.2M bases) loaded
[pthread_processor::157.805*14.35] 512 Entries (3.6M bases) processed
[meth_main::158.801*14.36] 512 Entries (4.0M bases) loaded
[pthread_processor::159.932*14.36] 512 Entries (4.2M bases) processed
[meth_main::160.786*14.37] 512 Entries (3.7M bases) loaded
[pthread_processor::162.136*14.35] 512 Entries (4.0M bases) processed
[meth_main::163.025*14.37] 512 Entries (4.0M bases) loaded
[pthread_processor::164.329*14.35] 512 Entries (3.7M bases) processed
[meth_main::165.305*14.36] 512 Entries (3.8M bases) loaded
[pthread_processor::166.408*14.35] 512 Entries (4.0M bases) processed
[meth_main::167.275*14.37] 512 Entries (3.7M bases) loaded
[pthread_processor::168.422*14.36] 512 Entries (3.8M bases) processed
[meth_main::169.284*14.37] 512 Entries (3.8M bases) loaded
[pthread_processor::170.326*14.36] 512 Entries (3.7M bases) processed
[meth_main::171.249*14.38] 512 Entries (4.0M bases) loaded
[pthread_processor::172.718*14.34] 512 Entries (3.8M bases) processed
[meth_main::173.581*14.35] 512 Entries (3.8M bases) loaded
[pthread_processor::174.816*14.34] 512 Entries (4.0M bases) processed
[meth_main::175.716*14.35] 512 Entries (4.1M bases) loaded
[pthread_processor::177.072*14.32] 512 Entries (3.8M bases) processed
[meth_main::177.974*14.34] 512 Entries (4.0M bases) loaded
[pthread_processor::179.347*14.32] 512 Entries (4.1M bases) processed
[meth_main::180.197*14.33] 512 Entries (3.8M bases) loaded
[pthread_processor::181.528*14.32] 512 Entries (4.0M bases) processed
[meth_main::182.447*14.33] 512 Entries (4.0M bases) loaded
[pthread_processor::183.541*14.33] 512 Entries (3.8M bases) processed
[meth_main::184.443*14.34] 512 Entries (3.9M bases) loaded
[pthread_processor::185.693*14.32] 512 Entries (4.0M bases) processed
[meth_main::186.566*14.34] 512 Entries (3.6M bases) loaded
[pthread_processor::187.744*14.33] 512 Entries (3.9M bases) processed
[meth_main::188.780*14.34] 512 Entries (3.7M bases) loaded
[pthread_processor::189.596*14.33] 512 Entries (3.6M bases) processed
[meth_main::190.514*14.35] 512 Entries (3.9M bases) loaded
[pthread_processor::191.579*14.33] 512 Entries (3.7M bases) processed
[meth_main::192.655*14.35] 512 Entries (3.8M bases) loaded
[pthread_processor::193.737*14.33] 512 Entries (3.9M bases) processed
[meth_main::194.825*14.35] 512 Entries (3.8M bases) loaded
[pthread_processor::195.675*14.34] 512 Entries (3.8M bases) processed
[meth_main::196.738*14.35] 512 Entries (3.9M bases) loaded
[pthread_processor::197.647*14.35] 512 Entries (3.8M bases) processed
[meth_main::198.717*14.36] 512 Entries (3.8M bases) loaded
[pthread_processor::199.667*14.35] 512 Entries (3.9M bases) processed
[meth_main::200.769*14.36] 512 Entries (4.0M bases) loaded
[pthread_processor::201.571*14.36] 512 Entries (3.8M bases) processed
[meth_main::202.587*14.37] 512 Entries (3.6M bases) loaded
[pthread_processor::203.649*14.37] 512 Entries (4.0M bases) processed
[meth_main::204.508*14.38] 512 Entries (3.6M bases) loaded
[pthread_processor::205.528*14.37] 512 Entries (3.6M bases) processed
[meth_main::206.348*14.38] 512 Entries (3.5M bases) loaded
[pthread_processor::207.362*14.38] 512 Entries (3.6M bases) processed
[meth_main::208.249*14.39] 512 Entries (4.0M bases) loaded
[pthread_processor::209.302*14.38] 512 Entries (3.5M bases) processed
[meth_main::210.126*14.39] 512 Entries (3.5M bases) loaded
[pthread_processor::211.378*14.39] 512 Entries (4.0M bases) processed
[meth_main::212.281*14.40] 512 Entries (4.0M bases) loaded
[pthread_processor::213.255*14.39] 512 Entries (3.5M bases) processed
[meth_main::214.063*14.40] 512 Entries (3.3M bases) loaded
[pthread_processor::215.325*14.39] 512 Entries (4.0M bases) processed
[meth_main::216.257*14.40] 512 Entries (4.0M bases) loaded
[pthread_processor::217.024*14.40] 512 Entries (3.3M bases) processed
[meth_main::217.873*14.41] 512 Entries (3.7M bases) loaded
[pthread_processor::219.495*14.38] 512 Entries (4.0M bases) processed
[meth_main::220.371*14.39] 512 Entries (3.9M bases) loaded
[pthread_processor::221.544*14.37] 512 Entries (3.7M bases) processed
[meth_main::222.452*14.38] 512 Entries (3.7M bases) loaded
[pthread_processor::223.662*14.37] 512 Entries (3.9M bases) processed
[meth_main::224.545*14.38] 512 Entries (4.0M bases) loaded
[pthread_processor::225.625*14.37] 512 Entries (3.7M bases) processed
[meth_main::226.706*14.38] 512 Entries (3.8M bases) loaded
[pthread_processor::227.585*14.39] 512 Entries (4.0M bases) processed
[meth_main::228.532*14.40] 512 Entries (4.0M bases) loaded
[pthread_processor::229.605*14.39] 512 Entries (3.8M bases) processed
[meth_main::230.542*14.40] 512 Entries (3.9M bases) loaded
[pthread_processor::232.127*14.37] 512 Entries (4.0M bases) processed
[meth_main::233.036*14.38] 512 Entries (4.1M bases) loaded
[pthread_processor::234.133*14.37] 512 Entries (3.9M bases) processed
[meth_main::235.054*14.38] 512 Entries (3.5M bases) loaded
[pthread_processor::236.306*14.38] 512 Entries (4.1M bases) processed
[meth_main::237.357*14.39] 512 Entries (3.8M bases) loaded
[pthread_processor::238.122*14.38] 512 Entries (3.5M bases) processed
[meth_main::239.177*14.39] 512 Entries (3.9M bases) loaded
[pthread_processor::239.969*14.39] 512 Entries (3.8M bases) processed
[meth_main::240.970*14.40] 512 Entries (3.6M bases) loaded
[pthread_processor::242.045*14.39] 512 Entries (3.9M bases) processed
[meth_main::243.185*14.40] 512 Entries (4.1M bases) loaded
[pthread_processor::243.956*14.39] 512 Entries (3.6M bases) processed
[meth_main::244.929*14.40] 512 Entries (3.4M bases) loaded
[pthread_processor::246.125*14.40] 512 Entries (4.1M bases) processed
[meth_main::247.010*14.41] 512 Entries (4.0M bases) loaded
[pthread_processor::247.993*14.40] 512 Entries (3.4M bases) processed
[meth_main::248.789*14.40] 512 Entries (3.4M bases) loaded
[pthread_processor::250.382*14.39] 512 Entries (4.0M bases) processed
[meth_main::251.293*14.40] 512 Entries (3.8M bases) loaded
[pthread_processor::252.096*14.40] 512 Entries (3.4M bases) processed
[meth_main::253.058*14.41] 512 Entries (4.2M bases) loaded
[pthread_processor::254.046*14.40] 512 Entries (3.8M bases) processed
[meth_main::254.944*14.41] 512 Entries (4.2M bases) loaded
[pthread_processor::256.674*14.38] 512 Entries (4.2M bases) processed
[meth_main::257.578*14.39] 512 Entries (3.9M bases) loaded
[pthread_processor::259.009*14.38] 512 Entries (4.2M bases) processed
[meth_main::259.997*14.39] 512 Entries (3.5M bases) loaded
[pthread_processor::261.087*14.38] 512 Entries (3.9M bases) processed
[meth_main::262.082*14.39] 512 Entries (3.6M bases) loaded
[pthread_processor::262.753*14.39] 512 Entries (3.5M bases) processed
[meth_main::263.790*14.40] 512 Entries (3.9M bases) loaded
[pthread_processor::264.660*14.39] 512 Entries (3.6M bases) processed
[meth_main::265.759*14.40] 512 Entries (4.0M bases) loaded
[pthread_processor::266.689*14.40] 512 Entries (3.9M bases) processed
[meth_main::267.778*14.41] 512 Entries (3.9M bases) loaded
[pthread_processor::268.813*14.40] 512 Entries (4.0M bases) processed
[meth_main::269.881*14.41] 512 Entries (3.8M bases) loaded
[pthread_processor::270.970*14.40] 512 Entries (3.9M bases) processed
[meth_main::272.113*14.41] 512 Entries (4.3M bases) loaded
[pthread_processor::273.175*14.39] 512 Entries (3.8M bases) processed
[meth_main::274.280*14.40] 512 Entries (3.6M bases) loaded
[pthread_processor::275.492*14.39] 512 Entries (4.3M bases) processed
[meth_main::276.347*14.40] 512 Entries (4.0M bases) loaded
[pthread_processor::277.460*14.39] 512 Entries (3.6M bases) processed
[meth_main::278.286*14.40] 512 Entries (3.6M bases) loaded
[pthread_processor::279.497*14.39] 512 Entries (4.0M bases) processed
[meth_main::280.428*14.40] 512 Entries (4.0M bases) loaded
[pthread_processor::281.225*14.40] 512 Entries (3.6M bases) processed
[meth_main::282.093*14.41] 512 Entries (3.7M bases) loaded
[pthread_processor::283.328*14.40] 512 Entries (4.0M bases) processed
[meth_main::284.241*14.41] 512 Entries (3.8M bases) loaded
[pthread_processor::285.181*14.41] 512 Entries (3.7M bases) processed
[meth_main::286.027*14.41] 512 Entries (3.7M bases) loaded
[pthread_processor::287.058*14.41] 512 Entries (3.8M bases) processed
[meth_main::288.054*14.42] 512 Entries (3.4M bases) loaded
[pthread_processor::289.178*14.40] 512 Entries (3.7M bases) processed
[meth_main::290.104*14.41] 512 Entries (3.6M bases) loaded
[pthread_processor::290.943*14.41] 512 Entries (3.4M bases) processed
[meth_main::292.049*14.42] 512 Entries (3.9M bases) loaded
[pthread_processor::292.777*14.41] 512 Entries (3.6M bases) processed
[meth_main::293.821*14.42] 512 Entries (3.6M bases) loaded
[pthread_processor::295.123*14.40] 512 Entries (3.9M bases) processed
[meth_main::296.174*14.41] 512 Entries (3.5M bases) loaded
[pthread_processor::297.049*14.41] 512 Entries (3.6M bases) processed
[meth_main::298.117*14.42] 512 Entries (3.7M bases) loaded
[pthread_processor::299.366*14.39] 512 Entries (3.5M bases) processed
[meth_main::300.218*14.39] 512 Entries (3.7M bases) loaded
[pthread_processor::301.437*14.38] 512 Entries (3.7M bases) processed
[meth_main::302.355*14.39] 512 Entries (3.8M bases) loaded
[pthread_processor::303.273*14.39] 512 Entries (3.7M bases) processed
[meth_main::304.329*14.40] 512 Entries (3.9M bases) loaded
[pthread_processor::305.257*14.40] 512 Entries (3.8M bases) processed
[meth_main::306.293*14.40] 512 Entries (3.9M bases) loaded
[pthread_processor::307.295*14.40] 512 Entries (3.9M bases) processed
[meth_main::308.346*14.41] 512 Entries (4.1M bases) loaded
[pthread_processor::309.533*14.39] 512 Entries (3.9M bases) processed
[meth_main::310.618*14.40] 512 Entries (4.0M bases) loaded
[pthread_processor::311.927*14.38] 512 Entries (4.1M bases) processed
[meth_main::312.786*14.39] 512 Entries (3.9M bases) loaded
[pthread_processor::314.053*14.38] 512 Entries (4.0M bases) processed
[meth_main::314.924*14.39] 512 Entries (4.0M bases) loaded
[pthread_processor::316.278*14.37] 512 Entries (3.9M bases) processed
[meth_main::317.147*14.38] 512 Entries (4.0M bases) loaded
[pthread_processor::318.382*14.37] 512 Entries (4.0M bases) processed
[meth_main::319.181*14.38] 512 Entries (3.3M bases) loaded
[pthread_processor::320.482*14.37] 512 Entries (4.0M bases) processed
[meth_main::321.452*14.38] 512 Entries (4.5M bases) loaded
[pthread_processor::322.150*14.38] 512 Entries (3.3M bases) processed
[meth_main::323.088*14.39] 512 Entries (4.4M bases) loaded
[pthread_processor::324.886*14.37] 512 Entries (4.5M bases) processed
[meth_main::325.973*14.37] 512 Entries (3.9M bases) loaded
[pthread_processor::327.477*14.35] 512 Entries (4.4M bases) processed
[meth_main::328.516*14.36] 512 Entries (3.8M bases) loaded
[pthread_processor::329.616*14.35] 512 Entries (3.9M bases) processed
[meth_main::330.688*14.36] 512 Entries (3.6M bases) loaded
[pthread_processor::331.774*14.35] 512 Entries (3.8M bases) processed
[meth_main::332.844*14.35] 512 Entries (3.7M bases) loaded
[pthread_processor::333.779*14.34] 512 Entries (3.6M bases) processed
[meth_main::334.818*14.35] 512 Entries (3.9M bases) loaded
[pthread_processor::335.639*14.35] 512 Entries (3.7M bases) processed
[meth_main::336.995*14.36] 357 Entries (2.5M bases) loaded
[pthread_processor::337.850*14.34] 512 Entries (3.9M bases) processed
[pthread_processor::339.201*14.34] 357 Entries (2.5M bases) processed
[meth_main] total entries: 83813, qc fail: 0, could not calibrate: 2, no alignment: 83811, bad fast5: 0
[meth_main] total bases: 626.1 Mbases
[meth_main] Data loading time: 157.648 sec
[meth_main]     - bam load time: 8.914 sec
[meth_main]     - fasta load time: 61.838 sec
[meth_main]     - fast5 load time: 86.597 sec
[meth_main]         - fast5 open time: 14.969 sec
[meth_main]         - fast5 read time: 68.767 sec
[meth_main] Data processing time: 337.600 sec
[meth_main] Data output time: 0.001 sec
[main] Version: 1.1
[main] CMD: f5c call-methylation -r /data/MAB/2748_NP_methylation/fast5_pass/single_barcode12.guppy2/pass/fast5s.al.guppy.fastq -b /data/MAB/2748_NP_methylation/fastq_pass/barcode12/assembly/assembly.sorted.bam -g /data/MAB/2748_NP_methylation/fastq_pass/barcode12/assembly/assembly.fasta -t 16
[main] Real time: 339.258 sec; CPU time: 4863.068 sec; Peak RAM: 2.700 GB

Arkadiy-Garber commented 1 year ago

Any insight on this would be appreciated, as I have gone through 6 or 7 of these sorts of tools and have had no luck getting useful information. I am fairly new to ONP/methylation, so I would appreciate an explanation in layman's terms. Thanks, Arkadiy

hasindu2008 commented 1 year ago

Hi, please use the latest pre-release version (https://github.com/hasindu2008/f5c/releases/tag/v1.2-beta) and then specify --pore r10.

Arkadiy-Garber commented 1 year ago

Hi Hasindu,

This seems to have worked, thanks so much for your help!!

Cheers, Arkadiy

Arkadiy-Garber commented 1 year ago

Hi Hasindu,

Thanks again for your help on this. I was wondering whether you might help interpret the output files from this program.

I am attaching the output of the script calculate_methylation_frequency.py, which was run on the original output from the f5c_x86_64_linux call-methylation command.

For example, what is the difference between num_motifs_in_group and called_sites columns? What does the group_sequence represent? What do the start and end columns represent, and why are they sometimes the same base position, and sometimes a range?

Thanks for all your help! Arkadiy

assembly.bar16methFreq.txt

hasindu2008 commented 1 year ago

Hi

Could you also please run f5c meth-freq on that output file? It does the same thing as calculate_methylation_frequency.py, but it is the option I have tested with f5c output.

Arkadiy-Garber commented 1 year ago

Gotcha, here is the file from that run:

assembly.bar16methFreq.txt

hasindu2008 commented 1 year ago

OK, that result looks reasonable. Perhaps calculate_methylation_frequency.py does something weird when k-mer sizes are large, which is the case for R10.

See if the description here helps in understanding those columns: https://hasindu2008.github.io/f5c/docs/commands#meth-freq It is not very descriptive; if it is still hard to understand, I will try to write up an explanation.

This nanopolish issue may also help: https://github.com/jts/nanopolish/issues/365

Arkadiy-Garber commented 1 year ago

Thanks Hasindu, that is indeed helpful. Here is a sample line from the output when I use the -s flag:

['contig_1', '5455267', '5455267', '1', '33', '3', '0.091', 'split-group']

All lines have a start and end that's on the same base now. The way I interpret this is that there is 1 site that can be methylated, 33 reads mapping to that location, and 3 of those reads have methylation. Is that correct?

As an FYI, some lines still contain a sequence instead of "split-group":

['contig_1', '5455222', '5455222', '1', '3', '1', '0.333', 'TCAAGAGCCGAGGCACA']

Thanks! Arkadiy

hasindu2008 commented 1 year ago

So now, column 6 divided by column 5 should equal the methylation frequency in column 7. Yes, split-group appears when there were multiple CGs in the group and the group therefore had to be split. You do not have to worry about it.

I am curious, what kind of genome is your data from?
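As a quick sanity check of that relationship, here is a minimal Python sketch that recomputes column 6 / column 5 for the two sample rows quoted above and compares the result with column 7. The field names are labels chosen for this sketch (they are not taken from the f5c documentation), and the tolerance assumes the frequency is printed to three decimals, as in the samples.

```python
from typing import NamedTuple


class FreqRow(NamedTuple):
    # Field names are labels chosen for this sketch, not official column names.
    chromosome: str
    start: int
    end: int
    sites_in_group: int           # column 4
    called_sites: int             # column 5
    called_sites_methylated: int  # column 6
    methylated_frequency: float   # column 7
    group_sequence: str           # column 8


def parse_row(fields):
    """Convert one already-split row (8 strings) into a typed FreqRow."""
    return FreqRow(fields[0], int(fields[1]), int(fields[2]), int(fields[3]),
                   int(fields[4]), int(fields[5]), float(fields[6]), fields[7])


def frequency_is_consistent(row, tolerance=0.001):
    """Return True if column 6 / column 5 matches column 7 within tolerance."""
    recomputed = row.called_sites_methylated / row.called_sites
    return abs(recomputed - row.methylated_frequency) <= tolerance


# The two sample rows quoted earlier in this thread.
rows = [
    parse_row(['contig_1', '5455267', '5455267', '1', '33', '3', '0.091', 'split-group']),
    parse_row(['contig_1', '5455222', '5455222', '1', '3', '1', '0.333', 'TCAAGAGCCGAGGCACA']),
]

for row in rows:
    print(row.chromosome, row.start, frequency_is_consistent(row))
```

For the first row this is 3 / 33, which rounds to 0.091; for the second it is 1 / 3, which rounds to 0.333.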

Arkadiy-Garber commented 1 year ago

Got it, thanks!

This data comes from a bacterial genome. Not 100% sure of the phylogeny, but I think it is somewhere in the Alphaproteobacteria class.

hasindu2008 commented 1 year ago

Ah right. If you happen to have bisulphite data, you can do a correlation test. The model was tested on HG2 and gave a correlation of around 0.9 [see https://hasindu2008.github.io/f5c/docs/r10train, last step 11]. On HG1 it also gave a correlation of around 0.9 with bisulphite.
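For anyone wanting to try such a correlation test, a minimal sketch in plain Python follows. It assumes both datasets have already been reduced to dictionaries mapping (chromosome, position) to a methylation frequency. That layout, the function names, and the use of Pearson correlation are assumptions of this sketch, not necessarily the procedure used in the linked training walkthrough.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def correlate_shared_sites(nanopore, bisulphite):
    """Correlate frequencies at sites present in both dictionaries.

    Both arguments map (chromosome, position) -> methylation frequency.
    """
    shared = sorted(nanopore.keys() & bisulphite.keys())
    return pearson([nanopore[s] for s in shared],
                   [bisulphite[s] for s in shared])


# Hypothetical values, purely to show the call; not real measurements.
nanopore = {("contig_1", 100): 0.10, ("contig_1", 250): 0.80, ("contig_1", 400): 0.45}
bisulphite = {("contig_1", 100): 0.05, ("contig_1", 250): 0.90, ("contig_1", 400): 0.50}
print(correlate_shared_sites(nanopore, bisulphite))
```

Only sites covered in both datasets are compared, so it is worth checking how many sites are shared before reading much into the coefficient.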

hasindu2008 commented 1 year ago

Will close the issue for now, feel free to reopen if you face more issues.

Arkadiy-Garber commented 1 year ago

Thanks, Hasindu. This has been immensely helpful so far. I don't think bisulphite data exists for this dataset, but I will mention it to the lab that generated this data. What is HG1 and HG2?

hasindu2008 commented 1 year ago

HG001 (NA12878) and HG002 (NA24385) human reference genome samples.

Arkadiy-Garber commented 1 year ago

Got it, thanks for that clarification. Do you think that this means the results for a bacterial genome are invalid? I am seeing a lot of methylation calls - a lot more than I would have expected in a bacterium.

Is there a model for bacterial genomes that I can use for prediction of methylation?

Thanks for your help, Arkadiy

hasindu2008 commented 1 year ago

Hi, I think it would depend on the 9-mer composition of the bacterial genome. For hidden Markov models, as far as I am aware, we do not need separate models for different species, as long as all the k-mers are trained.
4% of 9-mers were not frequent enough in the human data we used for training; thus, those k-mers were not trained. If such k-mers are abundant in the bacterium, methylation calls involving them would not be ideal. We could make the present model more complete by training those currently missing k-mers.
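One rough way to gauge this for a given assembly is to count its 9-mers and see what share falls into the untrained set. The sketch below assumes you have a set of untrained k-mers available. This thread does not provide such a list, so `untrained_kmers` is hypothetical. The sketch also scans only the forward strand of a single sequence string.

```python
from collections import Counter


def count_kmers(sequence, k=9):
    """Count every k-mer in a sequence, skipping windows with non-ACGT bases."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def untrained_fraction(counts, untrained_kmers):
    """Fraction of k-mer occurrences that belong to the untrained set."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    missing = sum(n for kmer, n in counts.items() if kmer in untrained_kmers)
    return missing / total


# Toy input: a short made-up sequence and a made-up "untrained" set.
toy_sequence = "ACGTACGTACGTTTGACCGA"
untrained_kmers = {"ACGTACGTA"}  # hypothetical placeholder
counts = count_kmers(toy_sequence, k=9)
print(untrained_fraction(counts, untrained_kmers))
```

A high fraction would suggest that calls in this genome lean on k-mers the model has not seen; a low one would suggest the human-trained model covers the genome reasonably well.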

How are you evaluating the methylation calls at the moment? Are you looking at the methylation frequencies - output from meth-freq?

hasindu2008 commented 1 year ago

@Arkadiy-Garber Have you solved this issue? I recently got hold of a bacterial dataset, called methylation on it, and saw an average methylation frequency of almost 0, which is expected as that bacterium apparently does not have any CpG methylation. For the assembly.bar16methFreq.txt you once attached, the average methylation frequency is also close to 0:

cut -f 7  assembly.bar16methFreq.txt | tail -n+2 | datamash mean 1 median 1
0.03537209499969        0
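If datamash is not installed, roughly the same summary can be computed with the Python standard library. This is a sketch, assuming a tab-separated file with one header line and the frequency in the seventh column, as in the command above. The header names and the third data row in the sample are made up for illustration.

```python
import csv
import io
import statistics


def frequency_summary(handle, column=6):
    """Return (mean, median) of one column of a tab-separated table.

    column is zero-based, so 6 selects the seventh column (cut -f 7).
    The first line is treated as a header and skipped (tail -n+2).
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)
    values = [float(row[column]) for row in reader if row]
    return statistics.mean(values), statistics.median(values)


# In-memory stand-in for a meth-freq file; header names are placeholders.
sample = io.StringIO(
    "chromosome\tstart\tend\tsites\tcalled\tmethylated\tfrequency\tsequence\n"
    "contig_1\t5455267\t5455267\t1\t33\t3\t0.091\tsplit-group\n"
    "contig_1\t5455222\t5455222\t1\t3\t1\t0.333\tTCAAGAGCCGAGGCACA\n"
    "contig_1\t5455300\t5455300\t1\t20\t0\t0.000\tsplit-group\n"
)
mean_freq, median_freq = frequency_summary(sample)
print(mean_freq, median_freq)
```

To run it on a real file, pass an open file handle instead, for example `frequency_summary(open("assembly.bar16methFreq.txt"))`.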

Arkadiy-Garber commented 1 year ago

@hasindu2008 thanks for following up! No, this issue has not yet been resolved. Yes, I am looking at the output from meth-freq.

Glad you had a chance to get a hold of a bacterial dataset. Do your results suggest that false positives should be rare?

If you are interested, I'd be happy to send you my dataset. I think that might be of benefit to both of us, since you clearly know what you are doing, and I'm still figuring these things out. Let me know, happy to send something via email or Dropbox.

Thanks again for all your help! Arkadiy

hasindu2008 commented 1 year ago

Yes sure. If you could upload some data in fast5 or blow5 format, I can have a look.

hasindu2008 commented 1 year ago

@Arkadiy-Garber Let me know if you want me to have a look at this.

Arkadiy-Garber commented 1 year ago

Hi @hasindu2008, yes! That would be great - sorry for the radio silence. Hectic schedule lately, but I would greatly appreciate it. What would be a good way to get you the data (it will be on the order of multiple GB)?

Also, would you like the raw multi-FAST5s derived from the R10 GridION run, or the single-FAST5s that I re-generated using multi_to_single_fast5?

My 2 pipelines are attached. I got different results from these pipelines, and I don't know why...

Pipeline_1.txt Pipeline_2.txt

hasindu2008 commented 1 year ago

@Arkadiy-Garber

Do you have the file assembly.bar12meth.tsv that you generated?

Otherwise, the raw multi-FAST5 files from the R10 GridION run, along with the basecalls (combined.fastq), assembly.sorted.bam and assembly.fasta from your pipeline_1.txt, would be helpful.

There are two reasons I can think of why the results from the two pipelines are different:

  1. I can see that you are still using ~/bin/nanopolish/scripts/calculate_methylation_frequency.py assembly.bar12meth.tsv > assembly.bar12methFreq.tsv in your pipelines. calculate_methylation_frequency.py does not work as intended for R10 models. You should be using f5c meth-freq -i assembly.bar12meth.tsv -s > assembly.bar12methFreq.tsv instead. This could explain why you are seeing methylation frequencies that are too high for bacteria.
  2. The live basecalling model used in pipeline 1 and the post-run basecalling model used in pipeline 2 could be different (e.g. hac vs fast vs sup?). Such a difference in the basecalling model would also affect the output, but the differences should be minor.

Arkadiy-Garber commented 1 year ago

Thanks, Hasindu. Those files (raw multi-FAST5 from R10 GridION along with the basecalls (combined.fastq), assembly.sorted.bam and assembly.fasta in your pipeline_1.txt) should be in the Box folder that I shared with you on May 7th.

I'll share the assembly.bar12meth.tsv files shortly, thanks!

hasindu2008 commented 1 year ago

I ran the following commands on the combined.fastq file you already provided. I converted your raw multi-FAST5 to BLOW5 format for my convenience, but that does not affect the result.

f5c index combined.fastq --slow5 merged.blow5
minimap2 -ax map-ont assembly.fasta combined.fastq --secondary=no -t20 | samtools sort - -o combined_fastq.bam && samtools index combined_fastq.bam
f5c call-methylation --slow5 merged.blow5 -r combined.fastq -b combined_fastq.bam -g assembly.fasta -t 16 --pore r10 > combined_fastq_meth.tsv
f5c meth-freq -i combined_fastq_meth.tsv -s -o combined_fastq_meth_freq.tsv
cut -f 7 combined_fastq_meth_freq.tsv | tail -n+2 | datamash mean 1 median 1 sstdev 1
0.025360427328806       0       0.055755034577387

The mean, median and sstdev of the methylation frequency are 0.025360427328806, 0 and 0.055755034577387, respectively, which is pretty low. So what exactly is the problem - you mentioned that you are getting methylation frequencies that are too high?
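The summary above can also be reproduced without datamash. This is a minimal Python sketch, assuming the meth-freq output is tab-separated with one header row and the methylation frequency in column 7 (as in the `cut -f 7` command above); the function name `summarize_meth_freq` is only illustrative and not part of f5c.

```python
import statistics

def summarize_meth_freq(lines, column=6):
    """Return (mean, median, sample stdev) of one column of a meth-freq TSV.

    `lines` is any iterable of text lines; the first line is treated as a header.
    `column` is 0-based, so 6 corresponds to `cut -f 7`.
    """
    values = []
    for i, line in enumerate(lines):
        if i == 0 or not line.strip():
            continue  # skip the header row and blank lines
        values.append(float(line.rstrip("\n").split("\t")[column]))
    if not values:
        raise ValueError("no data rows found")
    # statistics.stdev is the sample standard deviation, like datamash sstdev
    sstdev = statistics.stdev(values) if len(values) > 1 else 0.0
    return statistics.mean(values), statistics.median(values), sstdev

# Usage (file name taken from the commands above):
# with open("combined_fastq_meth_freq.tsv") as fh:
#     print(summarize_meth_freq(fh))
```

A median of 0 means that at least half of the CpG sites had no reads called as methylated.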

Arkadiy-Garber commented 1 year ago

Hi Hasindu,

Thank you for running that, and I appreciate you sharing the results.

I guess what I am confused about is how to interpret those numbers, as well as the output of the meth-freq files. That methylation frequency (0.025) does indeed sound low, but aren't there many methylation calls in the meth-freq output (unless I am not interpreting them correctly)? Based on your expertise, would you say this data looks valid, especially since the model used for this bacterial analysis is based on human genome data?

Thanks again for all your help, and sorry if I keep bugging you with the same/similar questions. This is all still pretty confusing to me.

Cheers, Arkadiy

Arkadiy-Garber commented 1 year ago

Also, I see you went with what I sent as pipeline_2. Some of the command parameters are different from mine, and I'd be curious to hear if there are any considerations that you took, which I didn't, that might also impact the output.

As far as getting a profile of methylation patterns (e.g. which genes are methylated vs which are not) and comparing across samples, are there any other considerations that you would suggest taking?

Thanks! Arkadiy

hasindu2008 commented 1 year ago

meth_freq.tsv contains a row for each CpG site it encountered in the genomic region - irrespective of whether it is methylated or unmethylated. Each row tells how many reads covered the CpG site and how many of them were methylated, and, based on that, the methylation frequency for that CpG site. Yes, this data looks valid (a near-0 average methylation frequency is what is expected for bacteria).
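To make the row layout concrete, here is a minimal Python sketch that recomputes the per-site frequency from the two read counts and lists only the well-covered sites that look methylated. The column names (`called_sites`, `called_sites_methylated`) follow the nanopolish-style header and are an assumption, as are the `min_reads` and `min_freq` thresholds and the toy values; check them against the header of your own file.

```python
import csv
import io

def methylated_sites(tsv_text, min_reads=10, min_freq=0.5):
    """Yield (chromosome, start, frequency) for well-covered sites at or above min_freq."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        covered = int(row["called_sites"])                # reads covering the site
        methylated = int(row["called_sites_methylated"])  # reads called methylated
        if covered < max(min_reads, 1):
            continue  # too few reads to trust the frequency (also avoids dividing by 0)
        freq = methylated / covered
        if freq >= min_freq:
            yield row["chromosome"], int(row["start"]), freq

# Toy table with made-up values, for illustration only.
example = (
    "chromosome\tstart\tend\tnum_cpgs_in_group\tcalled_sites\t"
    "called_sites_methylated\tmethylated_frequency\tgroup_sequence\n"
    "contig_1\t100\t100\t1\t40\t1\t0.025\tAACGT\n"
    "contig_1\t250\t250\t1\t30\t24\t0.800\tTTCGA\n"
    "contig_1\t400\t400\t1\t4\t4\t1.000\tGACGC\n"
)
for site in methylated_sites(example):
    print(site)
```

Counting the rows of the file therefore counts CpG sites that were seen, not methylation events; filtering by frequency and coverage as sketched here is closer to a count of methylated sites.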

The commands:

Arkadiy-Garber commented 1 year ago

Awesome, thanks for your feedback on this! Your help is much appreciated!!

hasindu2008 commented 1 year ago

Closing the issue. Feel free to reopen if you have additional questions.