skovaka / uncalled4

MIT License

Question regarding eventalign-flag "samples" #17

Open · denisbeslic opened this issue 6 months ago

denisbeslic commented 6 months ago

Hi,

I have a question about the values printed by the eventalign flag "samples". The samples column from f5c's eventalign shows normalized pA values, whereas the samples column from uncalled4 shows normalized values on a scale of roughly -2 to +2. How can I transform or rescale uncalled4's samples values so that I get normalized pA values?

Command:

uncalled4 align data/zymo-human/GRCh38_no_alt.fna data/zymo-human/PGXX22563_pcr/PGXX22563_reads.slow5 --bam-in \
target.sorted_chr21.bam --eventalign-out events.tsv --eventalign-flags print-read-names,signal-index,samples \
--pore-model dna_r10.4.1_400bps_9mer --flowcell FLO-MIN114 --kit SQK-LSK114

Example from uncalled4

contig  position        reference_kmer  read_name       strand  event_index     event_level_mean        event_stdv      event_length  model_kmer       model_mean      model_stdv      standardized_level      start_idx       end_idx samples
chr21   5023659 CCCCCCACC       f1d5a296-cf8a-4919-854a-fd26a321fdc2    t       0       68.47   1.72563 0.0015  GGTGGGGGG       69.6184 6.8939  -1.04247        1354    1360    -1.00827,-1.01449,-1.01449,-0.958537,-1.19478,-1.06423

Example from f5c/nanopolish

contig  position        reference_kmer  read_name       strand  event_index     event_level_mean        event_stdv      event_length  model_kmer       model_mean      model_stdv      standardized_level      start_idx       end_idx samples
chr21    10174   ACCCTAACC       03ef78de-47b9-495b-b22d-f2b3fbf16a34    t       5446    69.64   2.302   0.00275 ACCCTAACC       69.51 6.34     0.02    27061   27072   69.549,71.6475,70.7482,67.3006,68.6497,70.2985,67.9002,64.4527,69.0994,72.397,74.0458
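To see which scale each column is on, the uncalled4 example row can be parsed by column name. This is a small illustrative sketch: `parse_eventalign_row` is a hypothetical helper, and model_mean/model_stdv are taken as 69.6184 and 6.8939 because those two values run together in the copied output. In this row the number of samples equals end_idx - start_idx, and the mean of the samples matches standardized_level (-1.04247), while event_level_mean (68.47) is already in pA. That is consistent with the samples being left on the normalized scale.

```python
from statistics import mean

# Column names from the eventalign header shown above.
COLUMNS = ("contig position reference_kmer read_name strand event_index "
           "event_level_mean event_stdv event_length model_kmer model_mean "
           "model_stdv standardized_level start_idx end_idx samples").split()


def parse_eventalign_row(line: str) -> dict:
    """Hypothetical helper: map one tab-separated eventalign line onto COLUMNS."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


# The uncalled4 example row, re-joined with tabs
# (model_mean/model_stdv split as 69.6184 / 6.8939 by assumption).
row = parse_eventalign_row("\t".join([
    "chr21", "5023659", "CCCCCCACC", "f1d5a296-cf8a-4919-854a-fd26a321fdc2",
    "t", "0", "68.47", "1.72563", "0.0015", "GGTGGGGGG", "69.6184", "6.8939",
    "-1.04247", "1354", "1360",
    "-1.00827,-1.01449,-1.01449,-0.958537,-1.19478,-1.06423",
]))

# Samples are comma-separated floats on the normalized scale.
samples = [float(x) for x in row["samples"].split(",")]
print(round(mean(samples), 5), row["standardized_level"], row["event_level_mean"])
# -1.04247 -1.04247 68.47
```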

skovaka commented 6 months ago

Sorry for the delay; that is a bug. Uncalled4 uses pore models scaled to mean = 0 and stdv = 1, along with fixed parameters that linearly scale them to picoamps. I apply that scaling to "event_level_mean" and "event_stdv", but forgot about "samples". I will implement a fix to scale the eventalign samples soon. In the meantime, you can scale them manually with this formula: pa = 23.1105 * samples + 92.5619. The scaling factors can be found in the BAM header if you use --bam-out; it will contain "pa_mean": 92.56193542480469, "pa_stdv": 23.11051368713379.
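Until the fix lands, the workaround can be applied to a whole events.tsv file. The sketch below is a minimal, hypothetical post-processing script, not part of uncalled4. It assumes a tab-separated eventalign file with a header row containing a "samples" column of comma-separated values, and it assumes the pa_mean/pa_stdv values appear as `"key": number` pairs somewhere in the BAM header text (for example, as printed by `samtools view -H`). The names `scale_factors_from_header`, `samples_to_pa` and `rescale_eventalign` are illustrative only.

```python
from __future__ import annotations

import re
from statistics import mean

# Scaling factors quoted in the thread for this run (from the --bam-out header).
PA_MEAN = 92.56193542480469
PA_STDV = 23.11051368713379


def scale_factors_from_header(header_text: str) -> tuple[float, float]:
    """Return (pa_mean, pa_stdv) found in BAM header text.

    Assumption: the values appear as "pa_mean": <number> and
    "pa_stdv": <number> somewhere in the text (e.g. `samtools view -H`).
    """
    def grab(key: str) -> float:
        match = re.search(rf'"{key}"\s*:\s*([-+0-9.eE]+)', header_text)
        if match is None:
            raise ValueError(f"{key} not found in header")
        return float(match.group(1))

    return grab("pa_mean"), grab("pa_stdv")


def samples_to_pa(samples_field: str, pa_mean: float = PA_MEAN,
                  pa_stdv: float = PA_STDV) -> list[float]:
    """Convert a comma-separated normalized 'samples' field to picoamps."""
    return [pa_stdv * float(x) + pa_mean for x in samples_field.split(",") if x]


def rescale_eventalign(in_path: str, out_path: str, pa_mean: float = PA_MEAN,
                       pa_stdv: float = PA_STDV) -> None:
    """Hypothetical helper: copy an eventalign TSV, rewriting 'samples' in pA."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        header = fin.readline().rstrip("\n").split("\t")
        col = header.index("samples")  # raises ValueError if the column is absent
        fout.write("\t".join(header) + "\n")
        for line in fin:
            if not line.strip():  # skip blank lines, e.g. a trailing newline
                continue
            fields = line.rstrip("\n").split("\t")
            pa = samples_to_pa(fields[col], pa_mean, pa_stdv)
            fields[col] = ",".join(f"{v:.4f}" for v in pa)
            fout.write("\t".join(fields) + "\n")


if __name__ == "__main__":
    example = "-1.00827,-1.01449,-1.01449,-0.958537,-1.19478,-1.06423"
    print(round(mean(samples_to_pa(example)), 2))  # 68.47, the row's event_level_mean
```

As a sanity check, rescaling the six samples of the uncalled4 example row gives a mean of about 68.47 pA, matching that row's event_level_mean. Reading the factors from the header rather than hard-coding them is a precaution: the thread only quotes the values for this run's pore model (dna_r10.4.1_400bps_9mer).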

denisbeslic commented 6 months ago

Thank you for your help!