hyeshik / poreplex

A versatile sequenced read processor for nanopore direct RNA sequencing

squiggle scaling? #13

Closed Huanle closed 5 years ago

Huanle commented 5 years ago

@hyeshik ,

it seems the dumped adapter squiggles have been re-scaled by poreplex?

Raw: image

Poreplex: image

Can you please tell me the formula for the conversion? Also, is this necessary?

Thanks a lot

hyeshik commented 5 years ago

Hi @Huanle,

Poreplex provides two types of scaled signals, mean and scaled_mean. The former is a simple conversion to the pA scale, using the formula defined by the digitization parameters stored in the FAST5 file. Please refer to ONT's technical documentation for the exact formula.
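
For readers who want to reproduce the pA conversion themselves, here is a minimal sketch. It assumes the conversion commonly described for ONT FAST5 files, pA = (raw + offset) * range / digitisation, with `offset`, `range` and `digitisation` read from the read's `channel_id` attributes. The function name `raw_to_pa` is hypothetical and the code is not taken from the poreplex source, so check it against ONT's documentation before relying on it.

```python
import numpy as np

def raw_to_pa(raw, offset, range_, digitisation):
    """Convert raw integer samples to picoamperes.

    Assumed formula: pA = (raw + offset) * range / digitisation,
    where the three parameters come from the FAST5 channel_id attributes.
    """
    raw = np.asarray(raw, dtype=np.float64)
    return (raw + offset) * (range_ / digitisation)

# Made-up parameter values, purely for illustration.
example_pa = raw_to_pa([0, 5], offset=10.0, range_=1000.0, digitisation=1000.0)
```

The result is a float64 array of the same length as the input, so it can be plotted directly against the dumped signal for comparison.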

scaled_mean is the one poreplex uses for most of its analysis. It is derived from mean and further scaled by a recurrent neural network model shipped with the poreplex distribution. This RNN-based normalization simplifies many types of squiggle analysis, but some analyses do not benefit from it. I recommend using the original raw signal if you need to feed the signal to other software.

Hyeshik

Huanle commented 5 years ago

Thanks a lot, hyeshik. But where can I find the adapter start and end positions?

hyeshik commented 5 years ago

They are available as HDF5 attributes in each dataset entry in the basecalled events outputs.

>>> import h5py
>>> ff = h5py.File('poreplex.Set180606/events/inventory.h5', 'r')
>>> dict(ff['basecalled_events/000/00039824-9618-464e-ac21-942e1f79d1a5'].attrs)
{'signal_scale': 1.0291171, 'signal_shift': 2.3550591, 'adapter_begin': 2805, 'adapter_end': 10050, 'polya_begin': 11699, 'polya_end': 14963, 'spikes': b'[]'}
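
Once the attributes are loaded, cutting out a segment is a matter of slicing. The sketch below assumes that the `*_begin` and `*_end` values are half-open sample indices into the read's signal array. The thread does not state the units, so verify this against your own data. The helper `slice_segment` is hypothetical, and the signal here is a synthetic stand-in rather than real data.

```python
import numpy as np

def slice_segment(signal, attrs, name):
    """Return signal[begin:end] for the segment called `name`.

    Assumes attrs holds '<name>_begin' and '<name>_end' as sample
    indices (an assumption, not confirmed by the poreplex docs).
    """
    begin = int(attrs[f"{name}_begin"])
    end = int(attrs[f"{name}_end"])
    return np.asarray(signal)[begin:end]

# Attribute values copied from the example output above.
attrs = {"adapter_begin": 2805, "adapter_end": 10050,
         "polya_begin": 11699, "polya_end": 14963}

# Synthetic stand-in for a read's signal; replace with the real array.
signal = np.arange(20000)

adapter = slice_segment(signal, attrs, "adapter")
polya = slice_segment(signal, attrs, "polya")
```

In practice `attrs` would be the `dict(...attrs)` object from the h5py session above, and `signal` the raw or pA-converted signal of the same read.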

Huanle commented 5 years ago

Hi @hyeshik ,

thanks a lot. this is very helpful.