haotianteng / Chiron

A basecaller for Oxford Nanopore Technologies' sequencers

Training Data #63

Closed pcjedi closed 6 years ago

pcjedi commented 6 years ago

I found the supporting data at gigadb.org (http://gigadb.org/dataset/100425, which is incorrectly linked at academic.oup.com, by the way). I can find the signal and label data, but a mandatory tfrecords file appears to be missing. Or can I use any tfrecords file?

haotianteng commented 6 years ago

The tfrecords file is generated by raw.py when you follow the steps described in README.md.

pcjedi commented 6 years ago

Thanks, but I wanted to train Chiron with the 'original' training data provided by gigadb. The signal and label data provided via gigadb appear to be useless unless the corresponding tfrecords file is also provided.

haotianteng commented 6 years ago

Oh yes, that's correct, as Chiron no longer uses the signal and label file format. You can do it in two ways: either use an older version of Chiron, e.g. 0.3, or download the original fast5 files from https://data.genomicsresearch.org/Projects/train_set_all
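
If you go the older-version route, it may be worth confirming first that every read in the gigadb download has both of its files. A minimal sketch, assuming one `<read>.signal` and one `<read>.label` file per read (the extensions and the helper name `pair_signal_label` are assumptions for illustration, not part of Chiron):

```python
from pathlib import Path


def pair_signal_label(data_dir):
    """Return (paired, unpaired) read names found under data_dir.

    Assumes one `<read>.signal` and one `<read>.label` file per read.
    Both extensions are assumptions; adjust them to match the download.
    """
    root = Path(data_dir)
    # Collect read names (file name without the final extension) per file type.
    signals = {p.stem for p in root.rglob("*.signal")}
    labels = {p.stem for p in root.rglob("*.label")}
    paired = sorted(signals & labels)    # reads with both files present
    unpaired = sorted(signals ^ labels)  # reads missing one of the two files
    return paired, unpaired
```

Calling `pair_signal_label("path/to/download")` returns two sorted lists; a non-empty second list points to reads that would need to be fetched again or dropped before training.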

SrinikhilReddy commented 6 years ago

Hi @haotianteng. I wanted to know whether the model provided in chiron/model/DNA_Default was trained entirely on the dataset at http://gigadb.org/dataset/100425, or whether any additional training sets were used. If so, could you point me to those resources?

Thanks, Naga.

haotianteng commented 5 years ago

Chiron V0.3 was trained entirely on the dataset described in the paper; no additional training sets were used. Chiron V0.4, on the other hand, uses an additional human dataset so that it performs better on human data.