marbl / CHM13

The complete sequence of a human genome

Sequencing platform for CHM13 dataset #32

Closed ktan8 closed 3 years ago

ktan8 commented 3 years ago

Hi,

I understand that the CHM13 nanopore datasets were generated at four different sites, with PromethION sequencing done at UCD (runs 225 and 226) and MinION/GridION sequencing presumably done at the other three sites (NHGRI, U of Nottingham, and UW).

The T-to-T consortium paper (https://www.nature.com/articles/s41586-020-2547-7#Sec6) states that "Most sequencing was performed on the Nanopore GridION with FLO-MIN106 or FLO-MIN106D R9 flow cells, with the exception of one Flongle flow cell used for testing."

Can I confirm if the GridION was used to generate the datasets from NHGRI, U of Nottingham, and UW? Alternatively, were some of the runs generated with a MinION? If so, is there a way for us to tell which runs these are?

Thanks for your help!

skoren commented 3 years ago

The Nottingham data has two PromethION cells (partitions 97-98), one GridION cell (partition 96), and one Flongle cell (partition 95, run on a MinION). There are MinION/GridION cells mixed into the UW data. I believe the NHGRI data is all GridION. The MinION and GridION use interchangeable flow cells, so I'm not sure it's important to differentiate between the two. If you did want to, the raw fast5 files contain a device_type field, so you could download the raw data and split reads into GridION/MinION based on their fast5 info.
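A rough sketch of that split, in case it helps. It assumes multi-read fast5 files in which each `read_<id>` group has a `tracking_id` group carrying a `device_type` attribute; older or single-read files may be laid out differently, so check a file or two first. The function names are hypothetical, and `h5py` is a third-party package that is only needed for the file-reading step.

```python
from collections import defaultdict


def normalize_device(value):
    """Return the device type as a lowercase string.

    fast5 attributes are often stored as bytes, so decode them first.
    """
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    return str(value).strip().lower()


def split_by_device(records):
    """Group read IDs by device type.

    `records` is any iterable of (read_id, device_type) pairs.
    """
    groups = defaultdict(list)
    for read_id, device in records:
        groups[normalize_device(device)].append(read_id)
    return dict(groups)


def iter_fast5_devices(path):
    """Yield (read_id, device_type) pairs from one multi-read fast5 file.

    Assumption: the layout is /read_<id>/tracking_id with a
    `device_type` attribute. Reads without it are reported as "unknown".
    """
    import h5py  # third-party; imported lazily so the helpers above work without it

    with h5py.File(path, "r") as fast5:
        for name, group in fast5.items():
            if not name.startswith("read_") or not isinstance(group, h5py.Group):
                continue
            if "tracking_id" not in group:
                continue
            device = group["tracking_id"].attrs.get("device_type", "unknown")
            yield name[len("read_"):], device
```

Calling `split_by_device(iter_fast5_devices("some_file.fast5"))` (the path is a placeholder) would then give a dict mapping each device type found in that file to its read IDs, which you can use to pull the matching reads out of the basecalled data.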

ktan8 commented 3 years ago

Dear Sergey,

Thank you so much for the really nice summary! This information is extremely helpful. We saw some slight differences in the data from some of the sites. The information you provided is thus really useful in helping us decide how to interpret them.

Thanks!