nanopore-wgs-consortium / NA12878

Data and analysis for NA12878 genome on nanopore

unlisted flow cells #11

Closed mateidavid closed 6 years ago

mateidavid commented 7 years ago

We downloaded part of this dataset (I think chr 20 only), and I am seeing some flow cell names which don't appear in the README: FAB38968, FAB42483, and FAB46682. (I'm also missing FAB42483 entirely.) What are these, and why aren't they listed in the README? Thanks!

nickloman commented 7 years ago

When we collect the statistics for the main table, we key off asic_id in the FAST5 files, as this is the field that reliably identifies the flow cell and is picked up by the software (rather than being entered by the user, and therefore potentially prone to error!).

Could you possibly check the asic_id in any of the FAST5 files from those flow cells you mentioned, and check them against the main table?
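One way to do this check is sketched below. It assumes the third-party h5py package is installed and that the metadata sits as HDF5 attributes on the /UniqueGlobalKey/tracking_id group; the function names decode_attrs and read_tracking_id are made up for illustration.

```python
def decode_attrs(attrs):
    """Turn HDF5 attribute values (often stored as bytes) into plain strings."""
    return {
        key: value.decode() if isinstance(value, bytes) else str(value)
        for key, value in attrs.items()
    }


def read_tracking_id(fast5_path):
    """Return the tracking_id attributes of one FAST5 file as a dict of strings."""
    import h5py  # third-party; imported here so decode_attrs works without it

    with h5py.File(fast5_path, "r") as f:
        # Assumed layout: attributes attached to this group.
        return decode_attrs(f["/UniqueGlobalKey/tracking_id"].attrs)
```

Calling read_tracking_id on a file and comparing its "asic_id" and "flow_cell_id" entries with the main table should show whether the two fields agree.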

mateidavid commented 7 years ago

This is what I see under /UniqueGlobalKey/tracking_id in the file MinION2_20160920_FNFAB38968_MN16454_sequencing_run_Chip97_Human_R9_4_22975_ch56_read188_strand.fast5:

#asic_id_17          60455
#asic_id             4246400039
#asic_id_eeprom      1743454
#asic_temp           33.5767784
#auto_update_source  https://mirror.oxfordnanoportal.com/software/MinKNOW/
#bream_is_standard   0
#device_id           MN16454
#exp_script_hash     97b95d42372f12941c837c811172c7804f0655ba
#exp_script_name     python/recipes/map/ER_48Hr_Sequencing_Run_FLO-MIN106.py
#exp_script_purpose  sequencing_run
#exp_start_time      1474393582
#flow_cell_id        FAB38968
#heatsink_temp       34.0390625
#hostname            MinION2
#installation_type   map
#operating_system    Windows 6.1
#protocol_run_id     19df5a24-fed4-4113-8915-370cfc59c4fd
#protocols_version   1.0.5.0
#run_id              a3eaa4dc205ab5f4ad5142dd609d42af6cf130a3
#usb_config          1.0.11_ONT#MinION_fpga_1.0.1#ctrl#Auto
#version             1.0.5

The flow_cell_id above, FAB38968, is one of the names that do not appear in the main table. I checked the asic_id, and it does seem to match the asic_id of other files whose flow_cell_id is set to FAB39075. But if asic_id is more reliable, why isn't it the field listed in the main table? Is there a way to compute, decode, or look up the correct flow_cell_id from the asic_id?
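The comparison described above can be automated with a small sketch like the following. It assumes each file's metadata has already been read into a dict with "asic_id" and "flow_cell_id" keys; the function name mislabeled_flow_cells and the third record are hypothetical.

```python
from collections import defaultdict


def mislabeled_flow_cells(records):
    """Return {asic_id: [flow_cell_id, ...]} for chips seen under more than one name."""
    groups = defaultdict(set)
    for rec in records:
        groups[rec["asic_id"]].add(rec["flow_cell_id"])
    return {asic: sorted(names) for asic, names in groups.items() if len(names) > 1}


records = [
    {"asic_id": "4246400039", "flow_cell_id": "FAB38968"},
    {"asic_id": "4246400039", "flow_cell_id": "FAB39075"},
    # Placeholder record, not taken from the dataset.
    {"asic_id": "0000000000", "flow_cell_id": "FAB00000"},
]
conflicts = mislabeled_flow_cells(records)
print(conflicts)
```

Any asic_id that appears in the result was recorded under several flow_cell_id values, which is the situation seen here with FAB38968 and FAB39075.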

nickloman commented 6 years ago

Sorry, this is a rather delayed response! You can use this table to map between flow_cell_id and asic_id: https://github.com/nanopore-wgs-consortium/NA12878/blob/master/ENA.csv
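A lookup against a downloaded copy of that table might look like the sketch below. The column names "asic_id" and "flow_cell_id" are assumptions, not confirmed against ENA.csv, so check the file's header row and pass the real names; load_asic_lookup is a hypothetical helper.

```python
import csv
import io


def load_asic_lookup(handle, asic_col="asic_id", flow_cell_col="flow_cell_id"):
    """Build {asic_id: flow_cell_id} from an open CSV file with a header row."""
    lookup = {}
    for row in csv.DictReader(handle):
        lookup[row[asic_col]] = row[flow_cell_col]
    return lookup


# In-memory stand-in for the real file; the header names are assumed.
sample = io.StringIO("flow_cell_id,asic_id\nFAB39075,4246400039\n")
lookup = load_asic_lookup(sample)
print(lookup.get("4246400039", "not found"))
```

For the real table, open the downloaded file with open(path, newline="") and pass the handle in place of the in-memory sample.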