nanopore-wgs-consortium / NA12878

Data and analysis for NA12878 genome on nanopore

Duplicate read IDs in RNA dataset: Bham2_Run #114

Open hasindu2008 opened 1 year ago

hasindu2008 commented 1 year ago

In the Bham2_Run dataset downloaded from https://s3.amazonaws.com/nanopore-human-wgs/rna/links/NA12878-DirectRNA_All.files.txt, 94143 read IDs appear twice. Is this an anomaly introduced during the single-fast5 to multi-fast5 conversion (that is, the same read being packed twice), or are these genuinely separate reads that MinKNOW assigned the same read ID?
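For anyone who wants to reproduce the count, here is a minimal sketch. It assumes you have already pulled the read IDs out of the run's fast5 files into a plain Python iterable (how you extract them is up to you). `find_duplicate_read_ids` is a hypothetical helper, not part of any ONT tool.

```python
from collections import Counter
from typing import Dict, Iterable


def find_duplicate_read_ids(read_ids: Iterable[str]) -> Dict[str, int]:
    """Return every read ID that occurs more than once, mapped to its count.

    Assumption: `read_ids` holds one entry per read record found in the
    dataset, e.g. collected from all multi-fast5 files of a run.
    """
    counts = Counter(read_ids)
    return {rid: n for rid, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical toy input; real read IDs are UUID strings.
    ids = ["a1", "b2", "a1", "c3", "b2", "d4"]
    dups = find_duplicate_read_ids(ids)
    print(len(dups), "read IDs appear more than once")
```

Running this over the full Bham2_Run ID list should report the 94143 duplicated IDs mentioned above, if the extraction matches what was used here.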

hasindu2008 commented 1 year ago

@mattloose A similar duplication is present in two cDNA runs, namely Bham1 and Bham2.

mattloose commented 1 year ago

These will not be genuinely separate reads. I presume these are errors in how the run data was originally compiled. @mitenjain, did you process these?
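One way to check whether the duplicates are the same read packed twice is to compare the raw signal stored under each copy of a duplicated ID. The sketch below assumes you have already extracted `(read_id, raw_signal_bytes)` pairs from the fast5 files. `classify_duplicates` and its labels are hypothetical, and identical signal digests are taken as evidence of a packing duplicate rather than proof.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def classify_duplicates(records: Iterable[Tuple[str, bytes]]) -> Dict[str, str]:
    """Label each duplicated read ID by whether all its copies share one signal.

    Assumption: each record is (read_id, raw signal serialized to bytes).
    """
    digests: Dict[str, Set[str]] = defaultdict(set)
    counts: Dict[str, int] = defaultdict(int)
    for read_id, signal in records:
        # Hash the signal so large payloads are compared cheaply.
        digests[read_id].add(hashlib.sha256(signal).hexdigest())
        counts[read_id] += 1

    result: Dict[str, str] = {}
    for read_id, n in counts.items():
        if n < 2:
            continue  # unique IDs are not of interest here
        # One distinct digest across all copies suggests the same read packed twice.
        result[read_id] = "identical copy" if len(digests[read_id]) == 1 else "distinct signal"
    return result
```

If every duplicated ID comes back as `"identical copy"`, that would be consistent with the duplicates being a compilation artefact rather than separate reads.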