tmassingham-ont closed this issue 4 years ago
One for @aphillippy or @skoren I think.
Looks like you're correct; there are two issues. Partition 98 did not get uploaded correctly, so it is missing data. I've replaced partition 98 with the right version.
The rest of the duplicates are redundant. When we packaged the individual partitions into tgz archives, multiple partitions matched the same wildcard and thus got packaged into the same files. There shouldn't be any missing fast5 data. I cleaned up the download page to remove the extraneous partitions and renamed the files.
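To illustrate how a wildcard can sweep up more than one partition, here is a minimal sketch. The `partition_<n>` names and the `partition_<n>*` patterns are hypothetical; the thread does not state the actual naming scheme or the glob that was used, so treat this only as a picture of the failure mode.

```python
from fnmatch import fnmatchcase


def overlapping_patterns(names, patterns):
    """Return {name: [patterns]} for every name matched by more than one pattern."""
    hits = {}
    for name in names:
        matched = [p for p in patterns if fnmatchcase(name, p)]
        if len(matched) > 1:
            hits[name] = matched
    return hits


# Hypothetical partition names and per-partition globs (assumed, not from the thread).
names = [f"partition_{i}" for i in (9, 98, 100)]
patterns = [f"partition_{i}*" for i in (9, 98, 10, 100)]

# Any name listed here would be packaged into more than one archive.
print(overlapping_patterns(names, patterns))
```

Under these assumed names, a trailing `*` after a short partition number also matches every longer number that starts with the same digits, which is one way the same data could end up in several archives.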
How many fast5 files do you end up with in total after extracting all the partitions, now that 98 is fixed? There should be about 11 million.
Thanks. Download in progress, I'll update when I have numbers.
Looks good to me now, thank you.
Hello and many thanks for sharing your data.
I'm currently rebasecalling the data using the latest methods and noticed that many of the fast5 downloads are duplicates of other partitions. Are there reads missing and, if so, is it possible to obtain them please?
I've confirmed the duplication by unpacking the files and comparing the reads. It's curious that the duplicate files have a different md5sum from the originals; presumably the order in which the reads were packed into the file was not deterministic.
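For anyone wanting to repeat the check, the comparison can be sketched roughly as follows. This is not the exact procedure used here; it assumes that duplicated archives hold byte-identical fast5 files under the same base names, and it ignores both the directory prefix and the packing order, which is why it can agree where the archive-level md5sum does not. The function names are made up for illustration.

```python
import hashlib
import tarfile
from collections import Counter


def archive_fingerprint(tgz_path):
    """Multiset of (base name, SHA-256 of contents) for each regular file in the archive."""
    fingerprint = Counter()
    with tarfile.open(tgz_path, "r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, etc.
            data = archive.extractfile(member).read()
            base_name = member.name.rsplit("/", 1)[-1]
            fingerprint[(base_name, hashlib.sha256(data).hexdigest())] += 1
    return fingerprint


def same_contents(tgz_a, tgz_b):
    """True if both archives hold the same files, regardless of order or directory."""
    return archive_fingerprint(tgz_a) == archive_fingerprint(tgz_b)
```

Comparing read IDs inside the fast5 files would be a stricter test, but it needs an HDF5 reader and is left out of this sketch.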
In all, I think there are the following equivalent partitions:
You can approximately confirm the duplication by looking at the file sizes reported by S3.
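A rough sketch of that size check, assuming you already have a list of `(key, size)` pairs from the bucket listing (how you obtain it is up to you). The tolerance `rel_tol` is an arbitrary illustrative value, since a different packing order can change the compressed size slightly; near-equal sizes only suggest duplication and do not prove it.

```python
def group_by_similar_size(objects, rel_tol=0.001):
    """Group keys whose sizes are within rel_tol of the previous size in sorted order.

    `objects` is an iterable of (key, size_in_bytes) pairs. Only groups with
    more than one key are returned, as candidate duplicates.
    """
    groups = []
    current = []
    for key, size in sorted(objects, key=lambda pair: pair[1]):
        # Start a new group when the gap to the previous size exceeds the tolerance.
        if current and size - current[-1][1] > rel_tol * current[-1][1]:
            groups.append(current)
            current = []
        current.append((key, size))
    if current:
        groups.append(current)
    return [[key for key, _ in group] for group in groups if len(group) > 1]
```

Because neighbours are chained, a long run of slowly increasing sizes can end up in one group, so any candidates are worth confirming by comparing the contents.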