jelber2 opened this issue 3 days ago
Hello, we are parsing the ULK part properly, but the tool then checks whether the kit matches one of the kits we have exhaustively tested. Because this is lossy compression, we are being very pedantic so that users do not inadvertently have their data affected. These kits will eventually be added as we come across and test them. I have not had access to GridION SQK-ULK114 data, but it is very likely that the suitable -b value would be 3. Is this a publicly available dataset?
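The pedantic behaviour described above amounts to an allow-list lookup: a kit/device pair gets a suggested -b value only if it has been tested, and anything else is refused. Below is a minimal shell sketch of that idea. The function name `suggest_bits` and the table are hypothetical, not slow5tools' actual code. The single entry shows where GridION SQK-ULK114 would go once tested, and its value of 3 is only the "very likely" estimate from the comment above.

```shell
#!/bin/sh
# HYPOTHETICAL sketch of an allow-list check before lossy degradation.
# The table is illustrative only; it is NOT the list slow5tools ships with.

# Print the suggested -b value for a tested kit/device pair, or fail.
suggest_bits() {
    local kit device
    # Normalise case so "SQK-ULK114" and "sqk-ulk114" match the same entry.
    kit=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    device=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case "${kit}:${device}" in
        # Placeholder: the maintainers consider 3 "very likely" but untested.
        sqk-ulk114:gridion) echo 3 ;;
        *)
            # Unknown combination: refuse rather than guess a lossy setting.
            echo "error: kit '$1' on device '$2' has not been tested; refusing to degrade" >&2
            return 1
            ;;
    esac
}
```

For example, `suggest_bits SQK-ULK114 GridION` prints `3`, while an untested pair such as the made-up `suggest_bits sqk-foo000 GridION` prints an error and returns a non-zero status. Failing closed is the point: an unrecognised kit never silently receives a lossy setting.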
As per the Twitter conversation (https://x.com/jpelbers/status/1842484817885073502), here is a Dropbox link to ONT ULK reads for HG002 chr22 at ~30x average coverage (based on alignment to hg38 without alts). DNA was extracted from HG002 cells following a Bionano DNA extraction protocol, prepared with an ONT ULK library preparation, and sequenced on an ONT PromethION P2 Solo device with an R10.4.1 flow cell, connected to an ONT GridION for data acquisition. The data are provided as a BLOW5 file (ex-zd signal compression, zstd record compression) that you can download with
wget 'https://www.dropbox.com/scl/fi/8s0p4ttpuy1amiuulzu3v/WGS_HG002_Bionano_recover_13022024.chr22.readids.blow5?rlkey=395acerl9ewgyqkafi7g15ipe&st=giubcawn' -O WGS_HG002_Bionano_recover_13022024.chr22.blow5
on a computer with wget.
Best, Jean Elbers
*NOTE: the BLOW5 file on Dropbox does not match the header in the original post of this GitHub issue, because I realized those squiggles did not belong to HG002.
For the following S/BLOW5 header, created with blue-crab (0.1.2), slow5tools degrade (1.3.0) does not seem to recognize the ULK kit.
Would it be possible to parse the ULK part, or alternatively to show the user which bit values to use for different datasets?
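Parsing the ULK part boils down to pulling the kit name out of the header and splitting it into family and version. Below is a minimal POSIX shell sketch with a hypothetical function `parse_kit`. It assumes the kit is stored as an `@sequencing_kit` attribute in a SLOW5 ASCII header and takes the first read group's value. Check your own header, since blue-crab may name or fill that attribute differently.

```shell
#!/bin/sh
# HYPOTHETICAL sketch: read a SLOW5 ASCII header on stdin and split the kit name.
# ASSUMPTION: the kit appears as "@sequencing_kit<TAB>value" (first read group).

parse_kit() {
    local kit parts
    # Stop at the "#read_id" column-header line, i.e. before any read records.
    kit=$(awk -F '\t' '
        $1 == "#read_id" { exit }
        $1 == "@sequencing_kit" { print tolower($2); exit }
    ')
    if [ -z "$kit" ]; then
        echo "error: no @sequencing_kit attribute in header" >&2
        return 1
    fi
    # "sqk-ulk114" -> "ulk 114"; names with extra suffixes fail to match.
    parts=$(printf '%s\n' "$kit" |
        sed -n 's/^[a-z][a-z]*-\([a-z][a-z]*\)\([0-9][0-9]*\)$/\1 \2/p')
    if [ -z "$parts" ]; then
        echo "error: unrecognised kit name '$kit'" >&2
        return 1
    fi
    printf 'kit=%s family=%s version=%s\n' "$kit" "${parts% *}" "${parts#* }"
}
```

Assuming `slow5tools view` writes SLOW5 ASCII to stdout, usage would look like `slow5tools view WGS_HG002_Bionano_recover_13022024.chr22.blow5 | parse_kit`. Kit names that do not fit the simple `prefix-familyNNN` pattern are rejected rather than guessed, in the same cautious spirit as the allow-list.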