nanoporetech / pod5-file-format

Pod5: a high performance file format for nanopore reads.
https://pod5-file-format.readthedocs.io/

Split Read IDs Cause Missing Read Error? #126

Closed peradastra closed 4 months ago

peradastra commented 4 months ago

Issue Description

Using a summary table from Dorado to subset POD5 files leads to an error for split reads, which are assigned new read IDs during demux. I am also seeing a sys warning in the logs, but the run otherwise proceeds without any obvious issue.

Logs

(base) [hilaire@ad.bcm.edu@rpv-oitghp-p02 split_pod5]$ pod5 subset ../pod5/*.pod5 --table ../summaries/simplex_supv4p3p0.txt --columns barcode
sys:1: MapWithoutReturnDtypeWarning: Calling map_elements without specifying return_dtype can lead to unpredictable results. Specify return_dtype to silence this warning.
sys:1: MapWithoutReturnDtypeWarning: Calling map_elements without specifying return_dtype can lead to unpredictable results. Specify return_dtype to silence this warning.
Parsed 1073956 targets

POD5 has encountered an error: 'Missing read_ids from inputs but --missing-ok not set'

Specifications

HalfPhoton commented 4 months ago

Hi @peradastra, the error message tells you what to do, as does the following note in the documentation:

[!NOTE] The filter and subset tool will assert that any requested read_ids are present in the inputs. If a requested read_id is missing from the inputs then the tool will issue the following error:

POD5 has encountered an error: 'Missing read_ids from inputs but --missing-ok not set'

To disable this warning then set the ‘-M / --missing-ok’ flag.

As for the dtype warnings, we'll take a look at those. Thanks for flagging them.

Kind regards, Rich
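
For anyone landing here with the same error: per the note above, adding `--missing-ok` to the `pod5 subset` command lets the run continue when some requested read_ids are absent from the inputs. Before doing that, it may help to see which rows of the table are affected. Below is a minimal sketch, not part of the pod5 tooling, that partitions a summary table by whether each read ID is in a known set. It assumes the table is tab-separated with a `read_id` column; the function name `split_by_presence` and the toy IDs are hypothetical, and how you collect the IDs stored in your POD5 files is left open.

```python
import csv
import io


def split_by_presence(table_text, known_read_ids, id_column="read_id"):
    """Partition table rows by whether their read ID is in known_read_ids.

    Assumes a tab-separated table with a header row (an assumption about
    the summary file, not something confirmed in this thread).
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    present, missing = [], []
    for row in reader:
        if row[id_column] in known_read_ids:
            present.append(row)
        else:
            missing.append(row)
    return present, missing


# Toy data: these IDs and barcodes are made up for illustration only.
summary = (
    "read_id\tbarcode\n"
    "read-a\tbarcode01\n"
    "read-b\tbarcode02\n"
    "split-x\tbarcode01\n"
)
pod5_ids = {"read-a", "read-b"}  # IDs actually stored in the POD5 inputs

present, missing = split_by_presence(summary, pod5_ids)
print(f"{len(present)} present, {len(missing)} missing")
```

If the missing rows turn out to be only the split reads with newly assigned IDs, that would be consistent with the behaviour described in the issue, and those are the rows the subset would skip when `--missing-ok` is set.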