nanoporetech / pod5-file-format

Pod5: a high performance file format for nanopore reads.
https://pod5.nanoporetech.com

pod5 view does not work for some data since version 0.3.0 #107

Open davidsilvapires opened 6 months ago

davidsilvapires commented 6 months ago

Hi, everybody!

It seems that the "view" subcommand of the "pod5" command does not work for all kinds of Nanopore data. See, for example, the following case:

curl -OL 'https://sra-pub-src-1.s3.amazonaws.com/SRR23640421/barcode01.tar.gz.gz.1'
tar zxvvf barcode01.tar.gz.gz.1
pip install pod5==0.2.4
pod5 convert fast5 barcode01/*.fast5 --threads 8 --output barcode01.pod5
pod5 view barcode01.pod5 --threads 8 --include 'read_id, channel' --output summary.tsv

With version 0.2.4, the command works fine. But from version 0.3.0 onwards:

pip install pod5==0.3.0
rm barcode01.pod5 summary.tsv
pod5 convert fast5 barcode01/*.fast5 --threads 8 --output barcode01.pod5
pod5 view barcode01.pod5 --threads 8 --include 'read_id, channel' --output summary.tsv

the following error is returned:

POD5 has encountered an error: 'Error while processing 'barcode01.pod5''
For detailed information set POD5_DEBUG=1'

If we set the environment variable:

export POD5_DEBUG=1
rm summary.tsv
pod5 view barcode01.pod5 --threads 8 --include 'read_id, channel' --output summary.tsv

polars.exceptions.ColumnNotFoundError: not_set

Error originated just after this operation:
WITH_COLUMNS:
[String(barcode01.pod5).alias("filename"), col("read_id").bin.hex_encode().str.slice().str.concat_horizontal([String(-), col("read_id").bin.hex_encode().str.slice(), String(-), col("read_id").bin.hex_encode().str.slice(), String(-), col("read_id").bin.hex_encode().str.slice(), String(-), col("read_id").bin.hex_encode().str.slice()]), col("well").alias("mux"), col("num_minknow_events").alias("minknow_events"), col("experiment_name").alias("experiment_id"), [(col("start")) / (col("sample_rate"))].alias("start_time"), [(col("num_samples")) / (col("sample_rate"))].alias("duration")]
INNER JOIN:
LEFT PLAN ON: [col("run_info")]
UNION
PLAN 0:
WITH_COLUMNS:
[col("run_info").strict_cast(String)]
DF ["read_id", "read_number", "start", "median_before"]; PROJECT */20 COLUMNS; SELECTION: "None"
PLAN 1:
WITH_COLUMNS:
[col("run_info").strict_cast(String)]
DF ["read_id", "read_number", "start", "median_before"]; PROJECT */20 COLUMNS; SELECTION: "None"
PLAN 2:

(...)

  PLAN 99:
     WITH_COLUMNS:
     [col("run_info").strict_cast(String)]
      DF ["read_id", "read_number", "start", "median_before"]; PROJECT */20 COLUMNS; SELECTION: "None"
END UNION

RIGHT PLAN ON: [col("acquisition_id")] UNIQUE BY None UNIQUE BY None DF ["acquisition_id", "acquisition_start_time", "adc_max", "adc_min"]; PROJECT */18 COLUMNS; SELECTION: "None" END INNER JOIN

Error originated just after this operation: ErrorStateSync(AlreadyEncountered(not found: not_set

Error originated just after this operation:
WITH_COLUMNS:
[String(barcode01.pod5).alias("filename"), col("read_id").bin.hex_encode().str.slice().str.concat_horizontal([String(-), col("read_id").bin.hex_encode().str.slice(), String(-), col("read_id").bin.hex_encode().str.slice(), String(-), col("read_id").bin.hex_encode().str.slice(), String(-), col("read_id").bin.hex_encode().str.slice()]), col("well").alias("mux"), col("num_minknow_events").alias("minknow_events"), col("experiment_name").alias("experiment_id"), [(col("start")) / (col("sample_rate"))].alias("start_time"), [(col("num_samples")) / (col("sample_rate"))].alias("duration")]
INNER JOIN:
LEFT PLAN ON: [col("run_info")]
UNION
PLAN 0:
WITH_COLUMNS:
[col("run_info").strict_cast(String)]
DF ["read_id", "read_number", "start", "median_before"]; PROJECT */20 COLUMNS; SELECTION: "None"
PLAN 1:
WITH_COLUMNS:
[col("run_info").strict_cast(String)]
DF ["read_id", "read_number", "start", "median_before"]; PROJECT */20 COLUMNS; SELECTION: "None"
PLAN 2:

(...)

  PLAN 99:
     WITH_COLUMNS:
     [col("run_info").strict_cast(String)]
      DF ["read_id", "read_number", "start", "median_before"]; PROJECT */20 COLUMNS; SELECTION: "None"
END UNION

RIGHT PLAN ON: [col("acquisition_id")] UNIQUE BY None UNIQUE BY None DF ["acquisition_id", "acquisition_start_time", "adc_max", "adc_min"]; PROJECT */18 COLUMNS; SELECTION: "None" END INNER JOIN))
WITH_COLUMNS:
[String(barcode01.pod5).alias("filename"), col("read_id").bin.hex_encode().str.slice().str.concat_horizontal([String(-), col("read_id").bin.hex_encode().str.slice(), String(-), col("read_id").bin.hex_encode().str.slice(), String(-), col("read_id").bin.hex_encode().str.slice(), String(-), col("read_id").bin.hex_encode().str.slice()]), col("well").alias("mux"), col("num_minknow_events").alias("minknow_events"), col("experiment_name").alias("experiment_id"), [(col("start")) / (col("sample_rate"))].alias("start_time"), [(col("num_samples")) / (col("sample_rate"))].alias("duration")]
INNER JOIN:
LEFT PLAN ON: [col("run_info")]
UNION
PLAN 0:
WITH_COLUMNS:
[col("run_info").strict_cast(String)]
DF ["read_id", "read_number", "start", "median_before"]; PROJECT */20 COLUMNS; SELECTION: "None"
PLAN 1:
WITH_COLUMNS:
[col("run_info").strict_cast(String)]
DF ["read_id", "read_number", "start", "median_before"]; PROJECT */20 COLUMNS; SELECTION: "None"
PLAN 2:

(...)

  PLAN 99:
     WITH_COLUMNS:
     [col("run_info").strict_cast(String)]
      DF ["read_id", "read_number", "start", "median_before"]; PROJECT */20 COLUMNS; SELECTION: "None"
END UNION

RIGHT PLAN ON: [col("acquisition_id")] UNIQUE BY None UNIQUE BY None DF ["acquisition_id", "acquisition_start_time", "adc_max", "adc_min"]; PROJECT */18 COLUMNS; SELECTION: "None" END INNER JOIN

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/bin/pod5", line 8, in <module>
    sys.exit(main())
  File "/storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages/pod5/tools/main.py", line 60, in main
    return run_tool(parser)
  File "/storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages/pod5/tools/parsers.py", line 41, in run_tool
    raise exc
  File "/storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages/pod5/tools/parsers.py", line 38, in run_tool
    return tool_func(**kwargs)
  File "/storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages/pod5/tools/parsers.py", line 744, in run
    return view_pod5(**kwargs)
  File "/storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages/pod5/tools/utils.py", line 59, in wrapper
    raise exc
  File "/storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages/pod5/tools/utils.py", line 56, in wrapper
    ret = func(*args, **kwargs)
  File "/storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages/pod5/tools/pod5_view.py", line 535, in view_pod5
    launch_view_workers(
  File "/storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages/pod5/tools/pod5_view.py", line 489, in launch_view_workers
    join_workers(processes, exceptions_queue)
  File "/storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages/pod5/tools/pod5_view.py", line 395, in join_workers
    raise RuntimeError(f"Error while processing '{path}'") from exc
RuntimeError: Error while processing 'barcode01.pod5'

Can you figure out what the problem is?

Thanks in advance for any help.

Best regards.

-- David

davidsilvapires commented 6 months ago

Sorry, I only just saw issue #106, opened yesterday by roo-weed. I am facing the same problem as him. I am not sure whether the two issues can be merged, but a fix for one will certainly fix the other.

My apologies.

Best regards.

-- David da Silva Pires

HalfPhoton commented 6 months ago

Hi @davidsilvapires, thanks for the detailed report.

Can you report which version of polars is installed in each case?

Kind regards, Rich
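
For anyone following along, one way to answer this without reading through the pip output is to query the installed package metadata directly. The snippet below is only a sketch: it uses the standard-library importlib.metadata module (Python 3.8+), and the helper name installed_versions and the list of packages are illustrative choices, not part of pod5.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the active environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names taken from the pip logs in this thread.
    report = installed_versions(["pod5", "lib-pod5", "polars", "pyarrow"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Run it inside the same virtual environment that provides the pod5 command, otherwise it reports the versions of a different environment.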

davidsilvapires commented 6 months ago

Hello @HalfPhoton.

Yes, of course. Each time I reinstall pod5, a different version of polars is installed:

pod5 0.2.4: polars 0.17.15
pod5 0.3.0: polars 0.20.3

Below you can find all the messages that I receive after each reinstallation of pod5:

(pod5) [13:31:55] pires@vital:/project/jcunha/hiChromatin/project/florenciaDiazViraqueEtAl2023/methylation/00-fast5ToPod5/issue :( $ pip install pod5==0.2.4
Collecting pod5==0.2.4
  Using cached pod5-0.2.4-py3-none-any.whl.metadata (20 kB)
Collecting lib-pod5==0.2.4 (from pod5==0.2.4)
  Using cached lib_pod5-0.2.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.1 kB)
Requirement already satisfied: iso8601 in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.2.4) (2.1.0)
Requirement already satisfied: more-itertools in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.2.4) (10.1.0)
Requirement already satisfied: numpy>=1.21.0 in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.2.4) (1.24.4)
Requirement already satisfied: pyarrow~=11.0.0 in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.2.4) (11.0.0)
Requirement already satisfied: pytz in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.2.4) (2023.3.post1)
Requirement already satisfied: packaging in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.2.4) (23.2)
Collecting polars~=0.17.12 (from pod5==0.2.4)
  Using cached polars-0.17.15-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (14 kB)
Requirement already satisfied: h5py~=3.8.0 in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.2.4) (3.8.0)
Requirement already satisfied: vbz-h5py-plugin in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.2.4) (1.0.1)
Requirement already satisfied: tqdm in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.2.4) (4.66.1)
Using cached pod5-0.2.4-py3-none-any.whl (66 kB)
Using cached lib_pod5-0.2.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.9 MB)
Using cached polars-0.17.15-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
Installing collected packages: polars, lib-pod5, pod5
  Attempting uninstall: polars
    Found existing installation: polars 0.20.3
    Uninstalling polars-0.20.3:
      Successfully uninstalled polars-0.20.3
  Attempting uninstall: lib-pod5
    Found existing installation: lib-pod5 0.3.0
    Uninstalling lib-pod5-0.3.0:
      Successfully uninstalled lib-pod5-0.3.0
  Attempting uninstall: pod5
    Found existing installation: pod5 0.3.0
    Uninstalling pod5-0.3.0:
      Successfully uninstalled pod5-0.3.0
Successfully installed lib-pod5-0.2.4 pod5-0.2.4 polars-0.17.15
(pod5) [13:32:02] pires@vital:/project/jcunha/hiChromatin/project/florenciaDiazViraqueEtAl2023/methylation/00-fast5ToPod5/issue :) $ pip install pod5==0.3.0
Collecting pod5==0.3.0
  Using cached pod5-0.3.0-py3-none-any.whl.metadata (20 kB)
Collecting lib-pod5==0.3.0 (from pod5==0.3.0)
  Using cached lib_pod5-0.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.1 kB)
Requirement already satisfied: iso8601 in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.3.0) (2.1.0)
Requirement already satisfied: more-itertools in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.3.0) (10.1.0)
Requirement already satisfied: numpy>=1.21.0 in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.3.0) (1.24.4)
Requirement already satisfied: pyarrow~=11.0.0 in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.3.0) (11.0.0)
Requirement already satisfied: pytz in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.3.0) (2023.3.post1)
Requirement already satisfied: packaging in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.3.0) (23.2)
Collecting polars~=0.19 (from pod5==0.3.0)
  Using cached polars-0.20.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (14 kB)
Requirement already satisfied: h5py~=3.8.0 in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.3.0) (3.8.0)
Requirement already satisfied: vbz-h5py-plugin in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.3.0) (1.0.1)
Requirement already satisfied: tqdm in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.3.0) (4.66.1)
Using cached pod5-0.3.0-py3-none-any.whl (69 kB)
Using cached lib_pod5-0.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB)
Using cached polars-0.20.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (28.6 MB)
Installing collected packages: polars, lib-pod5, pod5
  Attempting uninstall: polars
    Found existing installation: polars 0.17.15
    Uninstalling polars-0.17.15:
      Successfully uninstalled polars-0.17.15
  Attempting uninstall: lib-pod5
    Found existing installation: lib-pod5 0.2.4
    Uninstalling lib-pod5-0.2.4:
      Successfully uninstalled lib-pod5-0.2.4
  Attempting uninstall: pod5
    Found existing installation: pod5 0.2.4
    Uninstalling pod5-0.2.4:
      Successfully uninstalled pod5-0.2.4
Successfully installed lib-pod5-0.3.0 pod5-0.3.0 polars-0.20.3

Thank you very much for helping me with this issue.

-- David

HalfPhoton commented 6 months ago

Could you please try pod5 0.3.6 with polars~=0.19.19?

Kind regards, Rich
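
A note on the specifier above: under PEP 440, "~=" is the compatible release operator, so polars~=0.19 means ">= 0.19, == 0.*", which is why pip was free to pick polars 0.20.3 in the logs, whereas polars~=0.19.19 means ">= 0.19.19, == 0.19.*". The sketch below illustrates that rule for plain numeric versions only. The function names are hypothetical, and it deliberately ignores pre-releases, epochs, and other PEP 440 details that pip handles.

```python
def parse(version):
    """Turn '0.20.3' into (0, 20, 3). Plain numeric versions only."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(version, spec):
    """Simplified check of `version` against `~=spec` (compatible release)."""
    v, s = parse(version), parse(spec)
    if len(s) < 2:
        raise ValueError("~= needs at least two release segments")
    # Pad with zeros so that 0.19 and 0.19.0 compare as equal.
    width = max(len(v), len(s))
    v_padded = v + (0,) * (width - len(v))
    s_padded = s + (0,) * (width - len(s))
    # ~=X.Y.Z means: at least X.Y.Z, and the same X.Y prefix.
    prefix = s[:-1]
    return v_padded >= s_padded and v_padded[:len(prefix)] == prefix


if __name__ == "__main__":
    print(compatible_release("0.20.3", "0.19"))     # True: allowed by polars~=0.19
    print(compatible_release("0.20.3", "0.19.19"))  # False: excluded by polars~=0.19.19
    print(compatible_release("0.17.15", "0.17.12")) # True: what pod5 0.2.4 pulled in
```

So pinning polars~=0.19.19 would keep the install on the 0.19 series instead of moving to 0.20.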

davidsilvapires commented 6 months ago

Sure. It worked, @HalfPhoton. Thank you very much!

I didn't pin the polars version when reinstalling pod5; the requirement polars~=0.19 was already satisfied by the installed polars 0.20.3. After that, the pod5 view command worked. See the complete output below:

(pod5) [13:32:35] pires@vital:/project/jcunha/hiChromatin/project/florenciaDiazViraqueEtAl2023/methylation/00-fast5ToPod5/issue :) $ pip install pod5==0.3.6
Collecting pod5==0.3.6
  Downloading pod5-0.3.6-py3-none-any.whl.metadata (20 kB)   
Collecting lib-pod5==0.3.6 (from pod5==0.3.6)                 
  Downloading lib_pod5-0.3.6-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.1 kB)
Requirement already satisfied: iso8601 in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.3.6) (2.1.0)
Requirement already satisfied: more-itertools in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.3.6) (10.1.0)
Requirement already satisfied: numpy>=1.21.0 in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.3.6) (1.24.4)
Collecting pyarrow~=14.0.0 (from pod5==0.3.6)                            
  Using cached pyarrow-14.0.2-cp38-cp38-manylinux_2_28_x86_64.whl.metadata (3.0 kB)
Requirement already satisfied: pytz in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.3.6) (2023.3.post1)
Requirement already satisfied: packaging in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.3.6) (23.2)
Requirement already satisfied: polars~=0.19 in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.3.6) (0.20.3)
Collecting h5py~=3.10.0 (from pod5==0.3.6)        
  Using cached h5py-3.10.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.5 kB)
Requirement already satisfied: vbz-h5py-plugin in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.3.6) (1.0.1)
Requirement already satisfied: tqdm in /storage/zuleika/volume3/project/jcunha/hiChromatin/local/src/venv/pod5/lib/python3.8/site-packages (from pod5==0.3.6) (4.66.1)
Downloading pod5-0.3.6-py3-none-any.whl (69 kB)   
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 69.2/69.2 kB 785.9 kB/s eta 0:00:00
Downloading lib_pod5-0.3.6-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.6/2.6 MB 3.2 MB/s eta 0:00:00
Using cached h5py-3.10.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.8 MB)
Using cached pyarrow-14.0.2-cp38-cp38-manylinux_2_28_x86_64.whl (38.1 MB)
Installing collected packages: pyarrow, lib-pod5, h5py, pod5
  Attempting uninstall: pyarrow                   
    Found existing installation: pyarrow 11.0.0   
    Uninstalling pyarrow-11.0.0:                  
      Successfully uninstalled pyarrow-11.0.0     
  Attempting uninstall: lib-pod5                  
    Found existing installation: lib-pod5 0.3.0   
    Uninstalling lib-pod5-0.3.0:                  
      Successfully uninstalled lib-pod5-0.3.0     
  Attempting uninstall: h5py                      
    Found existing installation: h5py 3.8.0       
    Uninstalling h5py-3.8.0:                      
      Successfully uninstalled h5py-3.8.0         
  Attempting uninstall: pod5                      
    Found existing installation: pod5 0.3.0       
    Uninstalling pod5-0.3.0:                      
      Successfully uninstalled pod5-0.3.0         
Successfully installed h5py-3.10.0 lib-pod5-0.3.6 pod5-0.3.6 pyarrow-14.0.2

(pod5) [13:41:47] pires@vital:/project/jcunha/hiChromatin/project/florenciaDiazViraqueEtAl2023/methylation/00-fast5ToPod5/issue :) $ pod5 convert fast5 barcode01/*.fast5 --threads 8 --output barcode01.pod5
Converting 120 Fast5s: 100%|###################################################################################################################################################################################################################################################| 480000/480000 [03:09<00:00, 2538.36Reads/s]

(pod5) [13:45:02] pires@vital:/project/jcunha/hiChromatin/project/florenciaDiazViraqueEtAl2023/methylation/00-fast5ToPod5/issue :) $ pod5 view barcode01.pod5 --threads 8 --include 'read_id, channel' --output summary.tsv

(pod5) [13:45:43] pires@vital:/project/jcunha/hiChromatin/project/florenciaDiazViraqueEtAl2023/methylation/00-fast5ToPod5/issue :) $ head summary.tsv | csvlook
| read_id                              | channel |
| ------------------------------------ | ------- |
| 0009b664-b136-4423-9487-382c275a3425 |     327 |
| 0012bebe-80f7-48fe-a1b6-3b5274c7e4cc |     268 |
| 00167268-5f65-4911-af07-38797fc1f057 |     347 |
| 00484065-97de-4310-9c0f-bc0abd186485 |     160 |
| 0058c84e-fff9-4773-8bba-d66283b2d58f |     496 |
| 00612188-1194-4740-866f-b4c32e56661d |     305 |
| 00aa5144-da5b-43ee-b929-563e139d86fb |     110 |
| 00b72d21-a6d9-4dc1-b4bb-43932566c1f3 |     373 |
| 00b79052-4536-43df-a3af-1fa1c729036b |      23 |

Have you already updated the code on GitHub? I ask because I tried the same commands yesterday from a fresh install and pod5 view wasn't working. I have now created a new virtual environment and installed pod5 with just pip install pod5, without specifying the version, and everything works. During the install, the lines referring to polars were:

Collecting polars~=0.19
  Downloading polars-0.20.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (28.6 MB)
     |████████████████████████████████| 28.6 MB 11.2 MB/s 

And the final message:

Successfully installed h5py-3.10.0 iso8601-2.1.0 lib-pod5-0.3.6 more-itertools-10.2.0 numpy-1.24.4 packaging-23.2 pod5-0.3.6 polars-0.20.3 pyarrow-14.0.2 pytz-2023.3.post1 tqdm-4.66.1 vbz-h5py-plugin-1.0.1

So, I confirm that everything works with the most recent version available right now.

I am very grateful for your support. Thank you very much, @HalfPhoton.

Kind regards.

-- David

HalfPhoton commented 6 months ago

Hi @davidsilvapires, I thought we'd fixed this issue in 0.3.6 but apparently not!

I'll leave this ticket open until we push a patch.

Kind regards, Rich