splatlab / squeakr

Squeakr: An Exact and Approximate k-mer Counting System
BSD 3-Clause "New" or "Revised" License

Error opening file for serializing.: Is a directory #48

Closed (kbchoi closed 2 years ago)

kbchoi commented 2 years ago

Thank you for developing this highly useful tool. Unfortunately, I am running into an issue while processing a FASTQ file of about 500 MB, in both v0.6 and v0.7.

$ squeakr count -e -k 15 -t 32 -o ./ my.fastq
[2022-08-21 03:08:48.703] [squeakr_console] [info] Reading from the fastq file and inserting in the CQF.
[2022-08-21 03:08:53.871] [squeakr_console] [info] Trying to compress the final CQF.
[2022-08-21 03:08:54.365] [squeakr_console] [info] Estimated size of the final CQF: 29
[2022-08-21 03:08:54.365] [squeakr_console] [info] Calculating frequency distribution:
[2022-08-21 03:09:00.170] [squeakr_console] [info] Iteration: Total Time Elapsed: 5.804785 seconds
Error opening file for serializing.: Is a directory

This also happens if I use -s 20 -t 1. Any insight on how to get around this issue?

rob-p commented 2 years ago

Hi @kbchoi,

I believe the -o option should point to the name of the desired output file. The problem here is that ./ is a directory that already exists, so it can't be used as the path for serializing the output Squeakr file. Have you tried something like:

  squeakr count -e -k 15 -t 32 -o my.squeakr my.fastq

which should write the output to a file called my.squeakr in the current directory (the directory from which the command is run).

kbchoi commented 2 years ago

Thank you @rob-p for your prompt response. I had tried that, but it failed with a message saying that the directory does not exist.

$ squeakr count -e -k 15 -t 32 -o my.squeakr my.fastq

Parsing command line failed with exception: The required input directory my.squeakr does not seem to exist.

But specifying the output file with an explicit directory prefix did the trick.

$ squeakr count -e -k 15 -t 32 -o ./my.squeakr my.fastq
[2022-08-21 04:43:45.131] [squeakr_console] [info] Reading from the fastq file and inserting in the CQF.
[2022-08-21 04:43:50.108] [squeakr_console] [info] Trying to compress the final CQF.
[2022-08-21 04:43:50.601] [squeakr_console] [info] Estimated size of the final CQF: 29
[2022-08-21 04:43:50.601] [squeakr_console] [info] Calculating frequency distribution:
[2022-08-21 04:43:56.407] [squeakr_console] [info] Iteration: Total Time Elapsed: 5.806478 seconds
[2022-08-21 04:43:56.466] [squeakr_console] [info] Counting: Total Time Elapsed: 11.335329 seconds
[2022-08-21 04:43:56.466] [squeakr_console] [info] Maximum freq: 200755
[2022-08-21 04:43:56.466] [squeakr_console] [info] Num distinct elem: 117430718
[2022-08-21 04:43:56.466] [squeakr_console] [info] Total num elems: 242093664

enricorox commented 1 year ago

This should be reopened as a bug. It is not obvious to the user that the input/working directory is parsed from the given output file...
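
For anyone hitting this later, here is a minimal shell sketch of the workaround described in this thread. It assumes only the behaviour reported above: -o must name a file rather than an existing directory, and a bare file name without a directory component is rejected. The helper name squeakr_out_path is hypothetical and not part of squeakr.

```shell
#!/bin/sh
# Hypothetical helper (not part of squeakr): normalise a path before passing it to -o.
squeakr_out_path() {
    out=$1
    # An existing directory cannot serve as the output file ("Is a directory").
    if [ -d "$out" ]; then
        echo "error: '$out' is a directory; pass a file name instead" >&2
        return 1
    fi
    case $out in
        */*) printf '%s\n' "$out" ;;     # already has a directory component, keep as-is
        *)   printf './%s\n' "$out" ;;   # bare file name: add an explicit ./ prefix
    esac
}

# Usage, with the file names from this thread:
#   squeakr count -e -k 15 -t 32 -o "$(squeakr_out_path my.squeakr)" my.fastq
```

This only papers over the issue on the caller's side. Whether squeakr itself should accept a bare file name is the open question raised above.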