Process substitutions for input data yield errors

kowallus / mbgc

Multiple Bacteria Genome Compressor (MBGC)

GNU General Public License v3.0

Process substitutions for input data yield errors #16

Open karel-brinda opened 10 months ago

karel-brinda commented 10 months ago

Minimal example:

$ mbgc c -i <(printf '>seq\nGGGGGGGGGGGGGGGGGGAAAAAAAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG\n') a.mbgc
Switching to single file compression mode.
Problem reading from file: /dev/fd/63 (read_bytes: 0 < size: 18446744073709551615; ferror code: 1; feof code: 0)
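The `size` in this error, 18446744073709551615, is 2^64 - 1, which is what `-1` becomes when it is stored in an unsigned 64-bit size type. That fits the input size being obtained by seeking, which returns -1 on a pipe such as `/dev/fd/63`, but it does not prove it. The bash sketch below only checks those two facts and makes no claim about mbgc's actual code. `describe` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# 1) -1 printed as an unsigned 64-bit integer matches the "size" in the error.
printf '%u\n' -1

# 2) Hypothetical helper: report what kind of object a path refers to.
describe() {
  if [ -f "$1" ]; then
    echo "$1: regular file"
  elif [ -p "$1" ]; then
    echo "$1: pipe (FIFO)"
  else
    echo "$1: other"
  fi
}

# The path created by process substitution is a pipe, not a regular file,
# so it has no meaningful size and cannot be seeked.
describe <(printf '>seq\nACGT\n')
```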

kowallus commented 10 months ago

I don't understand how this is supposed to work; option -i expects a filename.

karel-brinda commented 10 months ago

See e.g. https://en.wikipedia.org/wiki/Process_substitution or https://tldp.org/LDP/abs/html/process-sub.html.

I use them extremely widely in my pipelines; they're much easier and safer than, e.g., named pipes.
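The mechanism works like this: bash runs the command inside `<( )` and hands the program a path, usually `/dev/fd/NN`, that is connected to the command's output. Any program that opens that path and reads it sequentially until end of file will work. `show_input` below is a hypothetical stand-in for such a program.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for a tool that takes an input file name.
show_input() {
  echo "path received: $1"
  # Reading sequentially until EOF works fine on a pipe.
  # tr strips the padding that some wc implementations add.
  echo "lines read: $(wc -l < "$1" | tr -d ' ')"
}

show_input <(printf '>seq\nACGT\n')
# On Linux this typically prints "path received: /dev/fd/63" (the number may
# differ), followed by "lines read: 2".
```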

An example with xz:

xz --stdout <(printf '>seq\nGGGGGGGGGGGGGGGGGGAAAAAAAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG\n' | cut -c1-10) > a.xz

In this specific case it could be done in other ways too, but it's just an example of how these are commonly used.

karel-brinda commented 10 months ago

For instance, for testing, you can easily replace prog myfile.txt with prog <(head myfile.txt) without changing anything else.
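The swap described above is shown below with `wc -l` standing in for `prog`. The sample file is generated only for illustration.

```bash
#!/usr/bin/env bash
myfile=$(mktemp)
trap 'rm -f -- "$myfile"' EXIT
seq 1 100 > "$myfile"      # sample input with 100 lines

# Full run on the real file...
wc -l "$myfile"
# ...and a quick test run on only its first 10 lines, with nothing else changed.
wc -l <(head "$myfile")
```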

kowallus commented 10 months ago

I think your example was incorrect. However, for very small inputs read from stdin there is a bug, and max mode should be used:

$ mbgc c -m3 -i - a.mbgc < <(printf '>seq\nGGGGGGGGGGGGGGGGGGAAAAAAAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG\n')    
Switching to single file compression mode.
processed reference dataset - 63 [ms]
matching finished - 68 [ms]

compressed 78 bytes to 408 (523.077% == x0.2)
encoding time - 170 [ms]
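Besides the stdin form above, there is a generic workaround for any tool that needs a regular (seekable) input file: spool the stream to a temporary file first. `with_regular_file` is a hypothetical helper and not part of mbgc. Its `{}` placeholder convention is an assumption borrowed from `find -exec`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: with_regular_file SRC CMD [ARGS...]
# If SRC is not a regular file (e.g. a /dev/fd/NN pipe from process
# substitution), copy it to a temporary regular file. Then run CMD with every
# "{}" argument replaced by that file's path, and remove the temporary copy.
with_regular_file() {
  local src=$1; shift
  local path=$src tmp=
  if [ ! -f "$src" ]; then
    tmp=$(mktemp) || return 1
    cat -- "$src" > "$tmp" || { rm -f -- "$tmp"; return 1; }
    path=$tmp
  fi
  local args=() a
  for a in "$@"; do
    if [ "$a" = "{}" ]; then args+=("$path"); else args+=("$a"); fi
  done
  "${args[@]}"
  local status=$?
  [ -n "$tmp" ] && rm -f -- "$tmp"
  return "$status"
}
```

With this helper, the original command would become `with_regular_file <(printf '>seq\n...\n') mbgc c -i {} a.mbgc`, so mbgc receives a regular file instead of a pipe. This has not been tested against mbgc. The trade-off is a full temporary copy of the input on disk.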

karel-brinda commented 10 months ago

"I think your example was incorrect."

Which one? The one with xz? I've just tested it and it works on my computers. It might depend on your shell: plain sh doesn't support this feature, which requires bash or a similar shell such as zsh or ksh.
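Shells without process substitution, such as a strict POSIX `sh` like dash, need an explicit named pipe to get roughly the same effect. That is the extra bookkeeping process substitution saves you. The sketch below is minimal and uses an illustrative file name. A FIFO is still a pipe, so a tool that needs a seekable input would presumably fail on it in the same way.

```sh
#!/bin/sh
# POSIX-sh equivalent of: cat <(printf '>seq\nACGT\n')
tmpdir=$(mktemp -d) || exit 1
fifo="$tmpdir/input.fa"    # illustrative name
mkfifo "$fifo"

# The writer runs in the background; opening the FIFO blocks until a reader appears.
printf '>seq\nACGT\n' > "$fifo" &

cat "$fifo"          # the consuming program reads the FIFO like a file
wait                 # reap the background writer
rm -r -- "$tmpdir"   # manual cleanup (process substitution handles this for you)
```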

kowallus commented 10 months ago

I meant the first one with mbgc.

kowallus commented 10 months ago

@karel-brinda, does the form above work for you?