ksumngs / HapLink.jl

Viral haplotype calling by linkage disequilibrium
https://ksumngs.github.io/HapLink.jl
MIT License

[Feature]: Use standard in/out #23

Open MillironX opened 2 years ago

MillironX commented 2 years ago

Have each of the haplink commands read from standard input and write to standard output unless the corresponding file flags are passed.

Additional Removed Arguments

haplink variants

haplink haplotypes

haplink sequences

Example Usage

Command

minimap2 -ax map-ont --MD example/reference.fasta reads.fastq \
  | tee output.sam \
  | haplink variants --reference example/reference.fasta \
  | tee output.vcf \
  | haplink haplotypes --bam output.sam \
  | haplink sequences --reference example/reference.fasta \
  > output.fasta

Context

Most bioinformatics tools read from standard input and write to standard output. It would be convenient if HapLink did this too, since it could then be chained into CLI pipelines, making workflows more efficient.

Possible Implementation

Each of the input and output file parameters will need to be made optional, with manual checks for stdin. Read and write operations will also need to open stdin/stdout instead of files when no path is given.
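As a rough illustration of the fallback logic described above (not HapLink's actual code, which is written in Julia), here is a minimal shell sketch. The function name `read_input` and the convention that `-` means stdin are assumptions made for this example.

```shell
# Hypothetical helper: emit the input for a command.
# $1 is an optional file path; an empty value or "-" means "read standard input".
read_input() {
  path="${1:-}"
  if [ -n "$path" ] && [ "$path" != "-" ]; then
    # An explicit file was given, so use it
    cat -- "$path"
  elif [ -t 0 ]; then
    # stdin is attached to a terminal, so nothing was piped in
    echo "error: no input file given and nothing piped to stdin" >&2
    return 1
  else
    # Fall back to whatever is arriving on stdin
    cat
  fi
}
```

The `[ -t 0 ]` test is one way to do the "manual check for stdin": it separates the case where a user forgot the file argument from the case where the command sits in the middle of a pipeline.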

The big problem here is that https://github.com/BioJulia/XAM.jl does not support a unified API for SAM and BAM records (see https://github.com/BioJulia/XAM.jl/issues/25). Although https://github.com/samtools/htslib and related tools do not seem to care which format they receive, https://github.com/genome/bam-readcount does require an indexed BAM file, so there will need to be a way to create, sort, and index a BAM file if SAM input is given. BAM input given via stdin will still need to be sorted and indexed as well.
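One possible workaround, sketched here under the assumption that samtools is available on the PATH, is to let `samtools sort` consume the stream and then index the result. The helper name `sort_and_index` and the output file name are placeholders for this example, not part of HapLink.

```shell
# Hypothetical helper, not part of HapLink: read SAM or BAM from stdin and
# leave a coordinate-sorted, indexed BAM at the path given as $1.
sort_and_index() {
  out="$1"
  # "-" tells samtools sort to read standard input; the input format is detected automatically
  samtools sort -o "$out" - && samtools index "$out"
}

# Example (assumes minimap2 and samtools are installed; sorted.bam is a placeholder name):
# minimap2 -ax map-ont --MD example/reference.fasta reads.fastq | sort_and_index sorted.bam
```

Indexing needs the sorted BAM to exist on disk, so this step cannot be fully streamed. Whatever consumes the indexed file would have to run after the helper finishes.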

MillironX commented 1 year ago

Partially implemented with #35. Stdout is used unless an output file parameter is passed, but stdin is trickier.