mrvollger closed this issue 2 years ago.
Hi! Thanks for making an issue!

There are two pieces here:

- `BgzfSyncReader` implements the `Read` trait.
- The `BufRead` trait brings into scope a set of methods that allow for reading lines on a `BufReader`.

So, you just need to wrap the `BgzfSyncReader` in a `BufReader`. This is a bit redundant, and I should really implement `BufRead` for the `BgzfSyncReader`.
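The same wrapping works for any type that implements `Read`, which is all `BgzfSyncReader` needs. Below is a minimal, std-only sketch; `count_lines` is a hypothetical helper (not part of gzp), and a byte slice stands in for the decompressing reader so the example runs without extra crates.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical helper: count lines from any `Read` source (for example a
/// `BgzfSyncReader`) by wrapping it in a `BufReader`, which provides the
/// `BufRead::lines` method.
pub fn count_lines<R: Read>(inner: R) -> io::Result<usize> {
    let reader = BufReader::new(inner);
    let mut n = 0;
    for line in reader.lines() {
        line?; // propagate I/O errors and invalid UTF-8
        n += 1;
    }
    Ok(n)
}
```

With gzp this would be called as `count_lines(BgzfSyncReader::new(input))`, where `input` is whatever `Read` source you opened.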
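You also need to import `std::io::BufRead` to bring the line reading methods into scope:

```rust
use std::io::{BufRead, BufReader};

let reader = BufReader::new(bgzf_reader);
// The `lines` method creates a new `String` allocation for each line
for line in reader.lines() {
    let line = line?;
    // do stuff with `line`
}
```

Alternatively, reuse a single line buffer. This still has to copy bytes from the underlying reader into the buffer, but avoids a new allocation per line:

```rust
let mut reader = BufReader::new(bgzf_reader);
let mut buffer = String::new();
loop {
    // `read_line` returns `io::Result<usize>`; 0 bytes read means EOF
    let bytes_read = reader.read_line(&mut buffer)?;
    if bytes_read == 0 {
        break;
    }
    // do stuff with `buffer` (it still contains the trailing newline)
    buffer.clear();
}
```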
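Because `read_line` appends to the buffer and keeps the line terminator, the buffer has to be cleared each iteration and the newline trimmed before use. A self-contained sketch of the reuse pattern; `line_lengths` is a hypothetical function used only for illustration:

```rust
use std::io::{self, BufRead};

/// Hypothetical example: record the length of each line while reusing one
/// `String` buffer for the whole stream.
pub fn line_lengths<R: BufRead>(mut reader: R) -> io::Result<Vec<usize>> {
    let mut buffer = String::new();
    let mut lengths = Vec::new();
    loop {
        buffer.clear(); // `read_line` appends, so start each line empty
        let bytes_read = reader.read_line(&mut buffer)?;
        if bytes_read == 0 {
            break; // EOF
        }
        // Strip the trailing "\n" or "\r\n" that `read_line` keeps
        let line = buffer.trim_end_matches(|c: char| c == '\n' || c == '\r');
        lengths.push(line.len());
    }
    Ok(lengths)
}
```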
`BufReader` only requires its inner reader to implement `Read`, and it manages its own internal buffer. So, to fill out your example using the most basic line reading method:
```rust
use std::{
    error::Error,
    fs::File,
    io::{self, BufRead, BufReader},
    path::{Path, PathBuf},
};

use gzp::BgzfSyncReader;

const BUFFER_SIZE: usize = 1024 * 64;

type DynResult<T> = Result<T, Box<dyn Error + 'static>>;

/// Get a buffered input reader from stdin or a file
fn get_input(path: Option<PathBuf>) -> DynResult<Box<dyn BufRead + Send + 'static>> {
    let reader: Box<dyn BufRead + Send + 'static> = match path {
        Some(path) => {
            if path.as_os_str() == "-" {
                Box::new(BufReader::with_capacity(BUFFER_SIZE, io::stdin()))
            } else {
                Box::new(BufReader::with_capacity(BUFFER_SIZE, File::open(path)?))
            }
        }
        None => Box::new(BufReader::with_capacity(BUFFER_SIZE, io::stdin())),
    };
    Ok(reader)
}

/// Example trying bgzip
/// ```
/// use rustybam::myio;
/// let f = ".test/asm_small.paf.bgz";
/// myio::test_bgz(f);
/// ```
pub fn test_bgz(filename: &str) {
    let ext = Path::new(filename).extension();
    eprintln!("{:?}", ext);
    let pathbuf = PathBuf::from(filename);
    let input = get_input(Some(pathbuf)).unwrap();
    // `BgzfSyncReader` only implements `Read`, so wrap it in a `BufReader`
    // to get the `BufRead::lines` method
    let bgzf_reader = BufReader::new(BgzfSyncReader::new(input));
    for line in bgzf_reader.lines() {
        eprintln!("{}", line.unwrap());
    }
}

fn main() {
    println!("Hello, world!");
}
```
To clarify, you have identified an issue with `gzp`: iterating over lines requires double buffering, because `BgzfSyncReader` doesn't implement `BufRead` on its own even though it really could.
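Implementing `BufRead` directly means the reader hands out its own decompressed bytes through `fill_buf`/`consume`, so no outer `BufReader` copy is needed. The sketch below illustrates that idea with a hypothetical `BlockReader` over already-decoded blocks (standing in for decompressed BGZF blocks); it is an assumption-laden illustration, not gzp's implementation.

```rust
use std::io::{self, BufRead, Read};

/// Hypothetical reader over pre-decoded blocks (a stand-in for decompressed
/// BGZF blocks). The current block itself serves as the `BufRead` buffer.
pub struct BlockReader<I: Iterator<Item = Vec<u8>>> {
    blocks: I,
    current: Vec<u8>,
    pos: usize,
}

impl<I: Iterator<Item = Vec<u8>>> BlockReader<I> {
    pub fn new(blocks: I) -> Self {
        Self { blocks, current: Vec::new(), pos: 0 }
    }
}

impl<I: Iterator<Item = Vec<u8>>> BufRead for BlockReader<I> {
    fn fill_buf(&mut self) -> io::Result<&[u8]> {
        // Move to the next block only when the current one is exhausted
        // (this also skips empty blocks).
        while self.pos >= self.current.len() {
            match self.blocks.next() {
                Some(block) => {
                    self.current = block;
                    self.pos = 0;
                }
                None => return Ok(&[]), // EOF
            }
        }
        Ok(&self.current[self.pos..])
    }

    fn consume(&mut self, amt: usize) {
        self.pos = (self.pos + amt).min(self.current.len());
    }
}

impl<I: Iterator<Item = Vec<u8>>> Read for BlockReader<I> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        // Copy out of the current block via the BufRead methods
        let available = self.fill_buf()?;
        let n = available.len().min(out.len());
        out[..n].copy_from_slice(&available[..n]);
        self.consume(n);
        Ok(n)
    }
}
```

Lines that span a block boundary still come out whole, because `read_line` keeps calling `fill_buf` until it finds a newline.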
Thank you so much for this worked-out example, it is very helpful! One last question: is there an easy or standard way to test whether an input file is gzipped or bgzipped using `gzp`?
Thanks again for this awesome tool and the quick response.
There is not currently any way to check the first few bytes of a file to check if it's compressed or not. In other applications I just do the simple thing and look at incoming file extensions or require that a CLI arg be passed in to indicate the input stream is compressed.
But this is something I intend to fix in the future!
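Until then, one option is to sniff the magic bytes yourself. This sketch is not part of gzp; the names `Compression`, `sniff`, and `sniff_reader` are hypothetical. It relies on two format facts: every gzip member starts with `0x1f 0x8b` (RFC 1952), and BGZF is gzip with the FEXTRA flag set plus an extra subfield with identifiers `B`,`C` and length 2 (SAM/BAM specification). It assumes the peeked bytes cover the gzip header; if they are truncated, a BGZF stream may be reported as plain gzip.

```rust
use std::io::{self, BufRead};

/// Hypothetical classification of an input stream.
#[derive(Debug, PartialEq, Eq)]
pub enum Compression {
    Bgzf,
    Gzip,
    None,
}

/// Classify a stream from its leading bytes.
pub fn sniff(header: &[u8]) -> Compression {
    // gzip magic bytes: ID1 = 0x1f, ID2 = 0x8b
    if header.len() < 2 || header[0] != 0x1f || header[1] != 0x8b {
        return Compression::None;
    }
    // FLG is byte 3; bit 0x04 is FEXTRA. XLEN is bytes 10..12, little-endian.
    if header.len() >= 12 && (header[3] & 0x04) != 0 {
        let xlen = u16::from_le_bytes([header[10], header[11]]) as usize;
        let end = (12 + xlen).min(header.len());
        let mut i = 12;
        // Each subfield: SI1, SI2, SLEN (u16 LE), then SLEN bytes of data
        while i + 4 <= end {
            let slen = u16::from_le_bytes([header[i + 2], header[i + 3]]) as usize;
            if header[i] == b'B' && header[i + 1] == b'C' && slen == 2 {
                return Compression::Bgzf;
            }
            i += 4 + slen;
        }
    }
    Compression::Gzip
}

/// Peek at the buffered bytes without consuming them, so the same reader
/// can still be handed to a decompressor afterwards.
pub fn sniff_reader<R: BufRead>(reader: &mut R) -> io::Result<Compression> {
    let buf = reader.fill_buf()?;
    Ok(sniff(buf))
}
```

Using `fill_buf` instead of `read` matters here: the bytes stay in the `BufReader`, so nothing is lost before decompression starts.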
Got it, thanks so much for the help!
Hello,
I am newish to Rust and I have what is probably a very simple question: how do I read a bgzipped file line by line?
I have this code, which is largely borrowed from crabz (see bottom), but once I have the `BgzfSyncReader` I am not sure how to iterate over it or manipulate it in any way.
Thanks in advance! Mitchell