Open Lys9527 opened 1 year ago
FAI:
https://github.com/samtools/htsjdk/blob/master/src/main/java/htsjdk/samtools/reference/FastaSequenceIndexCreator.java
TBI:
private static int copyTo(final Path invcf,final Path outvcfpath) throws IOException {
try(VCFReader vcfReader1 = new VCFFileReader(invcf,false)) {
final VCFHeader header1 = vcfReader1.getHeader();
final VariantContextWriterBuilder vcb = new VariantContextWriterBuilder();
final SAMSequenceDictionary dict1 = header1.getSequenceDictionary();
vcb.setReferenceDictionary(header1.getSequenceDictionary());
vcb.setIndexCreator(new TabixIndexCreator(dict1, TabixFormat.VCF));
vcb.setOption(Options.INDEX_ON_THE_FLY);
vcb.setOutputPath(outvcfpath);
vcb.setCreateMD5(false);
try(final VariantContextWriter w = vcb. build()) {
w.writeHeader(header1);
try(final CloseableIterator<VariantContext> iter1 = vcfReader1.iterator()) {
while(iter1.hasNext()) {
w.add(iter1.next());
}
}
}
}
return count;
}
FAI:
https://github.com/samtools/htsjdk/blob/master/src/main/java/htsjdk/samtools/reference/FastaSequenceIndexCreator.java
TBI:
private static int copyTo(final Path invcf,final Path outvcfpath) throws IOException { try(VCFReader vcfReader1 = new VCFFileReader(invcf,false)) { final VCFHeader header1 = vcfReader1.getHeader(); final VariantContextWriterBuilder vcb = new VariantContextWriterBuilder(); final SAMSequenceDictionary dict1 = header1.getSequenceDictionary(); vcb.setReferenceDictionary(header1.getSequenceDictionary()); vcb.setIndexCreator(new TabixIndexCreator(dict1, TabixFormat.VCF)); vcb.setOption(Options.INDEX_ON_THE_FLY); vcb.setOutputPath(outvcfpath); vcb.setCreateMD5(false); try(final VariantContextWriter w = vcb. build()) { w.writeHeader(header1); try(final CloseableIterator<VariantContext> iter1 = vcfReader1.iterator()) { while(iter1.hasNext()) { w.add(iter1.next()); } } } } return count; }
Thank you very much for your answer, I have understood the generation of fai file, for tai file, do you need to limit the format of the input file here? For example, gff file, sorry I am not learning this major, but I need to use this function in my project, so the problem is a little rude, please forgive me
for TBI +BED, you can use
TabixIndexCreator indexCreator=new TabixIndexCreator(TabixFormat.BED);
writer = new BlockCompressedOutputStream(this.outputFile);
long filePosition= writer.getFilePointer();
while(in.hasNext())
{
String line = in.next();
BedLine bed = this.bedCodec.decode(line);
if(bed==null) continue;
writer.write(line.getBytes());
writer.write('\n');
indexCreator.addFeature(bed, filePosition);
filePosition = writer.getFilePointer();
}
}
writer.flush();
Index index = indexCreator.finalizeIndex(writer.getFilePointer());
it's the same idea for GFF but used a gff3 codec: https://www.javadoc.io/static/com.github.samtools/htsjdk/2.24.1/htsjdk/tribble/gff/Gff3Codec.html
for TBI +BED, you can use
TabixIndexCreator indexCreator=new TabixIndexCreator(TabixFormat.BED); writer = new BlockCompressedOutputStream(this.outputFile); long filePosition= writer.getFilePointer(); while(in.hasNext()) { String line = in.next(); BedLine bed = this.bedCodec.decode(line); if(bed==null) continue; writer.write(line.getBytes()); writer.write('\n'); indexCreator.addFeature(bed, filePosition); filePosition = writer.getFilePointer(); } } writer.flush(); Index index = indexCreator.finalizeIndex(writer.getFilePointer());
it's the same idea for GFF but used a gff3 codec: https://www.javadoc.io/static/com.github.samtools/htsjdk/2.24.1/htsjdk/tribble/gff/Gff3Codec.html
Thank you very much for your detailed answer. If you can give me an example, for example, my file is /path/to/SCHZ.rRNA.gff.gz, how do I need to generate the “/path/to/SCHZ.rRNA.gff.gz.tbi” file?Sorry I'm new to the java building website project, very unproficient, bothering you!
Using codecs and IndexFactory should do fine I guess. Reiterating over the actual file may not be necessary
Here is my code from my repository
final VCFCodec codec = new VCFCodec();
if (this.VCFFile.getAbsolutePath().endsWith(FileExtensions.VCF)) {
idxpath = this.VCFFile.getAbsolutePath() + FileExtensions.TRIBBLE_INDEX;
try {
final Index idx = IndexFactory.createDynamicIndex(this.VCFFile, codec,
IndexBalanceApproach.FOR_SEEK_TIME);
idx.write(new File(idxpath));
} catch (final Exception e) {
if (e instanceof UnsortedFileException) {
OverSeer.log(this.getClass().getSimpleName(),
"VCF File is not sorted. Sorting function is currently not implemented. Please sort your vcf using a proper tool.",
OverSeer.ERROR);
} else {
e.printStackTrace();
}
}
} else if (this.VCFFile.getAbsolutePath().endsWith(FileExtensions.COMPRESSED_VCF)) {
idxpath = this.VCFFile.getAbsolutePath() + FileExtensions.TABIX_INDEX;
try {
final Index idx = IndexFactory.createIndex(this.VCFFile, codec, IndexType.TABIX);
idx.write(new File(idxpath));
} catch (final Exception e) {
e.printStackTrace();
}
}
I bet it will work for GFF with suitable codec within htsjdk.
I recently used htsjdk 3.0.5 to complete my own project, I would like to ask how htsjdk needs to be used to generate fai files and tbi files, thank you