lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

Does cmpbams support CRAMs? #150

Closed by lckarssen 4 years ago

lckarssen commented 4 years ago

Subject of the issue

I would like to compare two CRAM files to see if (and how) two CRAM files built with slightly different pipelines have the same alignments (or not). Given that jvarkit uses htslib, I assumed cmpbams would work with CRAMs as well.

I tried running the tool on two identical CRAM files, but I get the following error message:

[SEVERE][CompareBams]A valid CRAM reference was not supplied and one cannot be acquired via the property settings samjdk..reference_fasta or samjdk..use_cram_ref_download
java.lang.IllegalStateException: A valid CRAM reference was not supplied and one cannot be acquired via the property settings samjdk..reference_fasta or samjdk..use_cram_ref_download

It seems that the reference file(s) can't be found. However, the files listed in the @SQ blocks of the CRAM header exist and are accessible. Both CRAM files are indexed and the .crai files are in the same directories as the respective CRAM files.

Your environment

lindenb commented 4 years ago

I just pushed a fix to use a reference with the -R option: https://github.com/lindenb/jvarkit/commit/9c88a805fc01e0f0f84bf1b28371a61847e8d19f

I hope it helps, I didn't test it :-)

lckarssen commented 4 years ago

Cool! That's quick :smile:. I was just testing whether using htsjdk's samjdk.reference_fasta property works, and that also seems to be the case:

java -Dsamjdk.reference_fasta=/path/to/reference.fa -jar ~/tmp/jvarkit/dist/cmpbams.jar 

I'll test HEAD after my current run finishes.

lckarssen commented 4 years ago

I did a quick test and can confirm that the -R option works. A quick glance at the output shows the same results as using the -Dsamjdk.reference_fasta= option.
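For anyone landing here later, the two approaches can be put side by side as small shell wrappers. This is only a sketch: the reference and jar paths are the placeholders used earlier in this thread, the function names are made up for illustration, and passing the input files as trailing arguments after -R is an assumption about how cmpbams is invoked.

```shell
# Placeholder paths taken from this thread; substitute your own.
REF=/path/to/reference.fa
JAR="$HOME/tmp/jvarkit/dist/cmpbams.jar"

# Approach 1: hand the reference to the JVM as the samjdk.reference_fasta property.
# (Function name is hypothetical.)
cmpbams_via_property() {
    java -Dsamjdk.reference_fasta="$REF" -jar "$JAR" "$@"
}

# Approach 2: use the -R option added in the commit linked above.
# (Function name is hypothetical; argument order is an assumption.)
cmpbams_via_option() {
    java -jar "$JAR" -R "$REF" "$@"
}

# Possible usage, with hypothetical file names:
#   cmpbams_via_property first.cram second.cram > via_property.txt
#   cmpbams_via_option   first.cram second.cram > via_option.txt
#   diff via_property.txt via_option.txt
```

If both routes resolve the same reference, the two outputs should match, which is what the quick test above suggests.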

Thanks a lot for the quick solution! :+1: