WimSpee closed this issue 1 year ago
IGV uses the same library as the GATK to read CRAM files, so it's not going to be addressed until the underlying issue is addressed in htsjdk. I see there is already an open ticket for that, which you referenced. I can't really think of a workaround, other than to try BAM + CSI indexes, but that won't help with VCFs.
As an experiment, have you tried using igv-web (https://igv.org/app)? It does not have any restrictions on chromosome length that I am aware of.
Thank you for the information. We will give IGV web app a try. That might be a good workaround for us. It will take a few days to get firewall access to it or get it installed on a computer close to the data. Will let you know if it works.
Does IGV web app use htslib instead of htsjdk to read in the CRAM and CRAI files?
IGV web app uses the JBrowse CRAM library: https://github.com/GMOD/cram-js
There is no need to install igv-webapp "close to the data". It is entirely a client program, so everything runs in your web browser. There is no server component. https://igv.org just hosts the static HTML and JavaScript files, which are downloaded to your computer and run there.
Local install is just a backup option if I can't get firewall access to https://igv.org/ on a machine that is close to the data. Cool that there is a pure JS CRAM reader.
We have zipped archives for local install, see the readme at https://github.com/igvteam/igv-webapp. Or you can just build it.
I managed to get firewall access to igv-webapp on https://igv.org/ on a Linux machine that is close to the data. I am using Firefox 102.8.0esr.
However, I am unable to load any reference genome FASTA in igv-webapp via Genome -> Local File.
Even the tomato genome, at just 1 Gb, fails to load. https://solgenomics.net/ftp//tomato_genome/assembly/build_4.00/S_lycopersicum_chromosomes.4.00.fa.gz
ERROR: Genome requires either a single JSON file or a FASTA file & index file
fai and dict index files are present next to the fasta.
Java/HTSJDK IGV is able to open these fasta reference genomes on the same machine, using the same paths.
Does IGV web-app require anything special to open local reference genome files? Do you have advice on how to troubleshoot this error?
Thank you.
Due to security considerations for web browsers, IGV-Web is not able to automatically load the index file like the desktop application. You must load both the fasta file and the index file, at the same time. So if you are loading a local file, you need to select both in the file chooser.
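Before opening the file chooser, it can help to confirm that each data file really has its index sitting next to it, so both can be selected together. The following is only a minimal sketch: the function names `index_for` and `check_pairs` are hypothetical, and it assumes the usual samtools naming (`.fai` appended to a `.fa`/`.fasta` file, `.crai` appended to a `.cram` file). Other extensions, such as compressed FASTA, are deliberately not handled.

```shell
# index_for FILE: print the index name conventionally expected next to FILE.
# Assumes samtools-style naming; returns non-zero for unrecognised extensions.
index_for() {
    case "$1" in
        *.cram)       printf '%s.crai\n' "$1" ;;
        *.fa|*.fasta) printf '%s.fai\n' "$1" ;;
        *)            return 1 ;;
    esac
}

# check_pairs FILE...: for each data file, report whether the file and its
# index both exist, i.e. whether the pair can be selected together.
check_pairs() {
    status=0
    for f in "$@"; do
        idx=$(index_for "$f") || { echo "unknown type: $f"; status=1; continue; }
        if [ -e "$f" ] && [ -e "$idx" ]; then
            echo "ok: select $f and $idx together"
        else
            echo "missing: $f or $idx"
            status=1
        fi
    done
    return $status
}
```

For example, `check_pairs large_genome.fa large_genome.cram` would report one line per file, and a non-zero exit status signals that at least one pair is incomplete.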
Thank you for the information. I can confirm that by selecting both the FASTA+FAI and CRAM+CRAI I could now load and display the Onion genome and sequencing reads in IGV web-app.
Thanks again for the help and the nice and useful IGV software.
Glad it worked out. Thanks for letting us know.
That's good to know. Kudos to the JBrowse team, @rbuels and @cmdcolin especially, for the JS CRAM library. You are right, this is cool. It still amazes me they pulled this off.
Dear IGV developers,
Thank you for the very nice IGV software.
We would like to visualize the content of CRAM files on large reference genomes in IGV. For example, Wheat and Onion both have ca. 16 Gb genomes with chromosomes much larger than 500 Mb.
See below for the chromosome lengths of the public wheat genome assembly. Onion has only 8 chromosomes, but even larger, in the range of 1Gb to 2.5Gb.
Since it is now possible to create reference genomes and re-sequencing data for relevant plant (and animal) species with large genomes, it would be very nice if these could also be loaded in IGV, using CRAM files and CRAI indexes.
For both Onion and Wheat CRAM files we get this exact error when trying to load the CRAM files into IGV.
This error also has been reported here: https://github.com/broadinstitute/gatk/issues/8192
The CRAM file is valid according to samtools quickcheck.
The CRAI index was created via:
samtools index -c large_genome.cram
The CRAI index can be used by samtools to read slices from the CRAM:
samtools view -T large_genome.fa large_genome.cram large_chr1B
Splitting the chromosomes into ca. 500 Mb pieces is not really a workaround. For Onion, some chromosomes would need to be split into 5 pieces.
Also, some downstream analyses (e.g. on the VCF related to the CRAM) require each sequence to represent the real chromosome, e.g. to match the genetic map.
Thank you for your thoughts and help on this.
Wheat IWGSC V1 reference genome https://urgi.versailles.inra.fr/download/iwgsc/IWGSC_RefSeq_Assemblies/v1.0/iwgsc_refseqv1.0_all_chromosomes.zip
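The chromosome lengths discussed above can be listed directly from a FASTA index, whose first two tab-separated columns are the sequence name and its length. The sketch below prints every sequence longer than a threshold, together with how many threshold-sized pieces a split would need. The function name `long_contigs` is hypothetical. The default threshold of 2^29 bp is the coordinate limit of BAI indexes and is used here only as an assumed stand-in for "about 500 Mb"; the thread does not state the exact limit that htsjdk enforces for CRAM.

```shell
# long_contigs FAI [MAX_LEN]
# For every sequence in a .fai index that is longer than MAX_LEN, print
# name, length, and the number of MAX_LEN-sized pieces needed to cover it.
# MAX_LEN defaults to 2^29 = 536870912 (an assumed threshold, see above).
long_contigs() {
    fai=$1
    max_len=${2:-536870912}
    awk -F '\t' -v max="$max_len" '
        ($2 + 0) > (max + 0) {
            pieces = int(($2 + max - 1) / max)   # ceiling division
            print $1 "\t" $2 "\t" pieces
        }
    ' "$fai"
}
```

Run as `long_contigs large_genome.fa.fai 500000000`, a 2.5 Gb chromosome is reported as needing 5 pieces, which matches the Onion case described in the issue.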