igvteam / igv-notebook

Module for embedding igv.js in an IPython notebook
MIT License
61 stars 14 forks source link

No alignments displayed #29

Open snakesch opened 9 months ago

snakesch commented 9 months ago

Dear Jim,

Thank you for the amazing IGV notebook tool.

I recently wanted to visualize some alignments (in BAM format) using IGV notebook against the T2T genome from RefSeq (GCF_009914755.1). I tried using both relative paths (code provided below) and absolute paths but none of them was able to show alignments. Could you please advise if there is any incorrect usage with the following implementations? Many thanks in advance!

Note: Jupyter server was started under the directory /paedyl01/disk1/louisshe/. Working directory is /paedyl01/disk1/louisshe/work/common. The version of igv-notebook I am using us 0.5.2.

(Relative paths)

import igv_notebook

igv_notebook.init()

# Browse a local FASTA genome
b = igv_notebook.Browser(
    {
        "reference": {
            "id": "T2T-CHM13v2.0",
            "name": "T2T",
            "fastaURL": "ref/T2T/GCF_009914755.1/GCF_009914755.1_T2T-CHM13v2.0_genomic.fa",
            "indexURL": "ref/T2T/GCF_009914755.1/GCF_009914755.1_T2T-CHM13v2.0_genomic.fa.fai"
        },
        "locus": "NC_060928.1:193,432,000-193,554,000"
    }
)

# Load alignment data
b.load_track(
    {
        "name": "HG02723 PacBio HiFi",
        "url":  "out/D4Z4_HG02723_test/PacBio_HiFi/merged_mapped.sorted.bam",
        "indexURL": "out/D4Z4_HG02723_test/PacBio_HiFi/merged_mapped.sorted.bam.bai",
        "format": "bam",
        "type": "alignment"
    }
)

b.zoom_in()

(absolute paths)

import igv_notebook

igv_notebook.init()

# Browse a local FASTA genome
b = igv_notebook.Browser(
    {
        "reference": {
            "id": "T2T-CHM13v2.0",
            "name": "T2T",
            "fastaURL": "/paedyl01/disk1/louisshe/ref/T2T/GCF_009914755.1/GCF_009914755.1_T2T-CHM13v2.0_genomic.fa",
            "indexURL": "/paedyl01/disk1/louisshe/ref/T2T/GCF_009914755.1/GCF_009914755.1_T2T-CHM13v2.0_genomic.fa.fai"
        },
        "locus": "NC_060928.1:193,432,000-193,554,000"
    }
)

# Load alignment data
b.load_track(
    {
        "name": "HG02723 PacBio HiFi",
        "url":  "/paedyl01/disk1/louisshe/out/D4Z4_HG02723_test/PacBio_HiFi/merged_mapped.sorted.bam",
        "indexURL": "/paedyl01/disk1/louisshe/out/D4Z4_HG02723_test/PacBio_HiFi/merged_mapped.sorted.bam.bai",
        "format": "bam",
        "type": "alignment"
    }
)

b.zoom_in()
jrobinso commented 9 months ago

It might be due to a sequence name mismatch. Compare the sequence names in our bam with those in the fasta. Or just add this to your "reference" definition and try again.

"aliasURL": "https://s3.amazonaws.com/igv.org.genomes/chm13v2.0/GCA_009914755.4.chromAlias.txt"

snakesch commented 9 months ago

Hi Jim,

Thank you for the suggestion. The alignment file (merged_mapped.sorted.bam) was produced by mapping raw sequencing reads to the customised T2T genome, and therefore should have the same contig names.

I also tried adding aliasURL as suggested but I still got no alignments in the output IGV. It appears that the reference was not detected/contig were not recognised.

A snapshot of the IGV output: image

On a side note, I noticed in the aliasURL that the contig names were like CP068277.2 but the ones I have in the customised reference are NC_060925.1. Is there another alias file I can use?

Thanks, Louis

jrobinso commented 9 months ago

What are the names in your fasta file? You can actually get these from the fasta index (fai), which is just a plain text file.

jrobinso commented 9 months ago

Sorry maybe you answered that, so what are the names in your BAM file? You can get these with samtools

samtools view -H  <bam file>
jrobinso commented 9 months ago

Ah, you might try this chrom alias file, which I just discovered at UCSC

https://hgdownload.soe.ucsc.edu/goldenPath/hs1/bigZips/hs1.chromAlias.txt

snakesch commented 9 months ago

I tried using the UCSC alias file and creating one from fai index file. Both failed.

Here are the contig names I found from the BAM file: image

Thanks, Louis

jrobinso commented 9 months ago

OK, well I'm out of ideas. If you are able to zip up and share a test case I will look into it. I can provide you a secure dropbox link. Email igv-team@broadinstitute.org if you would like to do this.

snakesch commented 9 months ago

Many thanks! I just sent an email to the address. As all the data are downloaded from public domains, I think there is no concern to share the data.

jrobinso commented 9 months ago

OK, you should get a link from dropbox to upload the files. If you don't let me know.

Also, could you share a notebook? Or just paste the notebook contents here.

snakesch commented 9 months ago

Hi Jim,

Could you send the link to the address snakesch@connect.hku.hk? I will have the notebook shared with the FASTA and BAM.

Thanks, Louis

jrobinso commented 9 months ago

Yes, sent.

jrobinso commented 9 months ago

I think you have a path problem, the cell below worked for me from your test data. Unfortunately it looks like igv-notebook does not give a friendly message or any message, if the URL isn't found. I will work on that.

Where is this path relative to the location of the notebook?

out/D4Z4_HG02723_test/PacBio_HiFi/merged_mapped.sorted.bam

This worked for me, all files in the directory of the notebook as you sent to us. Note I increased the visibilityWindow to 100kb from the default 30kb as your initial locus is 62kb.

import igv_notebook

igv_notebook.init()

# Browse a local FASTA genome
b = igv_notebook.Browser(
    {
        "reference": {
            "id": "T2T-CHM13v2.0",
            "name": "T2T",
            "fastaURL": "GCF_009914755.1_T2T-CHM13v2.0_genomic.fa",
            "indexURL": "GCF_009914755.1_T2T-CHM13v2.0_genomic.fa.fai",
            "aliasURL": "https://hgdownload.soe.ucsc.edu/goldenPath/hs1/bigZips/hs1.chromAlias.txt"
        },
        "locus": "NC_060928.1:193,432,000-193,554,000"
    }
)

# Load alignment data
b.load_track(
    {
        "name": "HG02723 PacBio HiFi",
        "url":  "merged_mapped_chr4.bam",
        "indexURL": "merged_mapped_chr4.bam.bai",
        "format": "bam",
        "type": "alignment",
        "visibilityWindow": 100000
    }
)

b.zoom_in()
jrobinso commented 9 months ago

Also, BTW, please leave this issue open as the lack of an error message needs to be addressed.

snakesch commented 9 months ago

Dear Jim,

Thank you for your kind help on the issue.

The issue was in fact related to paths. I might have mistaken "the directory relative to the notebook directory" for "the directory where the notebook server is hosted", and hence the issue. I also noticed that absolute paths do not appear to be supported on JupyterLab. Could you let me know if that's expected?

I will keep the issue open as suggested.

Thanks, Louis

jrobinso commented 9 months ago

Absolute paths are not supported on JupyterLab. I find the best method for JupyterLab is to find the file in the Jupyter file widget, then right-click and use "copy path".

snakesch commented 9 months ago

Thank you very much for the advice!