igvteam / igv-reports

Python application to generate self-contained pages embedding IGV visualizations, with no dependency on original input files.
MIT License
347 stars 51 forks source link

Add support for side-by-side variants/BEDPE format #65

Closed DevangThakkar closed 2 years ago

DevangThakkar commented 2 years ago

This pull request addresses multiple issues: #52, #62, #63, #64. The support for side-by-side variants is slightly hacky. If we want to display two variants from the same chromosome side by side, we would need to have the reference for that chromosome without gaps (can be filled with Ns and compressed, but this is still inefficient and would lead to a bloated output). My solution to this is to edit the chromosome notation and trick IGV.js into believing the the two variants are from different references. I do this by having one reference have chromosomes names in the regular chrA format, whereas the other reference has chromosome names in a modified chr_A format. The name change system is defined in utils.py and can be changed to some other system easily if needed. See the image below for an example.

image

This is the main bulk of this pull request but I have also fixed other smaller issues such as #62 and #64. I have added as many tests as I could think of, but please let me know if you need me to add more.

Known limitations:

Let me know what you think!

Cheers, Devang

jrobinso commented 2 years ago

Interesting, thanks for the PR, the reference issue is the reason this has not been done long ago. I think i can work with this.

jrobinso commented 2 years ago

Thanks again for the PR. I've taken a different approach to this, basically changed igv.js's intepretation of ">" lines with locus strings to define a sequence slice of the chromosome, rather than the complete "chromosome". There can be any number of slices defined in the fasta associated with a chromosome. I also added an optional @len token to tell igv.js what the complete length of the chromosome is. This is optional, but will result in the current locus box on the cytoband to be correct (this has always been an issue).

An example fasta follows. With this change I think you can rollback all of your coding changes, with the exception of the extension to "Feature" to support bedpe, modify the output fasta file to conform to the example below, and it should just work. If it doesn't I might need to make further tweaks to igv.js, but I think that is where support for this should be added.

You can get a copy of igv.js that supports this by building from https://github.com/igvteam/igv.js, or just use a prebuilt copy. from https://igv.org/web/snapshot/dist/. So for example replace this line in "variant-template.html"

<script src="https://cdn.jsdelivr.net/npm/igv@2.8.6/dist/igv.min.js"></script>

with

<script src="https://igv.org/web/snapshot/dist/igv.min.js"></script>

If you have time to try this and update this PR let me know. If not I can get to it by the end of the week.

Example multi-locus fasta

>chr1:2000001-2000025 @len=249250621
TTTGCTGAGGATTGGGCTTGGGTAC
>chr3:2000001-2000025 @len=198022430
TTTGCTGAGGATTGGGCTTGGGTAC
>chr1:1000001-1000025 @len=249250621
GGGCACAGCCTCACCCAGGAAAGCA
DevangThakkar commented 2 years ago

That sounds good - I can update this PR in the next couple of days!

DevangThakkar commented 2 years ago

I've reverted back some of my changes to incorporate your changes to igv.js - let me know what you think.

Cheers, Devang

jrobinso commented 2 years ago

Looks good overall, I haven't quite finished reviewing but have some initial comments.

(1) bedpe is a distinct format, it is not really "bed" format in the way bed+4, bed+5, etc etc are. I would prefer we treat it as such with its own parser, not overload the bed parsing function with an optional argument. If the format is "bedpe" we know we have a mulit-locus (more specifically 2-locus) report. We do not need a "--split" argument. Related, re the example, It is bad practice to use a ".bed" extension for a bedpe file, many software packages will assume this is a "bed" file. Extension should be "bedpe".

So instead of def parse_bed(f, split_bool=False): we just have a new function def parse_bedpe(f).

(2) Related -- I don't think you can tabix index a bedpe file, I have never seen one. Theoretically you could index on either of the 2 location columns (but only 1), but this would be somewhat non-sensical since order of the regions for any feature is arbitrary. In any event that would be an unusual thing to do and we don't need to support it, in fact probably shouldn't support it to eliminate confusion. So the tabix related changes to support bedpe can be removed.

(3) I found the name "split_bool" confusing, though maybe its o.k. and I'll adjust to it. We don't generally append the type to variable names, is this a python convention?

Again overall this looks very good.

DevangThakkar commented 2 years ago

Hi Jim,

Thanks for your feedback!

1) I did not know that bedpe had a different extension - I've been using .bed for them all my life 😅 I'll fix that and remove the --split argument then.

2) Yeah that makes sense to me, I'll remove the snippet.

3) No particular reason - I can change it to split

jrobinso commented 2 years ago

Ok thanks for the quick turnaround. I'll get this merged soon.

jrobinso commented 2 years ago

Merged. Thanks for the contribution. Releasing this will require first a release of igv.js, this should happen in the next week or so.

jrobinso commented 2 years ago

Sorry I convinced myself the minor changes above had been made. Long day. No problem though I can easily make these small changes.

DevangThakkar commented 2 years ago

Thanks for merging it in, happy to have helped! :)

jrobinso commented 2 years ago

@DevangThakkar yes, big help, in addition to these new features I see you found and fixed a few bugs. We might send a tweet out from our igv team twitter account when we release this, if you have a twitter handle email it to us at igv-team (at) broadinstitute.org so we can acknowledge you.

I just pushed my changes, two of note as they affect use: (1) bedpe files must have ".bedpe" extension, and (2) the --split flag is no longer needed or used. This is inferred from the use of bedpe format.

DevangThakkar commented 2 years ago

Will do, Jim! As someone who uses IGV a lot, it's nice to be able to contribute to some aspect of it. Feel free to assign me to future feature requests to this repo if you need help. :)