vgteam / sequenceTubeMap

displays multiple genomic sequences in the form of a tube map
MIT License

Eliding small snarls #196

Closed by adamnovak 9 months ago

adamnovak commented 1 year ago

@benedictpaten has the idea that we could have a way to remove small snarls (e.g. single-base indels and SNPs) from the tube map, to get a more legible, higher-level, zoomed-out view.

This might involve having and then modifying the snarl structure of the graph, and might require support on the vg side.

colindaven commented 1 year ago

Excluding small SNPs and indels is something we'd love to see as well. Most users seem to be interested in larger features, especially insertions (e.g. 100+ bp). The SNPs get in the way a lot.

If there were an easy workaround to preprocess the data with vg or odgi, that would be fine too, but I'm not aware of how to do this.

adamnovak commented 1 year ago

You can use vg simplify to remove small variants (AKA small "snarls") from a graph. But it won't process aligned reads along with the graph.

colindaven commented 1 year ago

Thanks, that's very useful info.

benedictpaten commented 1 year ago

I do think this is an issue we should revisit in tubemaps, along with having indexed support for the GBZ.


colindaven commented 1 year ago

vg simplify did the trick, thanks for the tip. Next task is to add gene annotation but this seems doable via either vg or odgi.

adamnovak commented 11 months ago

To start on this, we would need:

adamnovak commented 11 months ago

Before adding a new optional server-side pass, we might want to refactor the existing server logic to use async/await instead of having each phase just call the next one.

EDIT: We don't need to do this right now; the relevant server logic is all in the first function.

adamnovak commented 11 months ago

The server calls vg chunk to extract the part of the graph to look at, and pipes its output into vg view, which converts the graph to JSON.

We want to interpose a call to vg simplify.
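As a rough sketch of what interposing the pass could look like, the pipeline can be described as a list of stages with the simplify stage added only when requested. Everything here is an assumption for illustration: `buildGraphPipeline`, `describePipeline`, and the stage layout are hypothetical names, not the server's actual code. The real vg chunk arguments are passed through untouched, and `vg view -j -` is assumed to read a graph on standard input and write JSON.

```javascript
// Sketch only: the stage layout is an assumption, not the project's server code.
// chunkArgs is passed through untouched because the real vg chunk arguments
// are not shown in this thread.
function buildGraphPipeline(chunkArgs, options = {}) {
  const stages = [{ command: "vg", args: ["chunk", ...chunkArgs] }];
  if (options.simplify) {
    // "-" asks vg simplify to read the graph from standard input.
    stages.push({ command: "vg", args: ["simplify", "-"] });
  }
  // Assumed: "vg view -j -" reads a graph on stdin and writes JSON.
  stages.push({ command: "vg", args: ["view", "-j", "-"] });
  return stages;
}

// Render the stages as a shell-style pipeline string, e.g. for logging.
function describePipeline(stages) {
  return stages.map((s) => [s.command, ...s.args].join(" ")).join(" | ");
}
```

Keeping the stages as data makes the optional pass a one-line change and leaves the actual process spawning and piping to whatever the server already does.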

adamnovak commented 11 months ago

vg simplify's help documentation:

[anovak@swords ~]% vg simplify --help
usage: vg simplify [options] old.vg >new.vg
general options:
    -a, --algorithm NAME   simplify using the given algorithm (small, rare; default: small)
    -t, --threads N        use N threads to construct graph (defaults to numCPUs)
    -p, --progress         show progress
    -b, --bed-in           read in the given BED file in the cordinates of the original paths
    -B, --bed-out          output transformed features in the coordinates of the new paths
small snarl simplifier options:
    -m, --min-size N       remove leaf sites with fewer than N bases involved (default: 10)
    -i, --max-iterations N perform up to N iterations of simplification (default: 10)
rare variant simplifier options:
    -v, --vcf FILE         use the given VCF file to determine variant frequency (required)
    -f, --min-freq FLOAT   remove variants with total alt frequency under FLOAT (default: 0)
    -c, --min-count N      remove variants with total alt occurrence count under N (default: 0)

adamnovak commented 11 months ago

The input filename can be - to read from standard input.
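A minimal sketch of building the argument list for that call, using only options that appear in the help text above (`-a`, `-m`, `-i`) plus `-` for standard input. The function name `simplifyArgs` and its option names are hypothetical, and leaving an option out simply defers to vg's documented default.

```javascript
// Hypothetical helper: builds argv for "vg simplify" from the options listed
// in the help output above. Omitted options fall back to vg's own defaults.
function simplifyArgs({ minSize, maxIterations } = {}) {
  const args = ["simplify", "-a", "small"];
  if (minSize !== undefined) {
    if (!Number.isInteger(minSize) || minSize < 0) {
      throw new RangeError("minSize must be a non-negative integer");
    }
    args.push("-m", String(minSize));
  }
  if (maxIterations !== undefined) {
    if (!Number.isInteger(maxIterations) || maxIterations < 1) {
      throw new RangeError("maxIterations must be a positive integer");
    }
    args.push("-i", String(maxIterations));
  }
  // "-" as the input filename means: read the graph from standard input.
  args.push("-");
  return args;
}
```

Validating the numbers before they reach the command line avoids passing user-supplied strings straight through to vg.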

adamnovak commented 10 months ago

To send the new flag to the server, we have to add it to the result of getNextViewTarget(): https://github.com/vgteam/sequenceTubeMap/blob/ede0f70ee89249ffa26f64c949deb62af8ec348f/src/components/HeaderForm.js#L462-L468
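Roughly, that could mean copying the existing view target and attaching one extra boolean. This is only a sketch: the real fields returned by getNextViewTarget() are not shown in this thread, so the `simplify` field name and the `withSimplify` helper are hypothetical.

```javascript
// Hypothetical: adds a "simplify" flag to whatever object getNextViewTarget()
// already returns, without mutating the original.
function withSimplify(viewTarget, simplify) {
  return { ...viewTarget, simplify: Boolean(simplify) };
}
```

Coercing to a boolean keeps `undefined` state values from reaching the server as a missing field.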

adamnovak commented 10 months ago

We'd have to fill it in either from a new state field in HeaderForm, or a new state field above HeaderForm (that could then get bound down into the customization accordion), or from a field inside the tracks state that already exists in HeaderForm (so we could control it from the graph tracks settings).

Or we could just put it in the graph track and not have a separate flag for it, and we could make the server look inside the graph track to see if it should be simplified or not.
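For the second option, the server-side check could be as small as the sketch below. It assumes tracks arrive as an object of entries with a `trackType` field and that the graph track carries a `simplify` boolean; both the shape and the `graphTrackWantsSimplify` name are assumptions for illustration, not the project's confirmed data model.

```javascript
// Hypothetical track shape: { trackType: "graph" | "read" | ..., simplify?: boolean }
// Returns true only if a graph track explicitly asks for simplification.
function graphTrackWantsSimplify(tracks) {
  return Object.values(tracks || {}).some(
    (track) => track.trackType === "graph" && track.simplify === true
  );
}
```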

adamnovak commented 10 months ago

Checkbox button examples for controlling simplify: https://reactstrap.github.io/?path=/docs/components-buttongroup--checkbox-and-radio

adamnovak commented 10 months ago

Until we can also adjust the reads when simplifying (via a new vg simplify feature?) we need to make sure the user can't use simplification and read tracks at the same time.

We also shouldn't let the user use it with a BED region selected if that region uses a premade chunk, unless we add a vg simplify pass when loading the premade chunk.
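A small sketch of the read-track restriction as a check that both the UI and the server could share. The request shape (a `simplify` flag plus a `tracks` object whose entries have a `trackType`) and the `checkSimplifyAllowed` name are hypothetical.

```javascript
// Hypothetical guard: rejects a request that asks for simplification while
// any read track is selected, since the reads would not match the new graph.
function checkSimplifyAllowed(viewTarget) {
  if (!viewTarget.simplify) {
    return { ok: true };
  }
  const hasReads = Object.values(viewTarget.tracks || {}).some(
    (track) => track.trackType === "read"
  );
  if (hasReads) {
    return { ok: false, reason: "Simplification cannot be combined with read tracks." };
  }
  return { ok: true };
}
```

Returning a reason string rather than throwing lets the UI disable the control and explain why, while the server can turn the same result into an error response.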

adamnovak commented 10 months ago

Rather than forbidding simplification with BED regions, we can just add the simplify pass in front of the vg view pass when reading the chunk's graph, as long as no read tracks are in use.
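For the premade chunk case, the decision might look like the sketch below: run the chunk's graph file through vg simplify first and feed the result to vg view on standard input, but only when no read tracks are in use. The `premadeChunkStages` name, the stage format, and the assumption that `vg view -j` accepts either a filename or `-` are all illustrative, not taken from the server code.

```javascript
// Hypothetical: plans the commands for turning a premade chunk's graph file
// into JSON, adding a simplify pass in front of vg view when it is safe.
function premadeChunkStages(graphFile, { simplify = false, hasReadTracks = false } = {}) {
  const stages = [];
  let viewInput = graphFile;
  if (simplify && !hasReadTracks) {
    stages.push(["vg", "simplify", graphFile]);
    // vg view then reads the simplified graph from standard input.
    viewInput = "-";
  }
  stages.push(["vg", "view", "-j", viewInput]);
  return stages;
}
```

When read tracks are present the plan silently falls back to the unsimplified graph, so a caller that wants to warn the user would need to check that case separately.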