vgteam / sequenceTubeMap

displays multiple genomic sequences in the form of a tube map
MIT License

Left-right button and region description integration #392

Closed adamnovak closed 8 months ago

adamnovak commented 8 months ago

This hooks the left-right buttons up to the region description, so that the description updates when you go left or right.

It also includes chunk generation script changes to let you set palettes from the command line.

To do the demo today with the Lancet data I used this bit of Bash:

DATA_DIR=/Users/anovak/Downloads/STM_DataShare_Nov07_2023/giraffe_alignments
rm -f exampleData/lancet_regions.bed
# Region names look like CONTIG_START_END; recover them from the normal-sample GAM file names.
for REGION_NAME in $( (cd "${DATA_DIR}" && ls normal.*.sorted.gam) | sed 's/normal\.//g' | sed 's/\.sorted\.gam//g') ; do
    CONTIG="$(echo "${REGION_NAME}" | cut -f1 -d'_')"
    COORD_START="$(echo "${REGION_NAME}" | cut -f2 -d'_')"
    COORD_END="$(echo "${REGION_NAME}" | cut -f3 -d'_')"
    # Run from exampleData so the chunk paths in the BED file are relative to the BED file.
    (cd exampleData && rm -Rf "lancet_regions-${REGION_NAME}" && ../scripts/prepare_local_chunk.sh \
        -x "${DATA_DIR}/${REGION_NAME}.giraffe.gbz" -r "${CONTIG}:${COORD_START}-${COORD_END}" -o "lancet_regions-${REGION_NAME}" \
        -g "${DATA_DIR}/tumor.${REGION_NAME}.sorted.gam" -g "${DATA_DIR}/normal.${REGION_NAME}.sorted.gam" \
        -p '{"mainPalette": "reds", "auxPalette": "reds"}' -p '{"mainPalette": "blues", "auxPalette": "blues"}' \
        >> lancet_regions.bed)
done
# Sort by start coordinate, then stably by the contig name after its "chr" prefix.
sort exampleData/lancet_regions.bed -k2n -s | sort -k1.4n,2 -s >exampleData/lancet_regions.sorted.bed
rm exampleData/lancet_regions.bed

This goes through the Lancet regions, parses each region name into its coordinate components, and makes each region into a pre-extracted chunk, appending the resulting BED line to a BED file. I put the tumor data in red and the normal data in blue. I have to cd into the directory that holds the chunk directories and the BED file when doing this, so that the relative paths in the BED file point to the chunk directories from where the BED file actually is.
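
To make the parsing step concrete, here is a minimal sketch that turns one region name into the CONTIG:START-END string passed to -r. The helper name `region_from_name` and the region name `chr1_1000_2000` are made up for illustration, and the sketch assumes contig names contain no underscores of their own.

```shell
# region_from_name is a hypothetical helper, not part of sequenceTubeMap.
# It assumes names of the form CONTIG_START_END with no underscore in CONTIG.
region_from_name() {
    contig="$(echo "$1" | cut -f1 -d'_')"
    coord_start="$(echo "$1" | cut -f2 -d'_')"
    coord_end="$(echo "$1" | cut -f3 -d'_')"
    # Reassemble the pieces into the CONTIG:START-END form.
    echo "${contig}:${coord_start}-${coord_end}"
}

# Made-up region name, used only to show the transformation.
region_from_name "chr1_1000_2000"
```

A contig name that itself contains underscores would be split in the wrong places, so this only works for names shaped like the ones in this data set.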

Once I have the full BED file, I sort it first by the region start coordinate as a number, and then stably by the part of the contig name after the first 3 "chr" characters, to get the BED in coordinate order.
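
The two-pass sort can be tried on a few lines in isolation. This is a small sketch using the same sort keys as above; the four BED lines and the `sort_bed` wrapper are made up for the example and do not come from the real data.

```shell
# sort_bed is a hypothetical wrapper around the two sort passes used above.
sort_bed() {
    # Pass 1: numeric sort on the start coordinate (field 2).
    # Pass 2: stable numeric sort on field 1 starting at its 4th character,
    # which is the part of the contig name after "chr".
    sort -k2n -s | sort -k1.4n,2 -s
}

# Made-up BED lines: contig, start, end, label.
printf 'chr2\t500\t600\tB\nchr1\t3000\t4000\tA2\nchr10\t100\t200\tC\nchr1\t1000\t2000\tA1\n' | sort_bed
```

The two chr1 lines come out first in start order, then chr2, then chr10. The second pass has to be stable (-s) so that it keeps the start-coordinate order from the first pass within each contig.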

A remaining problem with this approach is that the chunk JSON files refer to the files in my Downloads directory, so if you open the track settings when looking at a region, it asks the server about those files and then pops up an error saying they aren't allowed to be accessed.
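
One way to see which chunk files are affected is to search the chunk directories for the absolute path. This is only a diagnostic sketch, not a fix: `list_files_referencing` is a hypothetical helper, and it makes no assumption about the layout of the chunk files beyond them containing the path as plain text.

```shell
# list_files_referencing is a hypothetical helper, not part of sequenceTubeMap.
# Usage: list_files_referencing PREFIX DIR...
# Prints the files under each DIR that mention PREFIX.
list_files_referencing() {
    prefix="$1"
    shift
    # -r: recurse into directories, -l: print only matching file names,
    # -F: treat the prefix as a literal string rather than a regex.
    grep -rlF -- "${prefix}" "$@"
}
```

For the run above, it could be called as `list_files_referencing /Users/anovak/Downloads exampleData/lancet_regions-*`.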