jtlovell / GENESPACE

Other
180 stars 24 forks source link

Gaps not recognized when building contig map #114

Closed hanmueller96 closed 1 year ago

hanmueller96 commented 1 year ago

When building contig maps, it seems as though GENESPACE is not finding gaps. I have confirmed that there are gaps for each contig in the .fasta file, but this does not translate when building the contig map.

This is the output of print(lapply(genomeGrs, head))

$gaps
GRanges object with 0 ranges and 0 metadata columns:
   seqnames    ranges strand
      <Rle> <IRanges>  <Rle>
  -------
  seqinfo: 26 sequences from an unspecified genome

$contigs
GRanges object with 6 ranges and 0 metadata columns:
        seqnames     ranges strand
           <Rle>  <IRanges>  <Rle>
  [1] scaffold_1 1-27316886      *
  [2] scaffold_2 1-25469394      *
  [3] scaffold_3 1-24508975      *
  [4] scaffold_4 1-24030051      *
  [5] scaffold_5 1-22507561      *
  [6] scaffold_6 1-19840655      *
  -------
  seqinfo: 26 sequences from an unspecified genome

$telomeres
GRanges object with 6 ranges and 1 metadata column:
        seqnames            ranges strand |    position
           <Rle>         <IRanges>  <Rle> | <character>
  [1] scaffold_1           4-14657      * |        left
  [2] scaffold_1       23850-24179      * |        left
  [3] scaffold_1       24687-25167      * |        left
  [4] scaffold_2            7-2262      * |        left
  [5] scaffold_2         2788-3025      * |        left
  [6] scaffold_2 25453885-25455888      * |       right
  -------
  seqinfo: 26 sequences from an unspecified genome

Here is the script I'm using:

library(GENESPACE)
suppressPackageStartupMessages(library(data.table))
suppressPackageStartupMessages(library(Biostrings))
suppressPackageStartupMessages(library(ggplot2))

faFile <- ("yahs.out_scaffolds_final.fa")
dnaSS <- Biostrings::readDNAStringSet(faFile)

ns <- names(dnaSS)
#ns <- ns[!grepl("unlocalized|RANDOM", ns) & # drop chrs with these terms
           #grepl("GRCh38.p14 Primary Assembly", ns)] # keep chrs with these
dnaSS <- dnaSS[ns]

# -- parse the chromosome names so they are better for plotting ...
#names(dnaSS) <- sapply(names(dnaSS), function(x) strsplit(x, " ")[[1]][[5]])
#names(dnaSS) <- gsub(",", "", names(dnaSS), fixed = T)

plantTelomereKmers <- "TTTAGGG"
genomeGrs <- find_contigsGapsTelos(
  dnass = dnaSS,
  teloKmers = plantTelomereKmers,
  maxDistBtwTelo = 500,
  minTeloSize = 100,
  minTeloDens = .5,
  maxDist2end = 50e3,
  minChrSize = 1e6, 
  minContigGapSize = 1000)

print(lapply(genomeGrs, head))

plot_contigs(
  cgt = genomeGrs,
  nColors = 20,
  palette = viridis::viridis)

Please let me know if there are any additional files or snippets I can provide that might be helpful! Thank you!

hanmueller96 commented 1 year ago

Oops! I changed the minContigGapSize and that resolved my issue! Thank you