stianlagstad / chimeraviz

chimeraviz is an R package that automates the creation of chimeric RNA visualizations.
37 stars 14 forks source link

Warning message: The following named parsers don't match the column names: J_FFPM, S_FFPM #15

Closed zhixue closed 6 years ago

zhixue commented 6 years ago

Hello Stian, I got the output from STAR-Fusion(Release v1.2.0) and installed the chimeraviz for bioconductor. After preparation, I run the following command:

fusions <- importStarfusion("Sample1_star_fusion_outdir/star-fusion.fusion_predictions.abridged.tsv", "hg19")

Reading filename caused a warning: Error in vector("list", dim(report)[1]) : 'length' is not corrent Warning message: The following named parsers don't match the column names: J_FFPM, S_FFPM

So I compared the columns of examplefile in chimeraviz and output of the starfusion:

In STAR-Fusion , I saw FFPM in example format while J_FFPM, S_FFPM were mentioned below:

Since the number of fusion-supporting reads depends on both the expression of the fusion transcript and the number of reads sequenced, we provide normalized measures of the split reads and spanning fragments as FFPM (fusion fragments per million total reads) measures: J_FFPM for the junction/split reads and S_FFPM for the spanning fragments. If you sum them (a column not yet included but will be soon), you can filter based on this value to remove many lowly expressed and likely artifact fusions. A filter of 0.1 sum FFPM tends to be very effective.

I am confused about this now, could you please give some suggestions? Thank you:)

stianlagstad commented 6 years ago

I suspect chimeraviz isn't updated with the latest output format of STAR-Fusion. I'll update it early next week (off to vacation right now:). Thank you for reporting!

zhixue commented 6 years ago

Thanks for your answer! Wish you a pleasant vacation~

stianlagstad commented 6 years ago

Which version of STAR-Fusion are you using? And could you paste the command you're running STAR-Fusion with?

Note to self: I might have to support two output formats for STAR-Fusion, ref https://github.com/STAR-Fusion/STAR-Fusion/issues/19

zhixue commented 6 years ago

I am a novice.The STAR-Fusion had been removed from the server(I borrowed) and I checked up the version in logs but not found. my command is(single-end fastq data, I using --left_fq as tutorial says, without --chimeric_junction)

STAR-Fusion --genome_lib_dir ~/data/GRCh37_gencode_v19_CTAT_lib_Nov012017/ctat_genome_lib_build_dir --left_fq sample.fastq --output_dir star_fusion_outdir

I have seen the link of ref STAR-Fusion/STAR-Fusion#19 which is similar to what I faced, but STAR-Fusion I used should be the most recent version because it was installed by command from github in this month(2017 Dec 6). I also check the . FFPM file,which is as following

FusionName | JunctionReadCount | SpanningFragCount | SpliceType | LeftGene | LeftBreakpoint | RightGene | RightBreakpoint | JunctionReads | SpanningFrags | LargeAnchorSupport | LeftBreakDinuc | LeftBreakEntropy | RightBreakDinuc | RightBreakEntropy | FFPM

Can I just copy the value from FFPM to J_/S_FFPM by myself and do the next stage?

zhixue commented 6 years ago

Oh,sorry. I used command to download the STAR-Fusion from github in my group’s server but failled to apply memory when it was running(the machine with 31GB). But I installed the STAR-Fusion in the server I borrowed with the conda command from a mirror in our country:

conda install STAR-Fusion

I couldn't find the installed path of STAR-Fusion yesterday and the lib data had been deleted, I check the version again today:

version report version (1.1.0)

I am sorry that I have forgot some details about the problem and it is a mistake by myself. I will try to get the newest version of STAR-Fusion and run samples again. Thank you~

stianlagstad commented 6 years ago

Hey, not worries at all. Let me know which version you're on when you have the chance to run the samples again. And let me know what the output looks like then.

zhixue commented 6 years ago

I have run it again. In my last line of nohup.out:

  • STAR-Fusion complete. See output: star-fusion.fusion_candidates.final (or .abridged version)

I only found "star-fusion.fusion_predictions(.abridged).tsv" which format is same as STAT-Fusion/wiki

I check the version:

~/tools/STAR-Fusion-v1.2.0/STAR-Fusion --version

STAR-Fusion version: 1.2.0

nohup STAR-Fusion --genome_lib_dir my_lib_download_dirtionary \ --left_fq sample1.fq \ --output_dir star_fusion_sample1_outdir &

head -1 star_fusion_sample1_outdir/star-fusion.fusion_predictions.tsv

FusionName JunctionReadCount SpanningFragCount SpliceType LeftGene LeftBreakpoint RightGene RightBreakpoint JunctionReads SpanningFrags LargeAnchorSupport LeftBreakDinuc LeftBreakEntropy RightBreakDinuc RightBreakEntropy FFPM


Abridged file is the same as it.

lkalbert001 commented 6 years ago

Today I also meet the same problem like @zhixue said. I get the screenshot,and what I used is the docker image "trinityctat/ctatfusion". What can I do to fix the problem. Please.

screenshot: this

stianlagstad commented 6 years ago

Thank you both for helping me make chimeraviz better! :) Which version of STAR-Fusion are you using, @lkalbert001?

lkalbert001 commented 6 years ago

Thanks for your reply! And what I used is below:

STAR-Fusion : 1.2.0 STAR :2.5.3a samtools: 1.3.1 Trinity: v2.5.1 gmap: 2017-10-30

stianlagstad commented 6 years ago

Thank you. Unfortunately I don't have much time this week, but I will fix this as soon as I'm able. Until then you'll be able to make it work by just removing one of the columns (J_FFPM or S_FFPM) and renaming the remaining one to FFPM. Or you could use this importer instead:

#' Import results from a STAR-Fusion run into a list of Fusion objects.
#'
#' A function that imports the results from a STAR-Fusion run, typically from
#' a star-fusion.fusion_candidates.final.abridged file, into a list of Fusion
#' objects.
#'
#' @param filename Filename for the STAR-Fusion
#' star-fusion.fusion_candidates.final.abridged results file.
#' @param genomeVersion Which genome was used in mapping (hg19, hg38, etc.).
#' @param limit A limit on how many lines to read.
#'
#' @return A list of Fusion objects.
#'
#' @examples
#' starfusionData <- system.file(
#'   "extdata",
#'   "star-fusion.fusion_candidates.final.abridged.txt",
#'   package = "chimeraviz")
#' fusions <- importStarfusion(starfusionData, "hg19", 3)
#' # This should import a list of 3 fusions described in Fusion objects.
#'
#' @export
importStarfusion <- function (filename, genomeVersion, limit) {

  # Is the genome version valid?
  validGenomes <- c("hg19", "hg38", "mm10")
  if (is.na(match(tolower(genomeVersion), tolower(validGenomes)))) {
    stop("Invalid genome version given")
  }

  # If the limit is set, is the value valid?
  if (missing(limit) == FALSE) {
    if (is.numeric(limit) == FALSE || limit <= 0) {
      stop("limit must be a numeric value bigger than 0")
    }
  }

  # Try to read the fusion report
  report <- withCallingHandlers(
    {
      col_types_starfusion = readr::cols_only(
        "#FusionName" = col_skip(),
        "JunctionReadCount" = col_integer(),
        "SpanningFragCount" = col_integer(),
        "SpliceType" = col_skip(),
        "LeftGene" = col_character(),
        "LeftBreakpoint" = col_character(),
        "RightGene" = col_character(),
        "RightBreakpoint" = col_character(),
        "LargeAnchorSupport" = col_character(),
        "LeftBreakDinuc" = col_character(),
        "LeftBreakEntropy" = col_number(),
        "RightBreakDinuc" = col_character(),
        "RightBreakEntropy" = col_number(),
        "FFPM" = col_number()
      )
      if (missing(limit)) {
        # Read all lines
        readr::read_tsv(
          file = filename,
          col_types = col_types_starfusion
        )
      } else {
        # Only read up to the limit
        readr::read_tsv(
          file = filename,
          col_types = col_types_starfusion,
          n_max = limit
        )
      }
    },
    error = function(cond) {
      message(paste0("Reading ", filename, " caused an error: ", cond[[1]]))
      stop(cond)
    },
    warning = function(cond) {
      message(paste0("Reading ", filename, " caused a warning: ", cond[[1]]))
      warning(cond)
    }
  )

  # Set variables
  id                    <- NA
  inframe               <- NA
  fusionTool            <- "starfusion"
  spanningReadsCount    <- NA
  splitReadsCount       <- NA
  junctionSequence      <- NA
  fusionReadsAlignment  <- NA

  # List to hold all Fusion objects
  fusionList <- vector("list", dim(report)[1])

  # Iterate through each line in the .tsv file
  for (i in 1:dim(report)[1]) {

    # Import starfusion-specific fields
    fusionToolSpecificData <- list()
    fusionToolSpecificData[["LargeAnchorSupport"]] = report[[i, "LargeAnchorSupport"]]
    fusionToolSpecificData[["LeftBreakDinuc"]] = report[[i, "LeftBreakDinuc"]]
    fusionToolSpecificData[["LeftBreakEntropy"]] = report[[i, "LeftBreakEntropy"]]
    fusionToolSpecificData[["RightBreakDinuc"]] = report[[i, "RightBreakDinuc"]]
    fusionToolSpecificData[["RightBreakEntropy"]] = report[[i, "RightBreakEntropy"]]
    fusionToolSpecificData[["FFPM"]] = report[[i, "FFPM"]]

    # id for this fusion
    id <- as.character(i)

    leftBreakPoint <- unlist(strsplit(report[[i, "LeftBreakpoint"]], split=":"))
    rightBreakPoint <- unlist(strsplit(report[[i, "RightBreakpoint"]], split=":"))

    # Chromosome names
    chromosomeA <- leftBreakPoint[1]
    chromosomeB <- rightBreakPoint[1]

    # Breakpoints
    breakpointA <- as.numeric(leftBreakPoint[2])
    breakpointB <- as.numeric(rightBreakPoint[2])

    # Strand
    strandA <- leftBreakPoint[3]
    strandB <- rightBreakPoint[3]

    # Number of supporting reads
    splitReadsCount <- report[[i, "JunctionReadCount"]]
    spanningReadsCount <- report[[i, "SpanningFragCount"]]

    # STAR-Fusion doesn't provide the fusion sequence
    junctionSequenceA <- Biostrings::DNAString()
    junctionSequenceB <- Biostrings::DNAString()

    # Gene names
    geneNames1 <- unlist(strsplit(report[[i, "LeftGene"]], split="\\^"))
    geneNames2 <- unlist(strsplit(report[[i, "RightGene"]], split="\\^"))
    nameA <- geneNames1[1]
    nameB <- geneNames2[1]

    # Ensembl ids
    # ensemblIdA <- geneNames1[2]
    # ensemblIdB <- geneNames2[2]

    # Until we have example data we are sure about, don't set ensemblIds:
    ensemblIdA <- NA_character_
    ensemblIdB <- NA_character_

    # PartnerGene objects
    geneA <- new(Class = "PartnerGene",
                 name = nameA,
                 ensemblId = ensemblIdA,
                 chromosome = chromosomeA,
                 breakpoint = breakpointA,
                 strand = strandA,
                 junctionSequence = junctionSequenceA,
                 transcripts = GenomicRanges::GRangesList())
    geneB <- new(Class = "PartnerGene",
                 name = nameB,
                 ensemblId = ensemblIdB,
                 chromosome = chromosomeB,
                 breakpoint = breakpointB,
                 strand = strandB,
                 junctionSequence = junctionSequenceB,
                 transcripts = GenomicRanges::GRangesList())

    fusionList[[i]] <- new(Class = "Fusion",
                           id = id,
                           fusionTool = fusionTool,
                           genomeVersion = genomeVersion,
                           spanningReadsCount = spanningReadsCount,
                           splitReadsCount = splitReadsCount,
                           fusionReadsAlignment = Gviz::AlignmentsTrack(),
                           geneA = geneA,
                           geneB = geneB,
                           inframe = inframe,
                           fusionToolSpecificData = fusionToolSpecificData)
  }

  # Return the list of Fusion objects
  fusionList
}

Could you also run sessionInfo() and paste the result here, @lkalbert001?

lkalbert001 commented 6 years ago

Thank you for helping me. I will try the later method. And I hope it work. Thank you again.

What sessionInfo() shows looks like this:

And there are other contents like this.

stianlagstad commented 6 years ago

Hey @lkalbert001, did that importer work for you?

I'm still not sure what the output format from STAR-Fusion should be. If chimeraviz should support both FFPM, and the case where you get both J_FFPM and S_FFPM. Ref https://github.com/STAR-Fusion/STAR-Fusion/issues/19#issuecomment-355812681

lkalbert001 commented 6 years ago

Hi stianlagstad ,I just copy your code into my R script ,but it‘s Error.

chimeraviz_problem

The result file produced by the latest STAR-Fusion(1.2.0)has FFPM column, but the 1.0.0 version has S-FFPM and J-FFPM .So I think chimeraviz should support both format ,because we don't which version STSR-Fusion do bioinformatician prefer to use.

Look forward to your new chimeraviz .

Best wishes for you !

stianlagstad commented 6 years ago

I'm going to update chimeraviz so that it supports the most recent version of STAR-Fusion. For now, if you need to work with an older version of its output, you will have to write your own import-function. I will be glad to assist if anyone needs help with that.