wenweixiong / MARVEL


Problem with function AnnotateSJ.10x #30

Closed xujialiu closed 1 year ago

xujialiu commented 1 year ago

There was a mistake in the function AnnotateSJ.10x.

logical <- grepl("chr", gtf$V1[1], fixed = TRUE)

However,

if (logical == FALSE) {
    gtf.small$chr.pos <- paste("chr", gtf$V1, ":", gtf$V4, 
      sep = "")
  } else if (logical == TRUE) {
    gtf.small$chr.pos <- paste(gtf$V1, ":", gtf$V4, sep = "")
  }

The correct code should be:

if (logical == TRUE) {
    gtf.small$chr.pos <- paste("chr", gtf$V1, ":", gtf$V4, 
      sep = "")
  } else if (logical == FALSE) {
    gtf.small$chr.pos <- paste(gtf$V1, ":", gtf$V4, sep = "")
  }

wenweixiong commented 1 year ago

logical <- grepl("chr", gtf$V1[1], fixed = TRUE) checks for the presence of the "chr" prefix in column 1 of the GTF file.

If the "chr" prefix is not present, i.e., logical == FALSE, then the function adds the "chr" prefix prior to combining the chromosome number with the exon coordinate, i.e., paste("chr", gtf$V1, ":", gtf$V4, sep = "")

But if the "chr" prefix is present, i.e., logical == TRUE, then the function does not add the "chr" prefix prior to combining the chromosome number with the exon coordinate, i.e., paste(gtf$V1, ":", gtf$V4, sep = "")
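
The branch described above can be checked on toy data. This is a minimal sketch, not MARVEL's actual implementation: add_chr_pos is a hypothetical helper written for this illustration, and the two small data frames are made-up stand-ins for a GTF file read into R (V1 = chromosome, V4 = start position).

```r
# Hypothetical helper mirroring the branch discussed above (not part of MARVEL).
add_chr_pos <- function(gtf) {
  # Inspect only the first chromosome name, as the original check does.
  has_chr <- grepl("chr", gtf$V1[1], fixed = TRUE)
  if (!has_chr) {
    # Prefix absent: add "chr" before joining chromosome and position.
    paste("chr", gtf$V1, ":", gtf$V4, sep = "")
  } else {
    # Prefix already present: join as-is.
    paste(gtf$V1, ":", gtf$V4, sep = "")
  }
}

# Made-up toy inputs: the same records with and without the prefix.
gtf_no_prefix <- data.frame(V1 = c("1", "X"), V4 = c(11869L, 2500L),
                            stringsAsFactors = FALSE)
gtf_with_prefix <- data.frame(V1 = c("chr1", "chrX"), V4 = c(11869L, 2500L),
                              stringsAsFactors = FALSE)

pos_a <- add_chr_pos(gtf_no_prefix)
pos_b <- add_chr_pos(gtf_with_prefix)

print(pos_a)
print(identical(pos_a, pos_b))
```

Under these assumptions, both inputs yield prefixed keys, so the annotation side always ends up in "chr" form whichever style the GTF uses.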

xujialiu commented 1 year ago

I created a small Marvel object for testing purposes. https://cfb4bf6422.znas.cn/AppH5/share/?nid=LIYDIMRQGEYTCRKXHBMFQ&code=G3w2Psahf47reog9lWQa2NPbw8qs8OSr84cz718m1fkask9AyrD67hJnSF9Avm1Csa&mode=file&display=list

You can test the function AnnotateSJ.10x using this small object. With the original function, both gene_short_name.start and gene_short_name.end in sj.metadata were filled entirely with NA, and sj.type contained only "start_unknown.gene|end_unknown.gene". However, if you change the function to:

if (logical == TRUE) {
    gtf.small$chr.pos <- paste("chr", gtf$V1, ":", gtf$V4, 
      sep = "")
  } else if (logical == FALSE) {
    gtf.small$chr.pos <- paste(gtf$V1, ":", gtf$V4, sep = "")
  }

the result should be fine.

wenweixiong commented 1 year ago

This is because the splice junction coordinates in your input file did not contain the "chr" prefix.

[Screenshot 1]

Could you please add the "chr" prefix to the splice junction coordinates (see the feature metadata part of the Splice junction counts section of the tutorial: https://wenweixiong.github.io/MARVEL_Droplet.html).

[Screenshot 2]

Once this is fixed, you will find that the function works fine.
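
One way to add the prefix is sketched below. It assumes the splice junction feature metadata is a data frame with a coordinate column named coord.intron holding strings of the form chromosome:start:end; the column name, the format, and the toy values are assumptions for illustration, so check them against your own object. If the row names of the count matrix carry the same coordinates, they would need the same treatment.

```r
# Made-up stand-in for the splice junction feature metadata
# (column name coord.intron is an assumption).
sj.metadata <- data.frame(
  coord.intron = c("1:14830:14969", "chrX:100:200", "2:300:400"),
  stringsAsFactors = FALSE
)

# Only touch coordinates that do not already start with "chr",
# so running this twice does not produce "chrchr1:...".
needs_prefix <- !startsWith(sj.metadata$coord.intron, "chr")
sj.metadata$coord.intron[needs_prefix] <-
  paste0("chr", sj.metadata$coord.intron[needs_prefix])

print(sj.metadata$coord.intron)
```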

[Screenshot 3]

xujialiu commented 1 year ago

After adding the "chr" prefix, the result did seem fine. However, I am confused about the purpose of logical <- grepl("chr", gtf$V1[1], fixed = TRUE) and the following code:

if (logical == FALSE) {
    gtf.small$chr.pos <- paste("chr", gtf$V1, ":", gtf$V4, 
      sep = "")
  } else if (logical == TRUE) {
    gtf.small$chr.pos <- paste(gtf$V1, ":", gtf$V4, sep = "")
  }

Isn't it there so that the AnnotateSJ.10x function works whether or not the GTF V1 column has the "chr" prefix? It seems clear that the code does not achieve this.

wenweixiong commented 1 year ago

Yes, the purpose of the code snippet is to work on GTF files regardless of whether the first column contains the "chr" prefix. But it does require the "chr" prefix to be present in the splice junction metadata/matrix (Splice junction counts section of the tutorial: https://wenweixiong.github.io/MARVEL_Droplet.html). Unfortunately, the "chr" prefix wasn't present in your splice junction metadata/matrix; once the prefix was added, the function worked fine (see screenshots and details above).
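
The effect of the mismatch can be illustrated with a plain string lookup. This is a sketch of the general mechanism only; how AnnotateSJ.10x performs the lookup internally is an assumption here, and the coordinates and gene name are made up.

```r
# Annotation keys as built from the GTF: always carry the "chr" prefix.
gtf_pos <- c("chr1:14829", "chr1:14970")
gene <- c("GENE_A", "GENE_A")  # made-up gene name

# The same junction start, without and with the prefix.
sj_start_no_prefix <- "1:14829"
sj_start_prefixed <- "chr1:14829"

# Exact string matching: the unprefixed key finds nothing and yields NA.
hit_no_prefix <- gene[match(sj_start_no_prefix, gtf_pos)]
hit_prefixed <- gene[match(sj_start_prefixed, gtf_pos)]

print(hit_no_prefix)
print(hit_prefixed)
```

An unprefixed junction coordinate never equals a prefixed annotation key, which is consistent with the all-NA gene_short_name.start and gene_short_name.end columns reported earlier in the thread.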