scottzijiezhang / MeRIPtools

8 stars 13 forks source link

adjustExprLevel assumes joint peak name attribute is a character #9

Open rhughwhite opened 4 years ago

rhughwhite commented 4 years ago

Dear authors,

I ran into an issue with the adjustExprLevel() function.

The line below assumes the $name attribute is a character. 586 gene.size.factor <- gene.size[jointPeak(object)$name,]` In my data this attribute is a factor variable, which caused the wrong genes to be accessed from gene.size. The presence of the factor rather than character variable may be due to how I created my 'MeRIP' object - using the MeRIP() function rather than countReads(), but it could be safer to have a check if the name attribute is the correct type.

Thanks!

scottzijiezhang commented 3 years ago

I tested changing the "jointPeak(object)$name" to factor instead of character and run the adjustExprLevel() function. It still works. I suspect the $name attribute, which is the gene name column of the BED12 format table, wasn't really the reason causing the issue as both factor and character should work in this line of code.

Maybe there are other reasons you run into the issue. More details about your data would be helpful. for example, what does your peak table look like jointPeak(object)?

By the way, regarding your speculation: " The presence of the factor rather than character variable may be due to how I created my 'MeRIP' object - using the MeRIP() function rather than countReads(), but it could be safer to have a check if the name attribute is the correct type." jointPeak(object)$name is trying to pull the name column of the peak table jointPeak(object) or object@jointPeaks. This table was created by function reportJointPeak(). So how you created the MeRIP object doesn't affect the data type of the jointPeaks$name attribute.

rhughwhite commented 3 years ago

Thanks for your help Scott.

adjustExprLevel() works without throwing an error whether jointPeak(object)$name is a factor or character, but the outputs are differ for me.

I think it's due to this line:

gene.size.factor <- gene.size[jointPeak(object)$name,]

If $name is a factor here it is cast to a numeric (at least the behaviour is as such). If the rows of the gene-level data in geneExpression(object) (gene.size above) are in the same order as the jointPeak(object)$name, then it doesn't make a difference if $name is a character or factor. However in my case the ordering of these two data.frames is not consistent, and so the incorrect genes are accessed from gene.size.

Here is a look at the output of head(jointPeak(object))

chr start end name score strand thickStart 1 chrX 100628488 100628587 ENSG00000000003.15 0 - 100628488 2 chr20 50935051 50935150 ENSG00000000419.12 0 - 50935051 3 chr20 50942088 50955201 ENSG00000000419.12 0 - 50942088 4 chr1 169850429 169850528 ENSG00000000457.14 0 - 169850429 5 chr1 169850629 169850827 ENSG00000000457.14 0 - 169850629 6 chr1 169852025 169852224 ENSG00000000457.14 0 - 169852025 thickEnd itemRgb blockCount blockSizes blockStarts 1 100628587 0 1 99 0 2 50935150 0 1 99 0 3 50955201 0 5 39,26,77,34,16 0,3649,3759,6541,13098 4 169850528 0 1 99 0 5 169850827 0 1 198 0 6 169852224 0 1 199 0

Rupert

scottzijiezhang commented 3 years ago

Hi Rupert, Thanks for the info. I now changed the code to force this attribute to character. I hope this can fix the issue.

Please let me know if the problem still persist.

Thanks Scott