omarwagih / ggseqlogo

Publication-quality sequence logos in R

Using different TSD lengths as facetting variable #11

mobilegenome closed 5 years ago

mobilegenome commented 6 years ago

Hi,

I have a dataset with TSDs of different lengths and want to create a sequence logo for each length bin. I think it would be very handy to use ggplot's facet functionality for this. However, the current implementation raises an error when the sequences differ in length:

Error in letterMatrix(seqs) : Sequences in alignment must have identical lengths

I am not sure how difficult this behaviour would be to implement, but it would help tremendously when exploring heterogeneous datasets, such as those from TE calling.

Best

Fritjof

This is my data:

head(df)
  TSD <chr>          TSD_length <int>
1 TAAAAATAAAGTCCT    15
2 AAAAGATTTGTGCAG    15
3 TGGGGGGACATTTTT    15
4 CCATTCTGATTTTTTT   16
5 ACAGGGAAAGGTTTTT   16
6 AAAAAGTGTGCTGGAGG  17

And my ggplot call:

p <- ggplot(df.pass.tsdlength.test)
p + geom_logo(data = df.pass.tsdlength.test$TSD, seq_type = "dna" ) +
  theme_logo() + 
  facet_wrap( ~ TSD_length)

malcook commented 6 years ago

I think the underlying issue is that ggseqlogo doesn't work as you might expect it to.

In fact, it does not even require passing data to ggplot.

Try this as a workaround:

ggseqlogo(with(df, split(TSD, sprintf('TSD_length=%s', TSD_length))))
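
To make the workaround easier to follow, here is a minimal sketch that rebuilds the six example rows from the question and shows what the `split()` call produces: a named list with one character vector per length bin, so each bin only contains sequences of identical length. The names `seqs_by_length` and `lengths_per_bin` are made up for illustration, and the plotting step assumes ggseqlogo is installed and that it draws a named list as one panel per element.

```r
# Example data mirroring the rows shown in the question
df <- data.frame(
  TSD = c("TAAAAATAAAGTCCT", "AAAAGATTTGTGCAG", "TGGGGGGACATTTTT",
          "CCATTCTGATTTTTTT", "ACAGGGAAAGGTTTTT", "AAAAAGTGTGCTGGAGG"),
  TSD_length = c(15L, 15L, 15L, 16L, 16L, 17L),
  stringsAsFactors = FALSE
)

# split() groups the sequences by a label built from their length,
# giving a named list with one character vector per length bin
seqs_by_length <- with(df, split(TSD, sprintf("TSD_length=%s", TSD_length)))

# Sanity check: collect the distinct sequence lengths found in each bin
lengths_per_bin <- lapply(seqs_by_length, function(s) unique(nchar(s)))

# Build the plot only if ggseqlogo is available (assumption: it is installed);
# call print(p) to draw it
if (requireNamespace("ggseqlogo", quietly = TRUE)) {
  p <- ggseqlogo::ggseqlogo(seqs_by_length, seq_type = "dna")
}
```

Because the list names become the panel labels, changing the `sprintf()` format string is enough to relabel the panels.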