immunomind / immunarch

🧬 Immunarch: an R Package for Fast and Painless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires
https://immunarch.com
Apache License 2.0

vis(pubRepStatistics()) issue: max 5 groups and incorrect data #76

Open andrewni4313 opened 4 years ago

andrewni4313 commented 4 years ago

🐛 Bug

I am unable to visualize intersections between more than 5 groups/samples using vis() on pubRepStatistics(). I have 13 groups but vis() only displays 5.

To Reproduce

Steps to reproduce the behavior:

  1. Do the following on data with more than 5 samples:
    pr <- pubRep(immdata$data, .verbose=FALSE)
    vis(pubRepStatistics(pr))
    print(pubRepStatistics(pr))

    and compare graph vs. print

image

Group <chr>                                                               Count <int>
CD4_Th1_B16 IL33 Tumor&CD4_Th17_B16 IL33 Tumor                            6
CD4_Th1_B16 IL33 Tumor&CD4_Th2_B16 IL33 Tumor                             8
CD4_Th1_B16 IL33 Tumor&CD4_Th2_B16 Tumor                                  2
CD4_Th1_B16 IL33 Tumor&CD4_THex_B16 IL33 Tumor                            2
CD4_Th1_B16 IL33 Tumor&CD4_THex_B16 IL33 Tumor&CD4_Th17_B16 IL33 Tumor    3
CD4_Th1_B16 Tumor&CD4_Th1_B16 Tumor&CD4_Th2_B16 Tumor                     2
CD4_Th1_B16 Tumor&CD4_Th1_B16 Tumor&CD4_Th2_B16 Tumor&CD4_THex_B16 Tumor  3
CD4_Th1_B16 Tumor&CD4_Th2_B16 Tumor                                       8
CD4_Th1_B16 Tumor&CD4_Th2_B16 Tumor&CD4_THex_B16 Tumor                    6
CD4_Th1_B16 Tumor&CD4_THex_B16 Tumor                                      10
CD4_Th1_B16 Tumor&CD4_THnaive_B16 Tumor                                   2
CD4_Th2_B16 Tumor&CD4_THex_B16 Tumor                                      4

As you can see, many more intersections are printed than displayed, and there are more than 5 unique groups. For example, CD4_THnaive_B16 Tumor is not shown in the plot even though, according to the printed results, it has an intersection with CD4_Th1_B16 Tumor.

Expected behavior

It should follow what upset normally does and expand to any number of groups. Example from upset: image

Additional context

On further review, I'm not sure if any of the displayed data is right at all. For example, the printed results show the intersection CD4_Th1_B16 Tumor&CD4_Th2_B16 Tumor as 8, but the graph shows 10.

Additionally, there are some weird results even in the printed output. The group CD4_Th1_B16 Tumor&CD4_Th1_B16 Tumor&CD4_Th2_B16 Tumor lists CD4_Th1_B16 Tumor twice and has a count of 2. If that row is accurate, it would account for the 10 in the graph (8 + 2).
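
One way to scan the printed output for labels like this is to split each group name on `&` and look for repeated sample names. Below is a minimal base-R sketch. It assumes the group labels are available as a plain character vector (here copied by hand from the printed results above), and `has_duplicate_sample` is a hypothetical helper, not an immunarch function.

```r
# Group labels copied by hand from the printed pubRepStatistics() output.
groups <- c(
  "CD4_Th1_B16 Tumor&CD4_Th1_B16 Tumor&CD4_Th2_B16 Tumor",
  "CD4_Th1_B16 Tumor&CD4_Th2_B16 Tumor",
  "CD4_Th1_B16 Tumor&CD4_THnaive_B16 Tumor"
)

# Hypothetical helper: TRUE if a sample name occurs more than once in a label.
has_duplicate_sample <- function(label) {
  samples <- strsplit(label, "&", fixed = TRUE)[[1]]
  anyDuplicated(samples) > 0
}

dup <- vapply(groups, has_duplicate_sample, logical(1), USE.NAMES = FALSE)

# Labels that name the same sample twice (only the first one in this example).
groups[dup]
```

This only flags suspicious labels; it does not say whether the duplication comes from the data or from how the statistics are computed.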

vadimnazarov commented 4 years ago

Hi @andrewni4313

Thank you so much for reporting this! When you experimented, did you apply UpSetR plots to the output of pubRep directly, without using the vis function? If not, could you please try that and see whether the output is correct? That would let us locate the error (either in UpSetR or in the immunarch visualisations) and then fix it or rewrite the UpSetR plot function.

andrewni4313 commented 4 years ago

Wow, stumbled upon the solution by some sheer dumb luck when trying to play around with the UpSetR plot function. I don't think UpSetR explicitly mentions it anywhere (at least I didn't see it) but it seems to automatically limit the number of sets to 5.

Adding nsets fixes the issue: upset(fromExpression(expression), nsets=length(expression)) image

So vis() does work but you have to pass nsets in: vis(pubRepStatistics(pr), nsets=20)
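
Instead of hard-coding a value such as 20, nsets could probably be derived from the data by counting the distinct sample names in the group labels. A minimal base-R sketch, assuming the intersections are held in a named count vector in the "A&B" format (the values are copied by hand from the printed results earlier in this thread, and `n_sets` is just an illustrative name):

```r
# Named vector of intersection counts; values copied by hand from the
# printed pubRepStatistics() output shown earlier in this thread.
expression <- c(
  "CD4_Th1_B16 Tumor&CD4_Th2_B16 Tumor"     = 8,
  "CD4_Th1_B16 Tumor&CD4_THex_B16 Tumor"    = 10,
  "CD4_Th1_B16 Tumor&CD4_THnaive_B16 Tumor" = 2,
  "CD4_Th2_B16 Tumor&CD4_THex_B16 Tumor"    = 4
)

# Split every label on "&" and count the distinct sample names.
n_sets <- length(unique(unlist(strsplit(names(expression), "&", fixed = TRUE))))
n_sets

# With UpSetR installed, the value could then be passed on (not run here):
# UpSetR::upset(UpSetR::fromExpression(expression), nsets = n_sets)
```

For these four rows the sketch finds four distinct samples. Counting samples rather than intersections gives the number of sets directly, though any value at least that large should also avoid the truncation.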

I would suggest making nsets default to # of intersections to avoid this issue. This works for me:

library(tibble)   # deframe()
library(UpSetR)   # fromExpression(), upset()

expression <- deframe(print(pubRepStatistics(pr)))   # named vector of intersections
upset(fromExpression(expression), nsets=length(expression))

vadimnazarov commented 4 years ago

Hi @andrewni4313

Thank you for the insight! Didn't see it at all personally... Our team will update the package, and I will let you know about the progress.