Closed majorkazer closed 4 years ago
Hello @majorkazer ,
can you email us a sample of your data so we can reproduce the bug please? support@immunomind.io
Can you try running the code
j_usage = geneUsage(BCRdata.coding$data, "hs.ighj", .ambig = "inc")
and see what changed?
Hi @vadimnazarov,
Thanks for your help. My dataset is huge, so I have attached a downsampled dataset here.
I tried running the code you suggested, but still get the same result as described above.
Hi @majorkazer ,
it seems that everything works OK on that version of immunarch:
Do you have the tcR
package installed? Can you remove both tcR
and immunarch
packages and install immunarch
(only!) again, and try that again?
ping @majorkazer
Apologies for the delay.
I am running this analysis in a jupyter notebook runtime environment; I'm not sure if this may be part of the problem. I created a new notebook runtime environment and re-installed immunarch
from github using:
devtools::install_url("https://github.com/immunomind/immunarch/raw/master/immunarch.tar.gz")
I still get the same result as I reported before (see attached image from notebook).
Hi @majorkazer
We run the exact code you just wrote us on your data, and got correct results for all three gene segments. Can you please run the following code and check the output:
table(BCRdata.coding$data$J.name)
table(BCRdata.coding$data$D.name)
Hi,
Running either table function returns this: < table of extent 0 >
.
However, if I run table(BCRdata.coding$data$clones$D.name)
, I get what we would expect:
I don't have a way to test this locally as of right now, everything is being run on the runtime environment... I think there may be something weird going on with the environment and/or jupyter notebooks. Nevertheless, for the time being I can just use the outputs from table
to generate my own frequency plots.
Thanks for your help with this! Happy to close this issue for the time being unless you want to explore further.
@majorkazer
after brainstorming with almost the whole team we managed to reproduce this tricky bug. Can you please try
j_usage = geneUsage(BCRdata.coding$data[[1]], "hs.ighj", .norm=T, .ambig="inc")
vis(j_usage, .plot = "hist")
Note the [[1]]
for the data
.
Alternatively, you can try
j_usage = geneUsage(BCRdata.coding$data[["clones"]], "hs.ighj", .norm=T, .ambig="inc")
vis(j_usage, .plot = "hist")
Success! I am able to generate the J and D frequency plots using your code.
So, out of curiosity, is this something weird with how MIXCR objects are brought in? Or a difference in the version of immunarch
that I'm using?
Thanks for your help!
Perfect!
No, this is an issue in our code. geneUsage
incorrectly processes the case when there is only one repertoire in the input list of repertoires. When you pass BCRdata.coding$data
you pass a list with only one repertoire (i.e., the problematic case). When you pass BCRdata.coding$data[["clones"]]
you pass this exact repertoire, and this is the case where everything works OK. Interestingly enough, when you pass a list with two or more repertoires, everything works OK, because we have a separate branch of code that processes the case with only one repertoire, and we definitely need to improve it.
@majorkazer , thank you for bringing this very tricky and non-obvious issue to us! We will fix it in our next release, and I will close the issue then.
We are conducting short interviews with specialists about pain points in AIRR analysis. We would like to learn more about the full pipeline from laboratory work to clinical insights. The interview is also a wonderful opportunity to discuss anything related to immune repertoire analysis and how to make it in immunarch and R. Should you find it appealing, here is a link to schedule our appointment: https://calendly.com/immunomind/interview
The issue was solved.
Sorry to open this back up but i'm having the same problem and the [[1]] does not fix it. I don't have tcR installed and am using immunarch directly in R. My data was loaded in from mixcr and it seems that everything else works great except this function.
specifying hs.trab. or hstrbj ect all returns the same object with all clones including IGH, IGK.
using the [[1]] selects clones but it seems to be doing it in a strange way. for example hs.trav will return any clone that has a V in the name IGHV3, TRAV11, IGKV2D. Same thing with hs.trab and hs.traj
🐛 Bug
I am trying to run geneUsage on a MiXCR dataset that I imported consisting only of IGH chains. I can run it to generate a histogram of v gene usage (using "hs.ighv") but when I try to look at other genes, it only can pull the ighv genes.
To Reproduce
The following will produce a histogram of v genes.
v_usage = geneUsage(BCRdata.coding$data, "hs.ighv", .norm = TRUE, .ambig = "mag")
vis(v_usage, .plot = "hist")
However, the following attempts to generate d and j gene histograms just return the same v gene histogram.
d_usage = geneUsage(BCRdata.coding$data, "hs.ighd", .norm = TRUE, .ambig = "mag")
vis(d_usage, .plot = "hist")
orj_usage = geneUsage(BCRdata.coding$data, "hs.ighj", .norm = TRUE, .ambig = "mag")
vis(j_usage, .plot = "hist")
Looking at d_usage or j_usage themselves, they are actually just the same as v_usage, a table of the v genes in my dataset. So it seems like geneUsage is having trouble pulling the correct genes from my dataset.
Additional context
sessionInfo() R version 3.6.1 (2019-07-05) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 18.04.3 LTS
other attached packages: [1] cowplot_1.0.0 openxlsx_4.1.3 immunarch_0.5.2 gridExtra_2.3
[5] data.table_1.12.6 dtplyr_1.0.0 dplyr_0.8.3 ggplot2_3.2.1