Open chenyx47 opened 4 years ago
Hi @chenyx47
Thank you! In the package we addressed the common case of repertoire files where all columns are available. Would you be willing to tell me more about the reasons to strip some columns from the output? It will greatly help us improve the package and/or provide recommendations on how to deal with this type of bug.
Thanks for your reply, but you might have misunderstood my question. When I call repLoad(/path/to/mixcrclonesoutputfile.txt), I get this error:
Error in strsplit(df[[.dalignments]], "|", TRUE, FALSE, TRUE) : non-character argument
So I rechecked my input data and found that the error arises only when the input (MiXCR output format) has exactly one row. Here is an example of the input format that triggers the error. Because of space I paste only some of the columns; in the actual file all of the columns are available.
cloneId cloneCount cloneFraction targetSequences targetQualities allVHitsWithScore allDHitsWithScore
0 17.0 1.0 TGTGCCAGTAGTATAGACGGTTCATCTGGAAACACCATATATTTT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
In contrast, input with more than one row, or with zero rows, does not trigger the error.
More than one row:
cloneId cloneCount cloneFraction targetSequences targetQualities allVHitsWithScore allDHitsWithScore
0 17.0 1.0 TGTGCCAGTAGTATAGACGGTTCATCTGGAAACACCATATATTTT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
0 17.0 1.0 TGTGCCAGTAGTATAGACGGTTCATCTGGAAACACCATATATTTT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
Zero rows:
cloneId cloneCount cloneFraction targetSequences targetQualities allVHitsWithScore allDHitsWithScore
Hi @chenyx47
Thank you! It's clear to me now, we will see what we can do. Is this a blocker for your research? It seems that having a file with only one row is unusual.
Hi @vadimnazarov Thanks for your reply. Getting a file with only one row seems weird to me too. It is probably because my analysis is based on bulk RNA-seq data. For now it is a blocker for my research, so I would appreciate it if you could offer me a solution.
Hi, @vadimnazarov
I am also working on bulk RNA-seq data to find T cells in tumor samples. For me it is usual to have a MiXCR result file that has only two lines. After loading a sample with repLoad(), I got the same error message as the researcher above.
> input_path <- "C:\\Users\\Andy\\Desktop\\56.filtered.txt"
> immune_data_list <- repLoad(input_path)
== Step 1/3: loading repertoire files... ==
Processing "<initial>" ...
-- Parsing "C:\Users\Andy\Desktop\56.filtered.txt" -- mixcr
Error in strsplit(df[[.dalignments]], "|", TRUE, FALSE, TRUE) :
non-character argument
The following is my two-line productive cdr3aa result.
cloneId cloneCount cloneFraction targetSequences targetQualities allVHitsWithScore allDHitsWithScore allJHitsWithScore allCHitsWithScore allVAlignments allDAlignments allJAlignments allCAlignments nSeqFR1 minQualFR1 nSeqCDR1 minQualCDR1 nSeqFR2 minQualFR2 nSeqCDR2 minQualCDR2 nSeqFR3 minQualFR3 nSeqCDR3 minQualCDR3 nSeqFR4 minQualFR4 aaSeqFR1 aaSeqCDR1 aaSeqFR2 aaSeqCDR2 aaSeqFR3 aaSeqCDR3 aaSeqFR4 refPoints
1 2 0.666666666666667 TGTGCTAGTGGTTGGGGGACCTACAATGAGCAGTTCTTC JJJJJJJJJJJJJJJJJJAJJJJJJJJJJJJJJJJJJJJ TRBV12-5*00(296) TRBJ2-1*00(240) TRBC2*00(925) 273|289|310|0|16|ST286G|66.0 22|42|70|19|39||100.0 TGTGCTAGTGGTTGGGGGACCTACAATGAGCAGTTCTTC 32 CASGWGTYNEQFF :::::::::0:-1:16:::::19:-2:39:::
2 1 0.333333333333333 TGTGCCAGCAGTAGGACCCCGACCTACGAGCAGTACTTC JJJJJFJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ TRBV6-5*00(785) TRBJ2-7*00(225) TRBC2*00(295),TRBC1*00(267) 270|282|307|0|12||60.0 22|39|67|22|39||85.0 ; TGTGCCAGCAGTAGGACCCCGACCTACGAGCAGTACTTC 37 CASSRTPTYEQYF :::::::::0:-5:12:::::22:-2:39:::
FYI I also have the same issue: I am running mixcr on bulk, paired-end RNA-Seq from tumor samples and have a handful of repertoires with only one clone. This is with extremely high read depth, btw. immunarch repLoad fails here for me too.
Did anyone find a solution for this?
Hi, I am having the exact same issue as others in this thread. I have been digging in the code a little to see if I could find a root of the problem. The repLoad function decides which kind of file is being imported and uses different parsers for different data, so the error here is coming from the mixcr_parse() function.
The error happens when the parser determines it is a VDJ recombination type (for example TRBV clones) but mixcr has no info in the D alignment column for all clones.
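As a rough illustration of why an empty D alignment column can end up non-character, here is a minimal sketch using base R's read.delim on a made-up one-row table. The table contents and column subset are hypothetical, and immunarch may read files with a different reader, so this only shows the general mechanism: a column whose fields are all blank is guessed as logical NA, and strsplit then rejects it.

```r
# Hypothetical one-row MiXCR-like table; the D alignment field is empty.
tsv <- "cloneId\tallVAlignments\tallDAlignments\n0\t273|289|310|0|16|ST286G|66.0\t\n"

# With type guessing, the all-blank column is read as logical NA.
guessed <- read.delim(text = tsv)
class(guessed$allDAlignments)

# strsplit() refuses a logical vector; capture the message instead of stopping.
err <- tryCatch(
  strsplit(guessed$allDAlignments, "|", fixed = TRUE),
  error = function(e) conditionMessage(e)
)
err

# Forcing every column to character avoids the type guess.
as_char <- read.delim(text = tsv, colClasses = "character")
class(as_char$allDAlignments)
```

Whether the one-row case in real files hits exactly this path depends on how the package reads the file, which this sketch does not attempt to reproduce.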
# check for VJ or VDJ recombination
# VJ / VDJ / Undeterm
recomb_type <- "Undeterm"
if (sum(substr(head(df)[[.vgenes]], 1, 4) %in% c("TCRA", "TRAV", "TRGV", "IGKV", "IGLV"))) {
  recomb_type <- "VJ"
} else if (sum(substr(head(df)[[.vgenes]], 1, 4) %in% c("TCRB", "TRBV", "TRDV", "IGHV"))) {
  recomb_type <- "VDJ"
}
This is followed by a check on which recomb_type is being parsed, and the "VDJ" branch contains the offending strsplit calls:
if (recomb_type == "VJ") {
  df$VD.insertions <- -1
} else if (recomb_type == "VDJ") {
  logic <- sapply(strsplit(df[[.dalignments]], "|", TRUE, FALSE, TRUE), length) >= 4 &
    sapply(strsplit(df[[.vend]], "|", TRUE, FALSE, TRUE), length) >= 5
  df$VD.insertions[logic] <-
    as.numeric(sapply(strsplit(df[[.dalignments]][logic], "|", TRUE, FALSE, TRUE), "[[", 4)) -
    as.numeric(sapply(strsplit(df[[.vend]][logic], "|", TRUE, FALSE, TRUE), "[[", 5)) - 1
}
Note that a strsplit like this splits each element of the vector it is given. It accepts a vector with some NA elements, but it complains if all elements are NA, because such a vector is logical rather than character. Below is some R console testing I did to confirm this, and to show that forcing a cast to a character vector solves it:
> strsplit(c("a|b|c|d|", NA), "|", TRUE, FALSE, TRUE)
[[1]]
[1] "a" "b" "c" "d"
[[2]]
[1] NA
> strsplit(c(NA, NA), "|", TRUE, FALSE, TRUE)
Error in strsplit(c(NA, NA), "|", TRUE, FALSE, TRUE) :
non-character argument
> strsplit(as.character(c(NA, NA)), "|", TRUE, FALSE, TRUE)
[[1]]
[1] NA
[[2]]
[1] NA
> str(c(NA, NA))
logi [1:2] NA NA
> str(c("TEST", NA))
chr [1:2] "TEST" NA
> str(as.character(c(NA, NA)))
chr [1:2] NA NA
Finally, note that there are other splits like these further down in the parsing code, which of course produce the exact same error. Since I don't know the code at all, I can't say for sure that this is a good idea, but it seems like the solution is to make sure R knows this is a character vector. Perhaps someone can have a look at this again?
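The cast suggested above could be wrapped in a small helper so that every split goes through it. This is only a sketch: split_pipe is a hypothetical name and not a function in immunarch. It also spells out the positional arguments from the original calls, since strsplit's signature is strsplit(x, split, fixed, perl, useBytes).

```r
# Hypothetical helper, not part of immunarch.
split_pipe <- function(x) {
  # strsplit(x, "|", TRUE, FALSE, TRUE) passes fixed = TRUE, perl = FALSE and
  # useBytes = TRUE by position; named arguments are used here for clarity.
  # as.character() turns an all-NA logical vector into NA_character_ values.
  strsplit(as.character(x), "|", fixed = TRUE, useBytes = TRUE)
}

all_missing <- c(NA, NA)        # logical vector, as in the failing case
mixed <- c("a|b|c|d|", NA)      # character vector with one NA

lengths(split_pipe(all_missing))
lengths(split_pipe(mixed))
```

Because each NA element comes back as a length-one result, length checks such as `>= 4` evaluate to FALSE for those rows instead of raising an error.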
Thanks!
having the same issue and would also appreciate it being addressed
Having this issue as well, it would be nice if a solution comes up!:) thank you!
I'm having this issue too. It would be nice to have a solution. Thank you in advance :)
Same issue ongoing for me... Files with one line are rejected
Same issue: files with one line (one clonotype) are rejected.
Hi! I was having the same problem and had a quick look at the code. The following block of code essentially determines the size of DJ insertion from D and J alignment encoding, and the one directly preceding it does the same for VD insertion.
.dj.insertions <- "DJ.insertions"
df$DJ.insertions <- -1
if (recomb_type == "VJ") {
  df$DJ.insertions <- -1
} else if (recomb_type == "VDJ") {
  logic <- sapply(strsplit(df[[.jstart]], "|", TRUE, FALSE, TRUE), length) >= 4 &
    sapply(strsplit(df[[.dalignments]], "|", TRUE, FALSE, TRUE), length) >= 5
  df$DJ.insertions[logic] <-
    as.numeric(sapply(strsplit(df[[.jstart]][logic], "|", TRUE, FALSE, TRUE), "[[", 4)) -
    as.numeric(sapply(strsplit(df[[.dalignments]][logic], "|", TRUE, FALSE, TRUE), "[[", 5)) - 1
}
I suppose -1 means that the insertion is either undefined because of the VJ recombination type, or cannot be determined with a high degree of confidence (which is the case when the D alignment encoding is missing). So I slightly modified the logic in the above if statements so that the offending strsplits are not executed if all elements in df[[.dalignments]] are NA.
if (recomb_type == "VJ" | all(is.na(df[[.dalignments]]))) {
  df$VD.insertions <- -1
} else if (recomb_type == "VDJ") {
  logic <- sapply(strsplit(df[[.dalignments]], "|", TRUE, FALSE, TRUE), length) >= 4 &
    sapply(strsplit(df[[.vend]], "|", TRUE, FALSE, TRUE), length) >= 5
  df$VD.insertions[logic] <-
    as.numeric(sapply(strsplit(df[[.dalignments]][logic], "|", TRUE, FALSE, TRUE), "[[", 4)) -
    as.numeric(sapply(strsplit(df[[.vend]][logic], "|", TRUE, FALSE, TRUE), "[[", 5)) - 1
}
.dj.insertions <- "DJ.insertions"
df$DJ.insertions <- -1
if (recomb_type == "VJ" | all(is.na(df[[.dalignments]]))) {
  df$DJ.insertions <- -1
} else if (recomb_type == "VDJ") {
  logic <- sapply(strsplit(df[[.jstart]], "|", TRUE, FALSE, TRUE), length) >= 4 &
    sapply(strsplit(df[[.dalignments]], "|", TRUE, FALSE, TRUE), length) >= 5
  df$DJ.insertions[logic] <-
    as.numeric(sapply(strsplit(df[[.jstart]][logic], "|", TRUE, FALSE, TRUE), "[[", 4)) -
    as.numeric(sapply(strsplit(df[[.dalignments]][logic], "|", TRUE, FALSE, TRUE), "[[", 5)) - 1
}
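For anyone who wants to try the guard outside the package, here is a self-contained sketch of the VD insertion calculation with the same all-NA check. The function name vd_insertions and the D alignment string in the usage lines are hypothetical; the V alignment strings are taken from the two-clone example earlier in the thread. It follows the field positions used in the code above (field 4 of the D alignment minus field 5 of the V alignment, minus 1) and is not the package's actual implementation.

```r
# Hypothetical standalone version of the guarded VD insertion logic.
vd_insertions <- function(d_align, v_align) {
  out <- rep(-1, length(d_align))        # -1 marks "undefined / not determined"
  if (all(is.na(d_align))) return(out)   # guard: no D alignment for any clone
  d_parts <- strsplit(as.character(d_align), "|", fixed = TRUE)
  v_parts <- strsplit(as.character(v_align), "|", fixed = TRUE)
  ok <- lengths(d_parts) >= 4 & lengths(v_parts) >= 5
  out[ok] <- as.numeric(sapply(d_parts[ok], `[[`, 4)) -
    as.numeric(sapply(v_parts[ok], `[[`, 5)) - 1
  out
}

v <- c("273|289|310|0|16|ST286G|66.0", "270|282|307|0|12||60.0")

# No D alignments at all: every clone gets -1 and strsplit is never reached.
no_d <- vd_insertions(c(NA, NA), v)

# One made-up D alignment (illustrative values only) and one missing.
some_d <- vd_insertions(c("0|5|20|18|23||25.0", NA), v)
```

With the made-up D alignment, the first clone gets 18 - 16 - 1 = 1 and the second stays at -1, so rows without a usable D alignment keep the sentinel value either way.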
I don't know if and how exactly this affects things downstream, but this worked for me and I guess would suffice for the time being. I forked the repo and modified it so if anyone is interested please check the repo here.
It would appear that you have fixed it! I downloaded your forked repo and it works great. Thanks so much and have a great day!
Hi, @plezar! My name is Aleksandr Popov, I am a developer of the Immunarch package.
Thank you very much for this bugfix! I will merge it into the dev branch upstream, so it will be included in the next release of Immunarch.
Good luck, Aleksandr
@plezar's fork works for me (thanks!) but the current dev branch in this repo fails with:
Error in tbl_subset2(x, j = i, j_arg = substitute(i)) :
object '.dalignments' not found
Hi, @jdm204! Thank you for using our software!
I've added a fix for this error to the dev branch; it should now work as expected.
Best regards, Aleksandr
🐛 Bug
I get this error when I use repLoad(/path/to/mixcrclonesoutputfile.txt).
Error in strsplit(df[[.dalignments]], "|", TRUE, FALSE, TRUE) : non-character argument
I rechecked the data and found that the error arises when the input data has only one row, like this (omitting other columns):
cloneId cloneCount cloneFraction targetSequences targetQualities allVHitsWithScore allDHitsWithScore
0 17.0 1.0 TGTGCCAGTAGTATAGACGGTTCATCTGGAAACACCATATATTTT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
To Reproduce
Steps to reproduce the behavior:
1. repLoad(/path/to/mixcrclonesoutputfile.txt).
2. Error in strsplit(df[[.dalignments]], "|", TRUE, FALSE, TRUE) : non-character argument