Open bepoli opened 5 years ago
Hi @benplm
Thank you for your interest in QAPA and for reporting this incompatibility with data.table 1.12.2. It's not clear to me what change in data.table is causing this issue. If possible, can you e-mail me (k.ha -at- mail.utoronto.ca) some sample count files that I can use to try to replicate the problem?
Closing this issue as I have been unable to reproduce any error related to data.table 1.12.2. If it is still a problem, feel free to reopen.
Hello, I just want to report an incompatibility with the latest version of data.table (recently published in CRAN). Using data.table=1.12.0, I usually get a stderr like this when computing the PAU values from Salmon counts:
[qapa] Version 1.2.1
Merging samples by TPM
|======================================================================| 100%
Separating Ensembl IDs
Adding Ensembl metadata
Found 76575 / 76575 (100%) matches
Warning messages:
1: In `[.data.table`(df, , `:=`(c("Transcript", "Gene", "Species", :
Supplied 9 columns to be assigned a list (length 11) of values (2 unused)
2: In eval(jsub, SDenv, parent.frame()) : NAs introduced by coercion
3: In eval(jsub, SDenv, parent.frame()) : NAs introduced by coercion
4: In eval(jsub, SDenv, parent.frame()) : NAs introduced by coercion
Finished merging data
Melting data frame
Operating on forward strand
Calculating Poly(A) Usage
1714416 rows, 8046 genes
Operating on reverse strand
Calculating Poly(A) Usage
1654840 rows, 7904 genes
Adding input expression values
Finished computing PAU!
[qapa] Finished!
and I get an NA value where the counts of all the isoforms are zero in a given sample.
However, since data.table version 1.12.2, this behaviour changed:
Merging samples by TPM
|======================================================================| 100%
Separating Ensembl IDs
Error in `[.data.table`(df, , `:=`(c("Transcript", "Gene", "Species", :
Supplied 9 columns to be assigned 11 items. Please see NEWS for v1.12.2.
Calls: separate_ensembl_field -> [ -> [.data.table
Execution halted
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input
Calls: data.table -> read.csv -> read.table
Execution halted
[qapa] Version 1.2.1
[qapa] Finished!
The execution is halted and the resulting output file is empty. Also, it's worth mentioning that qapa quant still returns a zero exit status (so it won't halt a pipeline running in the background).
Have a good day
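Since the exit status cannot be relied on here, a pipeline could check the output file and the captured stderr instead. The following is only a minimal sketch, assuming bash; the helper names check_pau_output and check_qapa_log and the file names in the usage comment are hypothetical and not part of QAPA.

```shell
#!/usr/bin/env bash

# Hypothetical helper: fail when the results file is missing or empty,
# which is what the reporter observed after the R error.
check_pau_output() {
    local out="$1"
    if [ ! -s "$out" ]; then
        echo "error: '$out' is missing or empty; qapa quant probably failed" >&2
        return 1
    fi
}

# Hypothetical helper: fail when the captured stderr log contains the
# "Execution halted" line that R prints in the logs above.
check_qapa_log() {
    local log="$1"
    if grep -q "Execution halted" "$log"; then
        echo "error: R reported 'Execution halted' in '$log'" >&2
        return 1
    fi
}

# Illustrative usage (file names are assumptions; keep your usual arguments):
#   qapa quant <your usual arguments> > pau_results.txt 2> qapa.log
#   check_pau_output pau_results.txt && check_qapa_log qapa.log || exit 1
```

Checking both the file size and the log is deliberate: the empty file catches this particular failure, while the log check would also catch a halted R script that happened to leave partial output behind.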
Hi my friend, have you solved this problem?
Hi Kevin,
I get the exact same error and it does seem to be related to the R-data.table update starting from v1.12.2. Please see the new features for v1.12.2 (https://cran.r-project.org/web/packages/data.table/news/news.html).
I am pasting the error below. Thanks!
Merging samples by TPM
|======================================================================| 100%
Separating Ensembl IDs
Error in `[.data.table`(df, , `:=`(c("Transcript", "Gene", "Species", :
Supplied 9 columns to be assigned 11 items. Please see NEWS for v1.12.2.
Calls: separate_ensembl_field -> [ -> [.data.table
Execution halted
qapa.qapa - 2022-07-21 10:55:27,728 - INFO - compute_pau.R -e intermediate.txt
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input
Calls: data.table -> read.csv -> read.table
Execution halted
qapa.qapa - 2022-07-21 10:55:28,040 - INFO - Finished!
I have got the same error message:
Merging samples by TPM
|======================================================================| 100%
Separating Ensembl IDs
Error in `[.data.table`(df, , `:=`(c("Transcript", "Gene", "Species", :
Supplied 9 columns to be assigned 11 items. Please see NEWS for v1.12.2.
Calls: separate_ensembl_field -> [ -> [.data.table
Execution halted
qapa.qapa - 2022-12-22 14:11:58,524 - INFO - compute_pau.R -e /tmp/qapa_merge_z1752tja
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input
Calls: data.table -> read.csv -> read.table
Execution halted
qapa.qapa - 2022-12-22 14:11:59,037 - INFO - Finished!
and the final file is empty.
Hi, to date I still don't have a good grasp on this issue and don't have much bandwidth to investigate. It seems to affect a very small number of users. If you are able to, could you e-mail me (k
quant.sf
files. If you like, you can remove the actual TPM values; I am only interested in the sequence ID column. Also, what version of data.table do you have installed?
Hi, I have sent you an email about the details. @kcha
@NJU-Bio-Info, thanks for sending your files. I think I found the cause. It looks like for human there were a handful of genes on chrY that had underscores in the Ensembl version string:
ENST00000381657_ENSG00000182378.15_PAR_Y_hsa_chrY_299096_303356_+_utr_299335_303356(+)
ENST00000432318_ENSG00000198223.17_PAR_Y,ENST00000494969_ENSG00000198223.17_PAR_Y,ENST00000355432_ENSG00000198223.17_PAR_Y,ENST00000381529_ENSG00000198223.17_PAR_Y_hsa_chrY_1309401_1309921_+_utr_1309868_1309921(+)
ENST00000331035_ENSG00000185291.12_PAR_Y_hsa_chrY_1382390_1382685_+_utr_1382465_1382685(+)
ENST00000313871_ENSG00000197976.12_PAR_Y_hsa_chrY_1600658_1602514_+_utr_1601594_1602514(+)
ENST00000262640_ENSG00000124333.16_PAR_Y,ENST00000286448_ENSG00000124333.16_PAR_Y_hsa_chrY_57128402_57130289_+_utr_57128659_57130289(+)
ENST00000381401_ENSG00000169100.14_PAR_Y_hsa_chrY_1386151_1386759_-_utr_1386151_1386601(-)
This was unexpected, and the extra underscores (as in .15_PAR_Y) caused QAPA's string parsing to fail. To get around this, you should use Ensembl Gene IDs without version numbers, which is what QAPA expects.
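To see why the extra underscores matter, here is a rough sketch that counts the underscore-separated fields in one of the IDs listed above and in a hypothetical counterpart with the version and _PAR_Y suffix stripped (that second ID is constructed for illustration and is not taken from any real file). The count_fields helper is also made up for this example. The exact preprocessing QAPA applies before splitting is not shown in this thread, so only the difference of two fields is meaningful here; it is consistent with the "9 columns ... 11 items" mismatch in the error.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count the underscore-separated fields in a sequence ID.
count_fields() {
    printf '%s\n' "$1" | awk -F'_' '{ print NF }'
}

# ID copied from the list above (versioned gene ID with a _PAR_Y suffix).
par_id='ENST00000381657_ENSG00000182378.15_PAR_Y_hsa_chrY_299096_303356_+_utr_299335_303356(+)'

# Assumed counterpart for illustration: same ID without ".15_PAR_Y".
plain_id='ENST00000381657_ENSG00000182378_hsa_chrY_299096_303356_+_utr_299335_303356(+)'

echo "with _PAR_Y:    $(count_fields "$par_id") fields"
echo "without suffix: $(count_fields "$plain_id") fields"
```

The first ID splits into 12 fields and the second into 10, so "PAR" and "Y" each land in a column of their own and shift everything after them.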
In the meantime, as a quick workaround, I suggest removing these entries from your quant files entirely. For example:
grep -v "_PAR_" quant.sf > quant2.sf
Then try qapa quant again.
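If you have several samples, the same grep filter can be wrapped in a small function that also reports how many lines it dropped. This is just a sketch, assuming bash; filter_par and the directory layout in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: copy a quant.sf file without the _PAR_ entries
# and report on stderr how many lines were removed.
filter_par() {
    local in="$1" out="$2"
    local dropped
    # grep -c exits with status 1 when nothing matches, so tolerate that.
    dropped=$(grep -c "_PAR_" "$in" || true)
    # grep -v exits with status 1 when it prints nothing, so tolerate that too.
    grep -v "_PAR_" "$in" > "$out" || true
    echo "$in: removed $dropped line(s)" >&2
}

# Illustrative usage (the project/*/quant.sf layout is an assumption):
#   for f in project/*/quant.sf; do
#       filter_par "$f" "${f%.sf}.nopar.sf"
#   done
```

Writing to a new file rather than editing in place keeps the original Salmon output around in case you later want to switch to unversioned gene IDs instead.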
In summary: the issue is not due to data.table versions, but rather unexpected inclusion of underscores in version IDs. QAPA expects Ensembl Gene IDs without the version number.