nnalpas closed this issue 6 years ago
The bigmemory package can be used to store numeric matrices.
It seems you have some strings in your data, which big.matrix objects do not currently handle.
Likewise, I'm not sure that using type = "character" is possible.
How big is your data? You could maybe read it in chunks with data.table::fread, store the numeric columns in a big.matrix, and keep the other information in a character matrix.
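The split suggested above could look roughly like the following. This is only a sketch under assumptions: the tiny input file and the names `path`, `num_bm` and `chr_mat` are made up for illustration, and the file is read in one go. For a file too large to read at once, the same steps could be repeated per chunk using fread's `skip` and `nrows` arguments.

```r
library(data.table)
library(bigmemory)

# Hypothetical small input standing in for a large tab-separated file;
# the third row has an empty numeric field.
path <- tempfile(fileext = ".txt")
writeLines(c("id\tlabel\tx\ty",
             "a\tfoo\t1.5\t2",
             "b\tbar\t\t4"), path)

dt <- fread(path, sep = "\t", header = TRUE)

# Flag which columns fread parsed as numeric (integer or double).
is_num <- vapply(dt, is.numeric, logical(1))

# Numeric columns go into a big.matrix; empty fields become NA.
num_bm <- as.big.matrix(as.matrix(dt[, which(is_num), with = FALSE]),
                        type = "double")

# Everything else stays in an ordinary character matrix.
chr_mat <- as.matrix(dt[, which(!is_num), with = FALSE])
```

Rows keep the same order in both objects, so row i of `num_bm` and row i of `chr_mat` describe the same record.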
Hello, yes, I noticed afterwards that only numeric values are allowed, my bad. That said, the error would still arise with a numeric matrix stored in a file if the first row after the header has empty values in its last columns, such as:
col1\tcol2\tcol3\tcol4
1\t2\t\t
1\t2\t3\t4
Anyway, I think it could just be that my data is messy, and the fix I suggested might not be useful for other people. You're right, I will just stick with fread for the time being.
Thanks again. Best regards.
Hello,
I encountered the following error while using read.big.matrix:
> evid_bm <- read.big.matrix(filename = "L:/Data/SQL_Maxquant/2018_02/P193-U1-20180204/evidence.txt", sep = "\t", header = TRUE, has.row.names = TRUE, type = "character", backingfile = "evid.bck", descriptorfile = "evid.desc", shared = options()$bigmemory.default.shared)
Error in read.big.matrix(filename = "L:/Data/SQL_Maxquant/2018_02/P193-U1-20180204/evidence.txt", : Dimension mismatch between header row and first data row.
I then checked my file, which I know does not have rownames but has empty values towards the last column. Therefore the first line of my file (after the header) will look like this:
> firstLine <- scan(file = "L:/Data/SQL_Maxquant/2018_02/P193-U1-20180204/evidence.txt", what = "character", skip = 1, nlines = 1, sep = "\n")
> firstLine
[1] "AAAAAAEGIEAAEK\t14\tUnmodified\tAAAAAAEGIEAAEK\t\t\t0\t0\t0\tMC-0-1_GL0042895;44_gene_id_58438\tMC-0-1_GL0042895\tMC-0-1_GL0042895\t\t\tMULTI-MSMS\t20170103_VA_MetaDS_R114\tR114\t636.8281\t2\t636.825149\t1271.63574\t38214.52\t3.7017\t0.0023573\t0.099839\t6.358E-05\t3.8015\t0.0024209\t636.824962634561\t23.618\t0.30947\t23.618\t23.477\t23.786\t0\t\t\t\t\t32\t18\t2\t0\t0\t0\t0.00014828\t1\t13839\t92.939\t60.644\t1\t13447000\t\t\t0\t19061\t0\t0\t0\t0\t\t"
My understanding is that read.big.matrix() then uses strsplit() to parse this line. However, for the line above, the last column value (which is empty) is ignored, resulting in fewer values than in the header and raising the error.
A possible alternative to base::strsplit() would be stringr::str_split(), which does not remove empty values at the end. Is there any chance this could be implemented, or is there another alternative that supports a partially empty last column?
Best regards, Nicolas
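The trailing-field behaviour described above can be reproduced with base R alone. This is a minimal sketch on a made-up four-column row. `split_keep_trailing` is a hypothetical helper, not part of bigmemory, and it shows a base-R workaround rather than how read.big.matrix() actually parses lines.

```r
# Made-up header and first data row; the last two fields of the row are empty.
header <- "col1\tcol2\tcol3\tcol4"
row    <- "1\t2\t\t"

# base::strsplit() drops the empty string that follows the final separator,
# so the data row yields one field fewer than the header.
n_header <- length(strsplit(header, "\t", fixed = TRUE)[[1]])  # 4
n_row    <- length(strsplit(row,    "\t", fixed = TRUE)[[1]])  # 3

# Hypothetical helper: append one extra separator before splitting, so the
# field that strsplit() discards is the artificial one, not a real column.
split_keep_trailing <- function(line, sep = "\t") {
  strsplit(paste0(line, sep), sep, fixed = TRUE)[[1]]
}

n_row_fixed <- length(split_keep_trailing(row))  # 4, matches the header
```

Appending a separator avoids a dependency on stringr while still giving a field count that matches the header.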