morris-lab / BiddyetalWorkflow

This repository contains our CellTag workflow, as deployed in our 2018 Biddy et al., Nature paper.

rows which exceeds .Machine$integer.max Error generating cell barcode x cell tag matrix #3

Closed: lb15 closed this issue 5 years ago

lb15 commented 5 years ago

Hello!

I've created the parsed CellTag list for both the v1 and v3 libraries. When I try to run matrix.count.celltags.R, I get the following error for the v1 CellTag library, but not for the v3:

Error in CJ(1:178363, 1:35378) : 
  Cross product of elements provided to CJ() would result in 6310126214 rows which exceeds .Machine$integer.max == 2147483647
Calls: dcast -> dcast.data.table -> do.call -> CJ

The v1 parsed file is much larger than the v3 one, and given the error, I suspect the problem is related to its size. I have two samples, one with ~18,000 cells and one with ~6,000 cells, and both throw this error for the v1 library.
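The arithmetic behind the error message can be checked directly. The sketch below is in Python purely as a calculator, not the R script itself. It multiplies the two lengths passed to CJ() in the error and compares the product with R's integer limit. The names `n_left` and `n_right` are placeholders, since the thread does not say which dimension is cell barcodes and which is CellTags.

```python
# R's .Machine$integer.max is the largest 32-bit signed integer.
R_INTEGER_MAX = 2**31 - 1  # 2147483647, as quoted in the error message


def cross_join_rows(n_left: int, n_right: int) -> int:
    """Rows produced by a full cross join: every left value paired with every right value."""
    return n_left * n_right


# The two lengths from the error: CJ(1:178363, 1:35378).
# Placeholder names; the thread does not say which is barcodes and which is tags.
n_left = 178363
n_right = 35378

total = cross_join_rows(n_left, n_right)
print(total)                  # 6310126214, the row count reported in the error
print(total > R_INTEGER_MAX)  # True: the cross join cannot be indexed with R integers
```

The product is roughly 2.9 times the limit, so one or both dimensions would need to shrink substantially before the same dcast call could succeed.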

Have you seen this error with larger datasets?

In case it is relevant, I also get this warning about data.table:

Warning message:
package ‘data.table’ was built under R version 3.5.2 

I'm using R 3.5.1.

Thanks!

Sincerely, Lauren

sam-morris commented 5 years ago

Hi Lauren! First, we recommend using the CellTagR pipeline, here: https://github.com/morris-lab/CellTagR. Second, given the large number of rows, are you using the raw matrix here rather than the filtered matrix?

lb15 commented 5 years ago

Oh cool, I did not see that other repository - I will try that. I am using the filtered barcode list, not the raw list. Actually, my second sample has a total of ~3200 barcodes, so it's a relatively small dataset. I'm surprised it would throw that error.

I will see if the other pipeline solves the issue. Thanks!

sam-morris commented 5 years ago

Yeah, it should easily handle that number of cells. If there is still a problem, open an issue in the other repo and we'll see you on the other side!

lb15 commented 5 years ago

I was able to produce the cell barcode x CellTag matrix for one of my samples' v1 libraries with the CellTagR package! I'll run the other (larger) sample next, but I think this has been solved, so I will close the issue on this repo.

Thanks!

babiddy commented 5 years ago

Hi Lauren,

Thanks for the feedback! I am glad the CellTagR package has worked for one of your samples. If you run into problems with the larger sample, I think a simple workaround would be to filter the celltag.parsed.tsv file, which is the input to the matrix.count.celltags.R script. In that case, reopen the issue and we can help with filtering this file.

Best, Brent
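
The thread does not spell out how to filter celltag.parsed.tsv, so the following is only a rough sketch of one possible approach: keep the rows whose cell barcode appears in a list of barcodes you want to retain. It assumes the parsed file is tab-separated with a header row, and the column names (`Cell.BC`, `UMI`, `Cell.Tag`) and the function `filter_parsed_rows` are hypothetical; check them against your own file before adapting this. Python is used here only for illustration.

```python
import csv
import io


def filter_parsed_rows(parsed_tsv, keep_barcodes, barcode_column="Cell.BC"):
    """Yield only the rows whose cell barcode is in keep_barcodes.

    parsed_tsv is any iterable of lines (an open file works);
    barcode_column is an assumed header name, not confirmed by the thread.
    """
    reader = csv.DictReader(parsed_tsv, delimiter="\t")
    for row in reader:
        if row[barcode_column] in keep_barcodes:
            yield row


# Tiny made-up input standing in for celltag.parsed.tsv (not real data).
example = io.StringIO(
    "Cell.BC\tUMI\tCell.Tag\n"
    "AAAC\tTTGA\tGGTACCAT\n"
    "GGGT\tCCAT\tTTGACGTA\n"
    "CCGT\tAGTC\tGGTACCAT\n"
)

# Hypothetical set of barcodes to keep, e.g. read from a filtered barcode list.
keep = {"AAAC", "CCGT"}

filtered = list(filter_parsed_rows(example, keep))
print(len(filtered))  # 2
```

Dropping rows for barcodes outside the filtered cell list shrinks one dimension of the cross join that triggered the original error; whether that is enough depends on how many distinct CellTags remain.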