loosolab / multicrispr


Error in data.table #1

Closed templardrake closed 2 years ago

templardrake commented 2 years ago

Hi Aditya et al.

Thank you so much for publishing multicrispr. It's been working really well for our needs! I'm currently running into a problem with Doench2016 scoring. We are trying to screen for functional vs. non-functional genomic regions in our non-model organism, and I'm designing guides in the +/- 1 kb regions that flank large genomic regions. Unfortunately, on-target scoring fails when given more than 500 intervals with Doench2016.

spacer <- find_spacers(target, bsgenome, complement=FALSE, mismatches=0, subtract_targets=TRUE, plot=FALSE)
Error in `[.data.table`(scoredt, , `:=`((ontargetmethod), scores)) : 
  Supplied 64279 items to be assigned to 94276 items of column 'Doench2016'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
In addition: Warning message:
In parallel::mclapply(contextchunks, function(x) { :
 scheduled cores 8, 6, 1 encountered errors in user code, all values of the jobs will be affected

Interestingly, the same command works with the default Doench2014. Is this maybe a problem with how miniconda is configured? Any help would be greatly appreciated!

bhagwataditya commented 2 years ago

Hi Krishna,

nice to hear from you! If you can give me a reproducible example, I will look into it and fix it.

Cheers,

Aditya

biowumin commented 2 years ago

I have the same problem.

biowumin commented 2 years ago

It seems that the error occurs when too many genomic regions are processed at once. My workaround is to design sgRNAs for 50 genomic regions at a time and then combine all the spacers.
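The batching workaround described above could be sketched roughly as follows. This is only an illustration: `design_in_batches` is a hypothetical helper, not part of multicrispr, and it assumes the targets object supports `length()` and `[` and that the per-batch results can be combined with `c()`.

```r
# Hypothetical helper (not part of multicrispr): run `design_fun` on
# consecutive batches of `targets` and combine the per-batch results.
design_in_batches <- function(targets, design_fun, batch_size = 50, combine = c) {
  n <- length(targets)
  if (n == 0) return(design_fun(targets))
  # Batch id for every element: 1, 1, ..., 2, 2, ...
  batch_id <- ceiling(seq_len(n) / batch_size)
  batches  <- split(seq_len(n), batch_id)
  # Process each batch separately
  results  <- lapply(batches, function(idx) design_fun(targets[idx]))
  # Drop the list names so they are not passed on as argument names
  do.call(combine, unname(results))
}

# Assumed usage with the call from this thread (requires multicrispr,
# a BSgenome object, and a configured conda environment):
# spacers <- design_in_batches(
#   targets,
#   function(gr) find_spacers(gr, bsgenome, ontargetmethod = "Doench2016",
#                             mismatches = 0, plot = FALSE)
# )
```

Note that batching only sidesteps the failure for batches that happen not to hit the problematic sequences; it does not address the underlying cause.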

bhagwataditya commented 2 years ago

Hey @biowumin, nice to hear from you. We would like to look into this problem. Would it be possible for you to provide a reproducible example? Same request to @templardrake: would you be able to provide one as well? We would like to fix this.

biowumin commented 2 years ago

library("multicrispr")
library(reticulate)
require(magrittr)
reticulate::use_condaenv('azienv', required=TRUE)
bsgenome <- BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38
targets <- char_to_granges(c(srf1 = 'chr14:106883431-106883731:+'), bsgenome)
spacers <- find_spacers(targets, bsgenome, subtract_targets = TRUE,ontargetmethod="Doench2016",mismatches = 0,plot=F)

Error in py_call_impl(callable, dots$args, dots$keywords) : ValueError: 'N' is not in list

bhagwataditya commented 2 years ago

Thank you @biowumin. @rwiegan identified the source of these issues: parts of the reference genome that contain only N's (NNNN). @rwiegan pushed a fix that resolves the issue in his test case. Does this help you?

rwiegan commented 2 years ago

The error was caused by N's in the spacer sequences. With the latest merge, spacer sequences containing N's are filtered out, which prevents this error.
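For anyone on an older version, the idea of filtering out N-containing sequences before scoring can be sketched in base R as below. This is not the code from the actual fix; `drop_n_sequences` is a hypothetical helper that works on plain character vectors and is shown only to illustrate the filtering step.

```r
# Hypothetical helper (not the actual multicrispr fix): drop sequences
# that contain ambiguous 'N' bases, upper- or lower-case.
drop_n_sequences <- function(seqs) {
  seqs <- as.character(seqs)
  has_n <- grepl("N", toupper(seqs), fixed = TRUE)
  seqs[!has_n]
}

# Example: only the two unambiguous sequences are kept
contexts <- c("ACGTACGT", "ACNNNNGT", "NNNNNNNN", "GGCCTTAA")
clean <- drop_n_sequences(contexts)
```

Filtering before scoring keeps the number of scores equal to the number of sequences being scored, which is the kind of mismatch the data.table error in the first comment reports.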