pritykinlab / guidescanpy


Data checking #63

Closed 1rzhu closed 1 year ago

1rzhu commented 1 year ago

Ensured the PAM of cpf1 matches the correct position in the target sequences. Will do more checks on the data.
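For readers following along, a PAM-position check of this kind could look roughly like the sketch below. It is not the code used in guidescanpy. It assumes each target sequence is stored as one string read 5'->3' on the guide's strand, with the PAM included, and it uses `TTTN` only as an illustrative Cpf1 PAM. Cpf1 (Cas12a) has its PAM on the 5' side of the protospacer, whereas Cas9 has it on the 3' side. The name `pam_matches` is hypothetical.

```python
# IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_matches(target, pam, pam_at_5prime):
    """Return True if `target` carries `pam` at the expected end.

    Assumption: `target` is the protospacer and PAM together, read 5'->3'
    on the guide's strand. `pam` may contain IUPAC codes such as N.
    """
    target = target.upper()
    pam = pam.upper()
    if len(target) < len(pam):
        return False
    # Cpf1-style PAMs sit at the start of the string, Cas9-style at the end.
    site = target[:len(pam)] if pam_at_5prime else target[-len(pam):]
    return all(base in IUPAC[code] for base, code in zip(site, pam))


# Made-up sequences, for illustration only.
cpf1_ok = pam_matches("TTTAGACGTTACGGATCCAGTCAG", "TTTN", pam_at_5prime=True)
cas9_ok = pam_matches("GACGTTACGGATCCAGTCAGTGG", "NGG", pam_at_5prime=False)
```

Running the same check with `pam_at_5prime` flipped is a quick way to detect targets whose PAM was placed on the wrong side.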

1rzhu commented 1 year ago

There seems to be a big difference between the old and new BAM files. Here is the result of an experiment on a sample of size 1000 from a new BAM file:

I'm currently trying to identify the factors that may have caused the differences.

vineetbansal commented 1 year ago

Keep in mind that a valid comparison requires generating the entire database (without --max-kmers), because the off-targets and specificity depend on how much of the genome has been seen; the cutting efficiency does not. Once the database is generated, you can sort it by position (using samtools or pysam) and then compare just the first n matches.
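A minimal sketch of that comparison, assuming both BAM files were generated from the full genome and are already position-sorted (for example with `samtools sort -o sorted.bam input.bam`). The helper names `iter_bam_records` and `compare_first_n` are hypothetical, and comparing every tag on a record is an assumption about what counts as a difference; you may want to restrict it to the off-target and specificity fields.

```python
from itertools import islice, zip_longest


def iter_bam_records(path):
    """Yield (query_name, reference_name, reference_start, tags) per record."""
    import pysam  # imported lazily so compare_first_n works without pysam

    with pysam.AlignmentFile(path, "rb") as bam:
        for read in bam:
            yield (
                read.query_name,
                read.reference_name,
                read.reference_start,
                dict(read.get_tags()),
            )


def compare_first_n(old_records, new_records, n):
    """Return (index, old, new) for each of the first n records that differ.

    If one input has fewer than n records, the missing side is None.
    """
    old_head = islice(old_records, n)
    new_head = islice(new_records, n)
    return [
        (i, old, new)
        for i, (old, new) in enumerate(zip_longest(old_head, new_head))
        if old != new
    ]


# Toy records standing in for two sorted BAM files (values are made up).
old = [("g1", "chr1", 100, {"x": 1}), ("g2", "chr1", 250, {"x": 2})]
new = [("g1", "chr1", 100, {"x": 1}), ("g2", "chr1", 250, {"x": 3})]
diffs = compare_first_n(old, new, 2)
```

With real files, the call would be along the lines of `compare_first_n(iter_bam_records("old.sorted.bam"), iter_bam_records("new.sorted.bam"), 1000)`, where both file names are placeholders.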