rr1859 / R.4Cker

MIT License

reduced_genome.sh takes a lot #19

iprada closed this issue 7 years ago

iprada commented 7 years ago

Hi, I am trying to create a reduced genome with your script so that my analysis is as similar to yours as possible. However, the reduced_genome.sh script takes a long time (almost 6 hours), and the file:

${genome}_${enzyme}_flankingsequences${fl}_unique.fa

is empty afterwards, so I can't continue with the downstream analysis. Is there another way to reduce the genome, or do you have previously reduced genomes?

P.S.: The step that takes the longest is line 32 of the script, where it runs grep -xF -f - -B 1
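For reference, the flags in that command are standard grep options: -x matches whole lines only, -F treats the patterns as fixed strings rather than regular expressions, -f - reads the pattern list from standard input, and -B 1 also prints the line before each match (in a FASTA file, that would be the header). The following is a minimal Python sketch of an equivalent filter. It is not part of R.4Cker: the function name filter_fasta_by_sequences is hypothetical, and it assumes a FASTA file in which every record is exactly one header line followed by one sequence line.

```python
from typing import Iterable, Iterator, Set, Tuple


def filter_fasta_by_sequences(
    fasta_lines: Iterable[str], wanted: Set[str]
) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs whose sequence is in `wanted`.

    Hypothetical helper, assuming two-line FASTA records
    (one '>' header line followed by one sequence line).
    """
    header = None
    for raw in fasta_lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            # Remember the header so it can be emitted with its sequence.
            header = line
        elif header is not None:
            # Whole-line, fixed-string comparison, like grep -xF.
            if line in wanted:
                yield header, line
            header = None


# Tiny made-up example, not real genome data.
example = [
    ">frag1\n", "GATCAAAA\n",
    ">frag2\n", "GATCCCCC\n",
    ">frag3\n", "GATCAAAA\n",
]
kept = list(filter_fasta_by_sequences(example, {"GATCCCCC"}))
print(kept)  # [('>frag2', 'GATCCCCC')]
```

Each lookup here is a hash check against the set, but the whole set of wanted sequences still has to fit in memory.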

best, Inigo

rr1859 commented 7 years ago

What genome are you using? And can you send me the enzyme and the size of the fragment?

iprada commented 7 years ago

Thank you for your fast answer.

I am working with the human genome hg19, and the enzyme I am using is DpnII (restriction site GATC). The length of my reads is 60.

Thank you very much once again.

best,

rr1859 commented 7 years ago

What is the size of the read excluding the barcode (if any), bait primer and restriction enzyme site?

iprada commented 7 years ago

The size of the read is 60 (21 nucleotides of primer + DpnII site, and 39 of the bait). The bait primer is GCAAGTGCCCTCATGTGATC, where the last four letters (GATC) are the restriction enzyme site.

rr1859 commented 7 years ago

Hi, sorry, just to double-check: the sequence you just sent is 20 bp, not 21.

iprada commented 7 years ago

Yes, I am sorry, I was wrong: the primer is 20 nucleotides, not 21.

rr1859 commented 7 years ago

I created reduced genomes for 39 and 40 bp. You can download the files here: https://drive.google.com/open?id=0B0wnRGWP-yayNlh4UmVWcW1BTlU
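The lengths discussed above can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the read length and bait primer quoted in this thread; the variable names are made up for illustration and are not R.4Cker parameters.

```python
READ_LENGTH = 60                      # read length reported in this thread
BAIT_PRIMER = "GCAAGTGCCCTCATGTGATC"  # bait primer reported in this thread
RESTRICTION_SITE = "GATC"             # DpnII restriction site

# The primer should end with the restriction site.
assert BAIT_PRIMER.endswith(RESTRICTION_SITE)

primer_length = len(BAIT_PRIMER)
remaining = READ_LENGTH - primer_length  # bases left after the primer

print(primer_length, remaining)  # 20 40
```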

iprada commented 7 years ago

That's awesome, thank you very much. However, let me bother you with just a few questions. I actually have 3 viewpoints to analyze. Did you change the scripts to reduce the human genome?

In case you did change the script, could you please share it with me so that I can analyze the rest of the viewpoints?

In case you did not change the scripts, could you please let me know what computing resources you are using to create the file?

best,

rr1859 commented 7 years ago

Hi, I did not change the script. I run this script on our institute's computing facility using 1 node, 1 core, and 40 GB of memory, and the runtime is between 1 and 2 hours depending on the genome and enzyme.