wyim-pgl closed this issue 4 years ago.
OK, OK. We will need Haibao to revise his 'allhic' program (written in Go), and I will revise my Perl script accordingly. Hi @tanghaibao, although Won is likely the only one using multiple restriction enzymes in Hi-C scaffolding, we have no reason to deny the request from our beloved friend. Do you have time to update allhic?
@tangerzhang Xingtan, allhic extract might need an update to do this. Arima uses the ligation motifs GATCGATC, GANTGATC, GANTANTC, GATCANTC (N = any base), which expand to the combinations below. @tanghaibao Arima Hi-C is ready and I will use it for the B. carinata genome assembly. Best, Won
GATCGATC,GAATGATC,GATTGATC,GACTGATC,GAGTGATC,GAATAATC,GAATATTC,GAATACTC,GAATAGTC,GATTAATC,GATTATTC,GATTACTC,GATTAGTC,GACTAATC,GACTATTC,GACTACTC,GACTAGTC,GAGTAATC,GAGTATTC,GAGTACTC,GAGTAGTC,GATCAATC,GATCATTC,GATCACTC,GATCAGTC
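Expanding the N wildcards by hand is error-prone (in GANTANTC the Ns sit at positions 3 and 6), so here is a minimal Python sketch that generates the concrete 8-mers from the four Arima motifs. It assumes N is the only degenerate IUPAC code present; expand_motif and expand_motifs are illustrative helpers, not part of allhic.

```python
from itertools import product

# Only N (any base) is expanded; other IUPAC codes (R, Y, ...) are out of scope here.
BASES = "ACGT"


def expand_motif(motif: str) -> list:
    """Replace every N in a motif with each of A, C, G, T."""
    choices = [BASES if base == "N" else base for base in motif.upper()]
    return ["".join(combo) for combo in product(*choices)]


def expand_motifs(motifs: str) -> list:
    """Expand a comma-separated motif list, dropping duplicates but keeping order."""
    seen = {}
    for motif in motifs.split(","):
        for seq in expand_motif(motif):
            seen.setdefault(seq, None)
    return list(seen)


if __name__ == "__main__":
    expanded = expand_motifs("GATCGATC,GANTGATC,GANTANTC,GATCANTC")
    print(len(expanded))  # 25 = 1 + 4 + 16 + 4
    print(",".join(expanded))
```

The count is 1 + 4 + 16 + 4 = 25 because each N multiplies the number of variants by four.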
@tangerzhang @wyim-pgl
I made the changes here and made new releases. You can now run:
allhic extract --RE="GATCGATC,GANTGATC,GANTANTC,GATCANTC" test.bam test.fasta
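For intuition about what a comma-separated --RE value asks for, the sketch below counts motif occurrences per sequence in a FASTA file, treating N as any base. This is a hypothetical illustration, not allhic's actual implementation; motif_regex, count_sites and read_fasta are made-up names. Because GANTGATC and GATCANTC are reverse complements of each other and the other two motifs are palindromic, scanning only the forward strand covers both strands for this motif set.

```python
import re
import sys


def motif_regex(motifs: str) -> re.Pattern:
    """Compile a comma-separated motif list into one regex; N matches any base.

    The pattern is a zero-width lookahead, so overlapping hits are all found
    and each start position is counted once.
    """
    alternatives = [m.upper().replace("N", "[ACGT]") for m in motifs.split(",")]
    return re.compile("(?=(?:" + "|".join(alternatives) + "))")


def count_sites(seq: str, pattern: re.Pattern) -> int:
    """Count motif start positions in one sequence (case-insensitive)."""
    return sum(1 for _ in pattern.finditer(seq.upper()))


def read_fasta(path: str) -> dict:
    """Minimal FASTA reader: record name (first word of header) -> sequence."""
    records = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0]
                records[name] = []
            elif name is not None and line:
                records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    pattern = motif_regex("GATCGATC,GANTGATC,GANTANTC,GATCANTC")
    for name, seq in read_fasta(sys.argv[1]).items():
        n = count_sites(seq, pattern)
        # columns: name, length, site count, bp per site
        print(name, len(seq), n, len(seq) / n if n else "NA", sep="\t")
```

Usage would be something like python count_sites.py test.fasta (hypothetical script name). Assembly gaps written as N in the genome do not match [ACGT], so they are never counted as sites.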
Thank you, dear!! I will run it right now. Won
Hi
I have data generated using the Arima HiC+ kit, which uses two restriction enzymes: MboI (^GATC) and HinfI (G^ANTC). The ligation motifs are thus GATCGATC, GANTGATC, GANTANTC, GATCANTC.
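The four motifs follow from the cut positions: after the 5' overhangs are filled in and the blunt ends religated, each junction is the upstream end of one cut site joined to the downstream end of another. A minimal sketch, assuming palindromic recognition sites with 5' overhangs (the Enzyme class and ligation_junctions are hypothetical helpers):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enzyme:
    name: str
    site: str  # palindromic recognition site, N = any base
    cut: int   # top-strand cut offset within the site (0 for ^GATC, 1 for G^ANTC)

    @property
    def left_end(self) -> str:
        # After fill-in, the upstream fragment keeps the site up to the
        # bottom-strand cut, which for a palindrome sits at len(site) - cut.
        return self.site[: len(self.site) - self.cut]

    @property
    def right_end(self) -> str:
        # The downstream fragment starts at the top-strand cut.
        return self.site[self.cut:]


def ligation_junctions(enzymes: list) -> list:
    """All fill-in ligation junctions between ends of every ordered enzyme pair."""
    return [a.left_end + b.right_end for a in enzymes for b in enzymes]


if __name__ == "__main__":
    mboi = Enzyme("MboI", "GATC", 0)
    hinfi = Enzyme("HinfI", "GANTC", 1)
    print(",".join(ligation_junctions([mboi, hinfi])))
    # GATCGATC,GATCANTC,GANTGATC,GANTANTC
```

The mixed junctions (GATCANTC and GANTGATC) arise only when an MboI end ligates to a HinfI end. For a single enzyme such as HindIII (A^AGCTT) the same rule gives AAGCTAGCTT.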
I have read this thread, which describes how to pass multiple enzymes to 'allhic extract' (i.e. --RE="GATCGATC,GANTGATC,GANTANTC,GATCANTC"). However, the manual does not describe how to specify them for the first step of the pipeline (diploid genome), 'ALLHiC_partition', where the argument is described as follows:
-e: enzyme_sites (HindIII: AAGCTT; MboI: GATC)
Can someone clarify this?
Best, Sjannie
Hi Sjannie, we do not have Arima Hi-C data to test it, but in principle the original command should work, e.g.
ALLHiC_partition -r draft.asm.fasta -b sample.clean.bam -k 4 -e GATCGATC,GANTGATC,GANTANTC,GATCANTC
Please note that there should be no spaces between the enzyme sites.
Thank you very much for clarifying!
I noticed that when running the wrapper ALLHiC_partition, the output from allhic extract becomes GATC_GANTC.txt, while the next command in the pipeline, allhic partition, expects a file named GATC,GANTC.txt (I guess commas do not work well in file names, which is why it is changed). So I ran the commands manually (which is fine).
I am now rerunning as per your suggestion (running extract and partition separately rather than through the wrapper), because I realized I had not specified the sites correctly: I had just used GATC,GANTC, which inflated the number of RE sites found to around 1 per 187 bp, vs. 1 per 3800 bp with the ligation motifs.
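As a rough sanity check on those densities, the expected spacing between sites in random sequence can be computed from the motifs alone. This sketch assumes independent bases with a given GC content (50% by default); real genomes deviate from that, so it only indicates the order of magnitude. Under these assumptions GATC,GANTC gives one site per 128 bp and the four ligation motifs one per about 2,621 bp, roughly the same 20-fold difference as between the reported 187 bp and 3800 bp. site_probability and expected_spacing are illustrative names.

```python
def site_probability(motif: str, gc: float = 0.5) -> float:
    """Probability that a given position starts `motif` in random sequence.

    Assumes independent bases with P(G) = P(C) = gc / 2 and
    P(A) = P(T) = (1 - gc) / 2; N matches any base (probability 1).
    """
    p_base = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2, "N": 1.0}
    prob = 1.0
    for base in motif.upper():
        prob *= p_base[base]
    return prob


def expected_spacing(motifs: str, gc: float = 0.5) -> float:
    """Expected bp per site for a comma-separated motif list.

    Summing per-motif probabilities gives the expected site count per position,
    which is exact when no two motifs can start at the same position
    (true for both motif sets used here).
    """
    return 1.0 / sum(site_probability(m, gc) for m in motifs.split(","))


if __name__ == "__main__":
    print(expected_spacing("GATC,GANTC"))  # 128.0
    print(round(expected_spacing("GATCGATC,GANTGATC,GANTANTC,GATCANTC"), 2))  # 2621.44
```

For the 8-mers the probabilities add up to (1 + 4 + 16 + 4) / 4^8 = 25/65536, which is why the spacing is 65536/25 bp.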
UPDATE: It worked well; I got scaffolds of the expected chromosome sizes.
I ran it with the Arima kit and it worked. I think I merged them all together (a merged BED file of restriction-site locations).
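The post does not say exactly how the merge was done. One hypothetical way to combine per-enzyme restriction-site BED files into a single track is to pool the intervals and merge overlapping or book-ended ones per chromosome, similar in spirit to bedtools merge. read_bed and merge_intervals are illustrative names, the file names in the comment are made up, and only the first three BED columns are used.

```python
import sys


def read_bed(path: str) -> list:
    """Read (chrom, start, end) from the first three columns of a BED file."""
    intervals = []
    with open(path) as handle:
        for line in handle:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split("\t")[:3]
            intervals.append((chrom, int(start), int(end)))
    return intervals


def merge_intervals(intervals: list) -> list:
    """Merge overlapping or book-ended half-open intervals on the same chromosome."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python merge_sites.py mboi_sites.bed hinfi_sites.bed > merged_sites.bed
    pooled = [iv for path in sys.argv[1:] for iv in read_bed(path)]
    for chrom, start, end in merge_intervals(pooled):
        print(chrom, start, end, sep="\t")
```

Note that merging collapses overlapping sites from different enzymes into one interval, so the merged track can contain fewer entries than the per-enzyme files combined.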
Dear Xingtan, do you have any plans to update the multiple-restriction-enzyme functions? I love you. Won