edanchin opened this issue 2 years ago (status: Open)
Hi, I have tested replacing the first two columns with NA, and it worked. Could you share a few lines of your allele table?
Hi and many thanks for your message,
I double-checked my files, and the problem was in my Allele.ctg.table file, which contained spaces instead of tabs.
I removed the extra spaces, made sure every field was tab-separated, and now it works fine.
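Because the root cause here was spaces where tabs were expected, one way to guard against it is to normalize the table before running ALLHiC_prune. The following is a minimal Python sketch, not part of ALLHiC; it assumes contig names contain no spaces, so any run of spaces or tabs can safely be treated as one field separator. The function name `normalize_allele_table` is hypothetical.

```python
import re


def normalize_allele_table(text: str) -> str:
    """Rewrite each non-empty line so that fields are separated by single tabs.

    Assumes contig names contain no whitespace: any run of spaces and/or tabs
    is collapsed into one tab, leading/trailing whitespace is dropped, and
    blank lines are removed.
    """
    out_lines = []
    for line in text.splitlines():
        fields = re.split(r"[ \t]+", line.strip())
        if fields == [""]:
            continue  # blank line
        out_lines.append("\t".join(fields))
    return "\n".join(out_lines) + ("\n" if out_lines else "")
```

To apply it, read Allele.ctg.table, pass its contents through the function, and write the result to a new file that is then given to `-i`.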
I am using ALLHiC version 0.9.13 with the following command line:
ALLHiC_prune -i groups_100.txt -b sample.clean.bam -r Minc_v4_shac_genome.fasta
groups_100.txt is the tab-delimited allele table: the first two columns are NA, and the remaining columns list contigs that are copies of one another.
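To confirm that the table is parsed as intended, its rows can be expanded into the set of allelic contig pairs they imply. This Python sketch assumes the layout described above (two leading columns, then the contigs, all tab-separated); it is an illustration, not ALLHiC's own parser, and `allelic_pairs` is a hypothetical name. A row that uses spaces instead of tabs throughout collapses into a single field and raises an error, which is the formatting problem reported in this thread.

```python
from itertools import combinations


def allelic_pairs(table_text: str) -> set[frozenset[str]]:
    """Return every unordered pair of contigs listed on the same table row."""
    pairs: set[frozenset[str]] = set()
    for lineno, line in enumerate(table_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) < 4:
            raise ValueError(
                f"line {lineno}: expected at least 4 tab-separated fields, "
                f"found {len(fields)} (spaces instead of tabs?)"
            )
        contigs = [c for c in fields[2:] if c]  # skip empty fields
        for a, b in combinations(contigs, 2):
            if a != b:
                pairs.add(frozenset((a, b)))
    return pairs
```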
I checked the logs and removedb_Allele.txt, and indeed all the reads corresponding to contacts between allelic contigs seem to have been removed.
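One way to double-check the pruned output is to count alignments whose mate lies on an allelic contig. This Python sketch is an illustration under stated assumptions, not how ALLHiC_prune itself works: it reads plain SAM text (for example prunning.sam, or the output of `samtools view` on a BAM file), looks only at the RNAME and RNEXT columns, and takes `pairs` as a set of unordered contig pairs (frozensets) built from the allele table. `count_allelic_links` is a hypothetical name.

```python
def count_allelic_links(sam_lines, pairs) -> int:
    """Count SAM records whose mate maps to an allelic copy of its own contig.

    Uses RNAME (column 3) and RNEXT (column 7). Each read pair is usually
    counted twice, once per mate record.
    """
    n = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 11:
            continue  # not a complete alignment record
        rname, rnext = cols[2], cols[6]
        # '=' means the mate is on the same contig; '*' means no information.
        if rname == "*" or rnext in ("=", "*"):
            continue
        if frozenset((rname, rnext)) in pairs:
            n += 1
    return n
```

A result of zero on the pruned file is consistent with the allelic contacts having been removed.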
Now I'm running the next step (partition).
Just one additional question, if I may.
Can I provide two restriction sites in the -e option?
I use the Arima two-enzyme kit, which cuts at both GATC and GANTC.
Can I use -e GATC,GANTC, or is only one enzyme allowed?
Many thanks for your help
Etienne
--
Etienne G.J. Danchin
http://edanchin.org
http://www.paca.inra.fr/institut-sophia-agrobiotech Tel. +33 492 386 402 Fax. +33 492 386 587
From: Yibin Wang @.***> Sent: Tuesday, 2 August 2022 03:24 To: tangerzhang/ALLHiC Cc: Etienne Danchin; Author Subject: Re: [tangerzhang/ALLHiC] Allele.ctg.table does not eliminate any read in prune step (Issue #138)
Hi,
> Can I provide two restriction sites in the -e option?
You can use -e Arima in the latest version of ALLHiC_partition.
$ ALLHiC_partition
Usage: ALLHiC_partition -r draft.asm.fasta -e enzyme_sites -k Num of groups
-h : help and usage.
-b : prunned bam (optional, default prunning.bam)
-r : draft.sam.fasta
-e : enzyme_sites (HindIII: AAGCTT; MboI: GATC, Arima)
-k : number of groups (user defined K value)
-m : minimum number of restriction sites (default, 25)
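To illustrate what two restriction motifs mean in practice, the sketch below counts GATC and GANTC sites per contig and keeps contigs with at least a minimum number of sites, in the spirit of the `-m` option (default 25) in the usage text above. It is a hypothetical Python illustration assuming N matches any of A, C, G or T; it does not reproduce how ALLHiC_partition counts sites internally, and the names `count_sites` and `contigs_passing` are made up for this example.

```python
import re

# Assumption: the Arima two-enzyme chemistry is represented by the motifs
# GATC and GANTC, as stated in this thread; N in a motif matches A, C, G or T.
ARIMA_MOTIFS = ("GATC", "GANTC")


def count_sites(seq: str, motifs=ARIMA_MOTIFS) -> int:
    """Count all (possibly overlapping) occurrences of the motifs in seq.

    Both motifs are palindromic, so scanning the forward strand suffices.
    """
    seq = seq.upper()
    total = 0
    for motif in motifs:
        pattern = motif.replace("N", "[ACGT]")
        # A zero-width lookahead lets overlapping sites be counted separately.
        total += len(re.findall(f"(?={pattern})", seq))
    return total


def contigs_passing(contigs: dict[str, str], min_sites: int = 25) -> list[str]:
    """Return names of contigs with at least min_sites restriction sites."""
    return [name for name, seq in contigs.items() if count_sites(seq) >= min_sites]
```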
Many thanks,
I will re-run everything with the latest version!
One last question, if I may: the species I am studying is triploid, with 3n = 45-47 chromosomes. The closest non-polyploid genome has n = 16 chromosomes. Hence, I am wondering how many groups I should set for the partition step: 47, 16, or 3?
Any advice about this parameter?
Thanks again for your help
Etienne
From: Yibin Wang @.***> Sent: Tuesday, 2 August 2022 11:10 To: tangerzhang/ALLHiC Cc: Etienne Danchin; Author Subject: Re: [tangerzhang/ALLHiC] Allele.ctg.table does not eliminate any read in prune step (Issue #138)
You can try setting 47 groups or more. In our experience, the final grouping into chromosomes can be determined from the Hi-C heatmaps: groups that are too small can be discarded manually, and chromosomes split across several groups can be merged manually.
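The advice to discard groups that are too small can be supported by a quick first pass before inspecting the heatmap. This Python sketch flags groups whose total contig length falls below a fraction of the median group length; the 0.2 cutoff, the input dictionary (group name to total length in bp) and the name `small_groups` are all hypothetical choices, and the Hi-C heatmap should remain the final arbiter.

```python
from statistics import median


def small_groups(group_lengths: dict[str, int], fraction: float = 0.2) -> list[str]:
    """Return groups whose total length is below fraction x the median length.

    Flagged groups are candidates for manual review, not automatic removal.
    """
    cutoff = fraction * median(group_lengths.values())
    return sorted(g for g, length in group_lengths.items() if length < cutoff)
```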
Many thanks. I will try 50 groups and then manually edit the assembly using the contact map if I notice any problems in the scaffolding.
All the best
Etienne
From: Yibin Wang @.***> Sent: Wednesday, 3 August 2022 04:40 To: tangerzhang/ALLHiC Cc: Etienne Danchin; Author Subject: Re: [tangerzhang/ALLHiC] Allele.ctg.table does not eliminate any read in prune step (Issue #138)
Dear developers of ALLHiC,
Thanks a lot for developing this software, which addresses a key problem in scaffolding polyploid genomes: false-positive contact information between closely related copies.
For the polyploid species I'm working with, no well-assembled monoploid genome of a related species is available. Therefore, I ran MCScanX on my annotated contigs to identify which contigs are copies of which other contigs. I then used this information to generate an Allele.ctg.table in which the chromosome and position fields are replaced by NA NA, as suggested in issue #9. The result is a tab-separated file like:

NA NA contigx contigy contigz
NA NA contigw contign
etc.

However, when I provide this file to ALLHiC for the prune step, no reads are eliminated and the file removedb_Allele.txt remains empty, even though the table seems to be read correctly according to the log.txt file. The resulting prunning.sam file still contains all the read pairs I want to eliminate.
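The NA NA layout described above can be produced directly from groups of contigs that are copies of one another. This Python sketch assumes the grouping has already been derived (for instance from the MCScanX synteny results; parsing MCScanX output is not shown) and emits one tab-separated row per group; `allele_table_lines` is a hypothetical name.

```python
def allele_table_lines(groups):
    """Yield tab-separated Allele.ctg.table rows with NA NA placeholders.

    `groups` is an iterable of contig-name lists, each holding contigs judged
    to be copies of one another. Groups with fewer than two distinct contigs
    carry no allelic information and are skipped.
    """
    for contigs in groups:
        unique = list(dict.fromkeys(contigs))  # drop duplicates, keep order
        if len(unique) < 2:
            continue
        yield "\t".join(["NA", "NA", *unique])
```

Writing `"\n".join(allele_table_lines(groups)) + "\n"` to the table file guarantees tab separators, which sidesteps the spaces-versus-tabs problem.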
Any idea how to solve this issue? Many thanks in advance.