MichelMoser closed this issue 4 years ago
Hi Michel,
Sorry for the late response.
This is because the script reads gene names from Name=xxx rather than ID=xxx.
I noticed that the gene names in the target CDS are exactly the same as the values in ID=xxx.
Therefore, you may need to modify the code on lines 53 and 65 as follows:
my $tgene = $1 if(/ID=(\S+)/);
my $rgene = $1 if(/ID=(\S+)/);
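One caveat with this pattern, sketched here in Python rather than Perl purely for illustration: `\S+` keeps matching until the next whitespace, so if the ID is followed by further attributes on the same line, the capture includes them. The attribute string below is hypothetical, and the tighter pattern is an assumption, not something taken from classify.pl.

```python
import re

# Hypothetical GFF3 attribute column; real files may order attributes differently.
attributes = "ID=gene_0001;Name=g1;Note=example"

# Same pattern as the suggested fix: \S+ runs on until the next whitespace.
greedy = re.search(r"ID=(\S+)", attributes).group(1)

# Tighter variant (an assumption, not from classify.pl): also stop at ';'.
strict = re.search(r"ID=([^;\s]+)", attributes).group(1)

print(greedy)  # the ID value plus the attributes that follow it
print(strict)  # the ID value only
```

If the ID is the last or only attribute on the line, both patterns capture the same value.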
Sorry, my bad. I assumed Name == ID, which is False of course =)
I reformatted the GFF and changed "Name=" accordingly, and it works now.
The output file Allele.ctg.table contains a lot of lines where both contig names are the same. Does that mean that ALLHiC will remove Hi-C contacts within contigs as well? I hope not! Or should I filter such lines out?
ssa28 36674475 Flye15k_contig_313 Flye15k_contig_313
sssa28 36674475 Flye15k_contig_313 Flye15k_contig_313
ssa28 36819276 Flye15k_contig_850 Ssal_flye30K_contig_01029
ssa28 36879032 Ssal_flye30K_contig_00195 Ssal_flye30K_contig_00195
Thanks, Michel
ALLHiC will not remove Hi-C contacts within contigs, so there is no need to worry about lines that list the same contig more than once.
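Filtering is not required, but for anyone who wants to see how many rows list only a single contig, here is a minimal Python sketch. It assumes whitespace-separated columns with contig names from the third column onward, which is a guess based on the excerpt quoted in this thread; the function name is hypothetical.

```python
def split_same_contig_rows(lines):
    """Split rows into those naming one contig only and those naming several.

    Assumes whitespace-separated columns with contig names starting at
    column 3 (an assumption based on the excerpt in this thread).
    """
    same, mixed = [], []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        contigs = fields[2:]
        (same if len(set(contigs)) == 1 else mixed).append(fields)
    return same, mixed


rows = [
    "ssa28 36674475 Flye15k_contig_313 Flye15k_contig_313",
    "ssa28 36819276 Flye15k_contig_850 Ssal_flye30K_contig_01029",
    "ssa28 36879032 Ssal_flye30K_contig_00195 Ssal_flye30K_contig_00195",
]
same, mixed = split_same_contig_rows(rows)
print(len(same), len(mixed))
```

To run it on a real table, pass an open file handle instead of the `rows` list.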
Hi,
I am trying to create allele.ctg.table files but am running into trouble at the classify step.
classify.pl returns the following errors:
I thought it was a formatting problem, but I can't spot anything obvious in my input files.
I formatted the CDS files according to the tutorial, with names identical to those in the GFF:
target gff3:
target cds
ref gff3
ref cds
after blastn_parse.pl Eblast.out
Is there a problem with using underscores in gene names? Thanks, Michel
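As far as a `\S+` pattern is concerned, an underscore is matched like any other non-whitespace character, so a name mismatch between the CDS headers and the GFF attribute being read is worth ruling out first. The Python sketch below counts how many CDS names appear under `ID=` versus `Name=`. The helper names and sample records are hypothetical, and it assumes tab-separated GFF3 with `gene` features and FASTA headers whose first word is the gene name.

```python
import re


def attribute_values(gff_lines, key):
    """Collect `key=value` values from column 9 of gene features."""
    pattern = re.compile(rf"(?:^|;){re.escape(key)}=([^;\s]+)")
    values = set()
    for line in gff_lines:
        if line.startswith("#"):
            continue  # skip GFF comment and directive lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        match = pattern.search(cols[8])
        if match:
            values.add(match.group(1))
    return values


def fasta_names(fasta_lines):
    """Return the first word of every FASTA header line."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and len(line.strip()) > 1
    }


# Hypothetical sample records, for illustration only.
gff = [
    "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=gene_0001;Name=g1",
    "chr1\tsrc\tgene\t200\t300\t.\t-\t.\tID=gene_0002;Name=g2",
]
cds = [">gene_0001", "ATG", ">gene_0002", "ATGC"]

names = fasta_names(cds)
for key in ("ID", "Name"):
    matched = len(names & attribute_values(gff, key))
    print(f"{key}: {matched} of {len(names)} CDS names found")
```

If nearly all names match under one key and none under the other, the script is probably reading the wrong attribute for that file.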