Closed zou-yawen closed 1 year ago
Hi @zou-yawen
To use a custom genome, please see this readme section.
Basically, first you need to create a database of duplications (parascopy pretable
and parascopy table
).
For that the reference genome needs to be indexed with samtools faidx
and bwa index
.
Then, you can run parascopy cn
as for the regular human genome. Note, if you plan to use Parascopy first a small number of samples (for example one), aggregate copy number will be reported, but paralog-specific copy number will be not.
You can modify this behavior with --min-samples 1
, but be careful, the results will not be as reliable as if you use many samples.
Thanks a lot.When I use samtools(1.15.1) faidx and bwa(0.7.17) index to create genome index ,then use "parascopy pretable -f genome.fa -o pretable.bed.gz".Something wrong with it.
The errors on the first screenshot are not important, it is more of a warning.
What would happen if you run samtools faidx genome.fa ChrMT > chr_mt.fa
?
If it works, can you send me the output?
It is possible that there is something wrong with the input fasta file (incomplete MT chromosome).
"samtools faidx genome.fa ChrMT > chr_mt.fa" does`t work,error is below: below is rigth:
So it is a problem with the reference fasta file (incomplete MT sequence), not with Parascopy. You can try to remove MT from the reference, or just create/download the reference anew. You can also check the .fai file, maybe there is a problem there. You can find format description here.
Thanks a lot.I download the reference from website again,it works.
Hi,do you have any advice about --bed-regions , what is the size of every window.I do not want to use hg19/hg38.
First of all, you can check out the regions here to get an idea. For both genomes we used a set of ~90k windows, with window length 100 bp, and all do not overlap the homology table.
The problem is that many genomic regions have some biases, so it is not so trivial to select good set of background windows. You can try to select many exons, extend them to the left and right, and filter regions so that they do not overlap the homology table:
bedtools slop -i input.bed -g genome.fa.fai -b 5000 | \
bedtools merge -i - -d 1000 > tmp/ext_regions.bed
bedtools makewindows -b tmp/ext_regions.bed -w 100 > tmp/windows.bed
zgrep -v '#' table.bed.gz | cut -f1-3 | \
bedtools slop -b 500 -i - -g genome.fa.fai | \
bedtools merge -i - -d 500 > tmp/table.bed
bedtools intersect -a tmp/windows.bed -b tmp/table.bed -v | \
awk '$3 - $2 == 100' > windows.bed
I did not run this code right now, but it should be more or less correct.
Hi, I want to use custom genome as reference instead of hg38;but I don`t know how to do this.Could you please help me.