staciawyman / blender

This BLENDER has been sunsetted
https://github.com/cornlab/blender

Reference genome issues #3

Open khaledzuj opened 3 years ago

khaledzuj commented 3 years ago

Hi,

I am running BLENDER on WGS data from two BAM files, and I get the error below no matter which reference genome file I use:

$ sudo sh run_blender.sh hg19.fa.align /datadrive/info.json/EGAF00004455619/sample.sorted.bam /datadrive/info.json/EGAF00004455622/sample.sorted.bam CAACAGTCTTACCTGGACTC DEMO/blender

[main_samview] region "chr1" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr2" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr3" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr4" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr5" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr6" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr7" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr8" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr9" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr10" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr11" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr12" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr13" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr14" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr15" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr16" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr17" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr18" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr19" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr20" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr21" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr22" specifies an unknown reference name. Continue anyway.
[main_samview] region "chrX" specifies an unknown reference name. Continue anyway.
[main_samview] region "chrY" specifies an unknown reference name. Continue anyway.
run_blender.sh: 37: run_blender.sh: GUIDE+=NGG: not found

Thank you in advance.

staciawyman commented 3 years ago

It looks like the reference genome you are using does not name the chromosomes chr1, chr2, etc.; they are probably called just 1, 2, etc. You can either rename the chromosomes in the reference to add the chr prefix or, if you are comfortable with it, change the blender code to remove the "chr" prefix there. Different sources for the reference genome do or do not use the "chr" prefix. I believe UCSC uses it.
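One way to see which naming scheme each file uses is to list the sequence names directly. The sketch below is illustrative only: the two helper functions are hypothetical names, not part of blender, and they simply read SAM header text or FASTA text from standard input.

```shell
#!/bin/sh
# Print contig names from SAM header text on stdin
# (the SN: field of each @SQ line).
sam_header_contigs() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }'
}

# Print contig names from FASTA text on stdin
# (the first word after ">" on each header line).
fasta_contigs() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }'
}

# Hypothetical usage (file names are placeholders):
#   samtools view -H sample.sorted.bam | sam_header_contigs | head
#   fasta_contigs < hg38.fa | head
```

If one list shows chr1, chr2, ... and the other shows 1, 2, ..., the names do not match and region lookups by name will fail.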

khaledzuj commented 3 years ago

I am sorry, but I am really lost here. I have no idea which reference genome file or format I should use with this tool.

I have tried an hg19.fasta reference genome, the BED file that comes with the tool, and the hg19.trf.bed file,

but I always get the same error.

I would appreciate it if you could give me an example of the reference genome.

staciawyman commented 3 years ago

You can download hg38.fa.gz here: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/

khaledzuj commented 3 years ago

sudo sh run_blender.sh hg38.fa /datadrive/info.json/EGAF00004455619/sample.sorted.bam /datadrive/info.json/EGAF00004455622/sample.sorted.bam CAACAGTCTTACCTGGACTC DEMO/blender

[main_samview] region "chr1" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr2" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr3" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr4" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr5" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr6" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr7" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr8" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr9" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr10" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr11" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr12" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr13" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr14" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr15" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr16" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr17" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr18" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr19" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr20" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr21" specifies an unknown reference name. Continue anyway.
[main_samview] region "chr22" specifies an unknown reference name. Continue anyway.
[main_samview] region "chrX" specifies an unknown reference name. Continue anyway.
[main_samview] region "chrY" specifies an unknown reference name. Continue anyway.
run_blender.sh: 37: run_blender.sh: GUIDE+=NGG: not found

I still get the same error
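A side note on the last line of the log: "GUIDE+=NGG: not found" appears to be a separate problem from the chromosome names. The message format suggests that sh on this machine is a strict POSIX shell such as dash, which does not support the += append assignment and therefore tries to run GUIDE+=NGG as a command. Assuming line 37 of run_blender.sh is meant to append the PAM to the guide sequence, a portable form would look like the sketch below. The guide value is copied from the command in this thread; the rest is illustrative.

```shell
#!/bin/sh
# Portable equivalent of the bash-only statement: GUIDE+=NGG
# GUIDE is the guide sequence passed on the command line in this thread.
GUIDE="CAACAGTCTTACCTGGACTC"

# Works in any POSIX shell (dash, bash, ksh): append by re-assigning.
GUIDE="${GUIDE}NGG"

echo "$GUIDE"

# Alternative that leaves the script untouched: invoke it with bash
# instead of sh, e.g.  bash run_blender.sh <same arguments as before>
```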

khaledzuj commented 3 years ago

I think I have found the cause of the problem: the chromosomes are named chr1, chr2, ... in the reference but are only numbered (1, 2, ...) in the BAM file.

A00260:172:HF7YWDSXY:1:1371:2193:30154 177 1 9996 0 99S52M 22 50808389 0 CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCAATTTATATATCCCGCCCCCCCACCCAAACACACACCCAACGCTACACTCTTTCCCTACCCGACGCTCTTCCGATATCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC FFFFFFFFFFFF:FFFFFFFFF:FFF,:,,,,:,,,,,,,,:,:,,,FF:,,,::,,,:,,,,:FF,,,F::FF,,:F,FF:FFFFFF,,F:,:,F,FFF,FF:F,FFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF,FFFFFFF:FF NM:i:1 MD:Z:7A44 MC:Z:104S47M AS:i:47 XS:i:44 RG:Z:STE130PE2_HF7YWDSXY_1 SA:Z:2,32916443,+,121S30M,0,0;

What should I do in this case?
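One possible approach, sketched here under assumptions, is to rename the sequences in the BAM header so that they carry the chr prefix the tool expects. The helper add_chr_to_sq is a hypothetical name, and the file names in the usage comments are placeholders. The rename is purely textual, so it does not update contig names stored inside read tags (for example the SA:Z: tag in the record above), and it does not check that the BAM and the reference come from the same genome build.

```shell
#!/bin/sh
# Add a "chr" prefix to the SN: field of every @SQ line in SAM header
# text read from stdin. Names that already start with "chr" are kept.
# Caveat: "MT" becomes "chrMT" (UCSC uses "chrM"), and unplaced or decoy
# contigs will not match UCSC names either.
add_chr_to_sq() {
    awk 'BEGIN { FS = OFS = "\t" }
         $1 == "@SQ" {
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^SN:/ && $i !~ /^SN:chr/)
                     $i = "SN:chr" substr($i, 4)
         }
         { print }'
}

# Hypothetical usage (file names are placeholders):
#   samtools view -H sample.sorted.bam | add_chr_to_sq > header.chr.sam
#   samtools reheader header.chr.sam sample.sorted.bam > sample.chr.bam
#   samtools index sample.chr.bam
```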

khaledzuj commented 3 years ago

I did what you suggested at the start, and I think it is working now.

Thank you

khaledzuj commented 3 years ago

You can download hg38.fa.gz here: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/

Hi

I modified the blender code to remove the chr prefix from one line, and now I get messages like the ones below for all of the reads after running the command.

I am not sure whether the tool is working correctly or there is still an issue.

[W::fai_fetch] Reference 1:3069560-3069561 not found in FASTA file, returning empty sequence
Failed to fetch sequence in 1:3069560-3069561
[W::fai_fetch] Reference 1:3069621-3069622 not found in FASTA file, returning empty sequence
Failed to fetch sequence in 1:3069621-3069622
[W::fai_fetch] Reference 1:3069631-3069632 not found in FASTA file, returning empty sequence
Failed to fetch sequence in 1:3069631-3069632
[W::fai_fetch] Reference 1:3069622-3069623 not found in FASTA file, returning empty sequence
Failed to fetch sequence in 1:3069622-3069623
[W::fai_fetch] Reference 1:3069632-3069633 not found in FASTA file, returning empty sequence
Failed to fetch sequence in 1:3069632-3069633
[W::fai_fetch] Reference 1:3069628-3069629 not found in FASTA file, returning empty sequence
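These warnings most likely mean that the FASTA file has no sequence named "1": the UCSC download names it chr1, while the edited code now asks for 1. The names have to agree in all three places (BAM, FASTA, and the regions the code requests). A minimal sketch for producing a FASTA without the prefix follows; strip_chr_prefix is a hypothetical helper and the file names are placeholders. It also assumes the reads were aligned to the same genome build as the FASTA, which this thread does not confirm.

```shell
#!/bin/sh
# Remove a leading "chr" from FASTA header lines read on stdin;
# sequence lines pass through unchanged.
# Caveat: "chrM" becomes "M", whereas BAMs with numeric chromosome
# names often call the mitochondrial sequence "MT".
strip_chr_prefix() {
    sed 's/^>chr/>/'
}

# Hypothetical usage (file names are placeholders):
#   strip_chr_prefix < hg38.fa > hg38.nochr.fa
#   samtools faidx hg38.nochr.fa
```

The faidx step rebuilds the .fai index, so lookups by name are done against the renamed sequences.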