Closed herrroaa closed 5 years ago
See #45. Edit: copying the answer from @mroosmalen here as well.
This is caused by the locale encoding. If you use bash as your shell, you can put these lines in your ~/.bashrc and ~/.profile files:
export LC_CTYPE=en_US.UTF-8
export LANG=en_US.UTF-8
Thanks for the reply. I added the two lines to my bash files, but it did not work.
Did you source your .bashrc or restart your terminal?
I restarted my terminal
Any other suggestions?
Maybe you can try creating the following bash script (e.g. nanosv.sh):
#!/usr/bin/env bash
export LC_CTYPE=en_US.UTF-8
export LANG=en_US.UTF-8
NanoSV BC01_sorted.bam -o BC01.vcf
Then make it executable and run it with ./nanosv.sh
I tried it and unfortunately I got the same error message!
Can you check your environment variables to be sure:
echo $LANG
echo $LC_ALL
Both should give en_US.UTF-8 as output.
You can also try this:
export LC_ALL=en_US.UTF-8
You can also check the default encoding in your Python:
import sys
print(sys.getdefaultencoding())
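As a complement to the check above, here is a minimal sketch (not part of NanoSV) that reports the settings Python consults when it decodes text. The helper name report_encodings is made up for illustration. On Python 3, sys.getdefaultencoding() is always 'utf-8', so the locale-derived value is usually the more telling one.

```python
import locale
import os
import sys


def report_encodings():
    """Collect the settings that influence how Python decodes text."""
    return {
        # Always 'utf-8' on Python 3, so it says little about the locale.
        "default": sys.getdefaultencoding(),
        # Usually what open() falls back to; derived from LC_ALL / LC_CTYPE / LANG.
        "preferred": locale.getpreferredencoding(False),
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
    }


if __name__ == "__main__":
    for name, value in report_encodings().items():
        print(name, "=", value)

    # Byte 0xe2 starts a multi-byte UTF-8 character, so ASCII cannot decode it.
    raw = b"\xe2\x80\x99"
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as err:
        print("ascii fails:", err.reason)
    print("utf-8 gives", len(raw.decode("utf-8")), "character")
```

If "preferred" comes back as an ASCII variant while the input contains non-ASCII bytes, that would be consistent with the UnicodeDecodeError reported in this thread.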
Hi, thanks for the reply. export LC_ALL=en_US.UTF-8 worked, but I have one more question now. I need to generate a bed file for hg38 with chrx notation. I got the simple_repeats_file from http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/simpleRepeat.txt.gz and the gaps_file from http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/gap.txt.gz
I then unzipped the files and renamed them gap.bed and simpleRepeat.bed. I modified the lengths of the chromosomes:
genome = { 1: 248956422, 2: 242193529, 3: 198295559, 4: 190214555, 5: 181538259, 6: 170805979, 7: 159345973, 8: 145138636, 9: 138394717, 10: 133797422, 11: 135086622, 12: 133275309, 13: 114364328, 14: 107043718, 15: 101991189, 16: 90338345, 17: 83257441, 18: 80373285, 19: 58617616, 20: 64444167, 21: 46709983, 22: 50818468, 23: 156040895, 24: 57227415 }
simple_repeats_file = '/Users/tarekmagdyshehatamohamed/Downloads/gap.bed'
gaps_file = '/Users/tarekmagdyshehatamohamed/Downloads/simpleRepeat.bed'
I then ran the .py file and I got this error message
$ cd /Users/tarekmagdyshehatamohamed/miniconda3/pkgs/nanosv-1.2.0-py36_1/lib/python3.6/site-packages/nanosv/bedfiles ;env "PYTHONIOENCODING=UTF-8" "PYTHONUNBUFFERED=1" /Users/tarekmagdyshehatamohamed/miniconda3/envs/pythontwopointseven/bin/python /Users/tarekmagdyshehatamohamed/.vscode/extensions/ms-python.python-2018.7.1/pythonFiles/PythonTools/visualstudio_py_launcher.py /Users/tarekmagdyshehatamohamed/miniconda3/pkgs/nanosv-1.2.0-py36_1/lib/python3.6/site-packages/nanosv/bedfiles 58853 34806ad9-833a-4524-8cd6-18ca4aa74f14 RedirectOutput,RedirectOutput /Users/tarekmagdyshehatamohamed/miniconda3/pkgs/nanosv-1.2.0-py36_1/lib/python3.6/site-packages/nanosv/bedfiles/create_human_hg38_bed.py
Traceback (most recent call last):
File "/Users/tarekmagdyshehatamohamed/miniconda3/pkgs/nanosv-1.2.0-py36_1/lib/python3.6/site-packages/nanosv/bedfiles/create_human_hg38_bed.py", line 63, in <module>
read_bed(simple_repeats_file)
File "/Users/tarekmagdyshehatamohamed/miniconda3/pkgs/nanosv-1.2.0-py36_1/lib/python3.6/site-packages/nanosv/bedfiles/create_human_hg38_bed.py", line 46, in read_bed
ch, start, end = line.split("\t")
ValueError: too many values to unpack
This is because the downloaded files are not proper bed files: they contain more than three tab-separated columns. First you need to convert each .txt file to a .bed file:
cut -f 2,3,4 gap.txt > gap.bed
cut -f 2,3,4 simpleRepeat.txt > simpleRepeat.bed
You may also need to remove the chr notation in front of the chromosome names, depending on whether your bam file uses the chr notation or not.
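For anyone who prefers to do the conversion in Python, the cut commands above can be sketched as follows. This assumes the UCSC table layout where column 1 is a bin index and columns 2 to 4 are chromosome, start and end (the same columns the cut commands select); the function name ucsc_table_to_bed and the sample row are illustrative only.

```python
def ucsc_table_to_bed(lines, strip_chr=False):
    """Yield 'chrom<TAB>start<TAB>end' from rows laid out as bin, chrom, start, end, ..."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        chrom, start, end = fields[1:4]
        if strip_chr and chrom.startswith("chr"):
            chrom = chrom[3:]  # 'chr1' -> '1'
        yield "\t".join((chrom, start, end))


if __name__ == "__main__":
    # Illustrative row; the trailing fields are placeholders.
    rows = ["0\tchr1\t0\t10000\t1\tN\t10000\ttelomere\tno\n"]
    for bed_line in ucsc_table_to_bed(rows):
        print(bed_line)
    for bed_line in ucsc_table_to_bed(rows, strip_chr=True):
        print(bed_line)
```

Passing an open file handle instead of the list works the same way, since the function only iterates over lines.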
These are my bed files
$ cat gap.bed | head -n 10
chr1 0 10000
chr1 207666 257666
chr1 297968 347968
chr1 535988 585988
chr1 2702781 2746290
chr1 12954384 13004384
chr1 16799163 16849163
chr1 29552233 29553835
chr1 121976459 122026459
chr1 122224535 122224635
$ cat simpleRepeat.bed | head -n 5
chr1 10000 10468
chr1 10627 10800
chr1 10757 10997
chr1 11225 11447
chr1 11271 11448
When I run the .py file I got this error
$ cd /Users/tarekmagdyshehatamohamed/miniconda3/pkgs/nanosv-1.2.0-py36_1/lib/python3.6/site-packages/nanosv/bedfiles ;env "PYTHONIOENCODING=UTF-8" "PYTHONUNBUFFERED=1" /Users/tarekmagdyshehatamohamed/miniconda3/envs/pythontwopointseven/bin/python /Users/tarekmagdyshehatamohamed/.vscode/extensions/ms-python.python-2018.7.1/pythonFiles/PythonTools/visualstudio_py_launcher.py /Users/tarekmagdyshehatamohamed/miniconda3/pkgs/nanosv-1.2.0-py36_1/lib/python3.6/site-packages/nanosv/bedfiles 50921 34806ad9-833a-4524-8cd6-18ca4aa74f14 RedirectOutput,RedirectOutput /Users/tarekmagdyshehatamohamed/miniconda3/pkgs/nanosv-1.2.0-py36_1/lib/python3.6/site-packages/nanosv/bedfiles/create_human_hg38_bed.py
Traceback (most recent call last):
File "/Users/tarekmagdyshehatamohamed/miniconda3/pkgs/nanosv-1.2.0-py36_1/lib/python3.6/site-packages/nanosv/bedfiles/create_human_hg38_bed.py", line 79, in <module>
for mask_start in sorted(mask_regions[randchr]):
KeyError: 5
It can't find chromosome 5 in the bed file, because the bed file uses the chr notation (chr5) while the genome dictionary uses plain numbers (5).
Remove the chr notation in the bed file or add the chr notation to the chromosomes in the genome dictionary.
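The mismatch can be illustrated with a small sketch that maps bed-style names onto the integer keys of the genome dictionary posted in this thread. It assumes that keys 23 and 24 stand for X and Y, as their lengths suggest; chrom_to_key is a hypothetical helper, not a function from NanoSV.

```python
def chrom_to_key(name):
    """Map 'chr5' -> 5, 'chrX' -> 23, 'chrY' -> 24 to match integer dictionary keys."""
    label = name[3:] if name.startswith("chr") else name
    special = {"X": 23, "Y": 24}  # assumption: 23 and 24 represent X and Y
    if label in special:
        return special[label]
    return int(label)  # raises ValueError for names such as 'chrM' or alt contigs


if __name__ == "__main__":
    mask_regions = {}
    for line in ["chr5\t100\t200", "chrX\t5\t9"]:
        ch, start, end = line.split("\t")
        mask_regions.setdefault(chrom_to_key(ch), []).append((int(start), int(end)))
    # Looking up mask_regions[5] now succeeds; with 'chr5' as the key it would raise KeyError: 5.
    print(mask_regions[5])
```

The UCSC tables also list unplaced and alternate contigs, so real input would need those rows skipped or the ValueError caught.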
There was a minor bug in the script. This should be fixed by now, in the newest version of NanoSV (v.1.2.1)
I used the new script create_random_position_bed.py, but I encountered some issues.
1- I had to change genome.iteritems() to genome.items() to be compatible with Python 3.6.3, which I am using.
2- I had to delete randchr = randchr.replace('chr','') because it gave an error:
AttributeError: 'int' object has no attribute 'replace'
I am not sure why this line is important?
3- There is a missing 'd' in ranchr at line 90 and line 91.
4- I increased pick_random from 100000 to 1000000.
5- Using this script with gap and simple repeats bed files with chr notation and a genome dictionary with chromosome numbers without chr notation works fine. Eventually, I had to add the chr notation to my generated bed file because my bam file has the chr notation. I used sed 's/^/chr/' test.bed > withchr.test.bed
What do you think about this?
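Points 1 and 2 can be sketched in isolation as below. This is not the code from create_random_position_bed.py; chrom_label is a hypothetical helper showing one way to keep the chr-stripping step (presumably there to tolerate keys written as 'chr5') without the AttributeError on integer keys.

```python
def chrom_label(chrom):
    """Return the chromosome name without a 'chr' prefix, for int or str input."""
    # str() first, so integer keys such as 5 do not fail on .replace().
    return str(chrom).replace("chr", "")


if __name__ == "__main__":
    genome = {1: 248956422, 2: 242193529}  # shortened copy of the dictionary in this thread

    # dict.items() works on Python 2 and 3; iteritems() exists only on Python 2.
    for chrom, length in genome.items():
        print(chrom_label(chrom), length)
```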
Sorry, but there were still some bugs in the script. Can you try it again with the newest version (1.2.2)?
I used the new script with bed files with chr notation and the genome dictionary as follows:
genome = { 1: 248956422, 2: 242193529, 3: 198295559, 4: 190214555, 5: 181538259, 6: 170805979, 7: 159345973, 8: 145138636, 9: 138394717, 10: 133797422, 11: 135086622, 12: 133275309, 13: 114364328, 14: 107043718, 15: 101991189, 16: 90338345, 17: 83257441, 18: 80373285, 19: 58617616, 20: 64444167, 21: 46709983, 22: 50818468, 23: 156040895, 24: 57227415 }
Then I added the chr notation to my hg38.bed file, as my bam file has the chr notation as well:
sed 's/^/chr/' hg38.bed > hg38.withchr.bed
cat hg38.withchr.bed | head -n 5
chr19 34671356 34671357
chr5 19440376 19440377
chr7 30238054 30238055
chr2 181665531 181665532
chr7 57515231 57515232
cat BC01_minimap.sam | head -n 5
@SQ SN:chr1 LN:248956422
@SQ SN:chr2 LN:242193529
@SQ SN:chr3 LN:198295559
@SQ SN:chr4 LN:190214555
@SQ SN:chr5 LN:181538259
I then tried to use nanosv but I got an error
$ NanoSV BC01_sorted.bam -s /Users/tarekmagdyshehatamohamed/miniconda3/pkgs/samtools-1.8-3/bin/samtools -b hg38.withchr.bed -o BC01.noanosv.vcf
Fri Aug 10 19:28:37 2018 Busy with calculating the coverage distribution...
dyld: Library not loaded: @rpath/libdeflate.so
Referenced from: /Users/tarekmagdyshehatamohamed/miniconda3/pkgs/samtools-1.8-3/bin/samtools
Reason: image not found
Can't calculate coverage distribution. The bed file may be inappropriate for your bam file.
My bam file has chrX and chrY, while hg38.withchr.bed has chr23 and chr24.
So I thought this might be causing the problem. I substituted chr23 with chrX and chr24 with chrY. I then ran the command one more time and it still did not work.
Any thoughts?
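The renaming step described above can be sketched in Python as follows. It compares the whole first column, which avoids the risk of a plain text substitution touching other fields. The mapping and the function name rename_sex_chromosomes are illustrative, and the renaming by itself is not claimed to resolve the NanoSV error.

```python
RENAME = {"chr23": "chrX", "chr24": "chrY"}


def rename_sex_chromosomes(bed_line):
    """Rewrite column 1 of a tab-separated bed line when it is chr23 or chr24."""
    fields = bed_line.rstrip("\n").split("\t")
    fields[0] = RENAME.get(fields[0], fields[0])  # exact match on the chromosome column only
    return "\t".join(fields)


if __name__ == "__main__":
    for line in ["chr23\t100\t101", "chr2\t181665531\t181665532"]:
        print(rename_sex_chromosomes(line))
```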
And if you try this on a subset of your bed file:
$SAMTOOLS depth $BAM -b $BEDFILE | awk '{print $3}'
Do you get any result?
yes I got some results
$ samtools depth BC01_sorted.bam -b hg38_withchr02_xyy.bed | awk '{print $3}' | head -n 5
1
1
1
1
1
Did you use the same samtools path? I see both samtools and /Users/tarekmagdyshehatamohamed/miniconda3/pkgs/samtools-1.8-3/bin/samtools. NanoSV executes the previous command to calculate the distribution, and if it returns an empty list then you will get this error. But it seems that it does return a list, so it should work.
I think the problem is samtools and the libdeflate.so library.
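To tell an empty result apart from a samtools binary that fails to start, something along these lines may help. It is a rough sketch, not NanoSV's implementation: depth_values is a hypothetical helper that runs whatever command it is given and reads the third tab-separated column, which is where samtools depth reports the depth.

```python
import shutil
import subprocess


def depth_values(cmd):
    """Run a command such as ['samtools', 'depth', bam, '-b', bed] and return column 3 as ints."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    if result.returncode != 0:
        # A loader failure (for example a missing shared library) ends up here,
        # which is different from a bed file that simply matches nothing.
        raise RuntimeError(result.stderr.strip() or "command failed")
    return [int(line.split("\t")[2]) for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    # The samtools found on PATH may differ from the one passed to NanoSV with -s.
    print("samtools on PATH:", shutil.which("samtools"))
```

Calling depth_values once with the PATH binary and once with the full path given to -s would show whether only one of the two fails.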
Hi, I am trying to call snps from a bam file generated by minimap2, but I got an error when running:
NanoSV BC01_sorted.bam -o BC01.vcf
I got this error message
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 948: ordinal not in range(128)
Here is the complete output