Closed by yonniejon 7 months ago
So the problem was not the chr prefix. I edited my bed file so that it no longer contains the "chr" prefixes, removed the "chr" prefixes from my fasta reference file, and the problem persists.
This means that you have a non-utf-8 character at the beginning of your file. Did you by chance export this from MS Excel as utf-16? If so, you need to convert your file to utf-8 encoding. You can also export from Excel directly in utf-8 encoding.
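One way to check what a file actually starts with is to look at its first few bytes. The sketch below is only an illustration, not part of faidx; the function name `sniff_leading_bytes` and its labels are made up for this example. It distinguishes a utf-16 byte order mark, a utf-8 byte order mark, and the two-byte gzip signature `1f 8b` (whose second byte, 0x8b, is the one named in the UnicodeDecodeError reported in this thread).

```python
import codecs


def sniff_leading_bytes(path):
    """Return a rough label for what the first bytes of `path` look like.

    Hypothetical helper for illustration only. The labels are coarse:
    UTF-32 LE also begins with the UTF-16 LE byte order mark, and this
    sketch does not try to tell the two apart.
    """
    with open(path, "rb") as handle:
        head = handle.read(4)
    if head.startswith(b"\x1f\x8b"):
        # gzip (and bgzip) files start with the magic bytes 1f 8b
        return "gzip"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return "no-marker"
```

A result of "gzip" would suggest the file needs decompressing rather than re-encoding, while "utf-16" would match the Excel export scenario described above.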
I did not. I ran nano tmp.bed and pasted the following contents exactly:
chr6 132891948 132892108
chr10 127585142 127585221
Just to confirm - you have said:
where tmp.bed.gz looks like: chr6 132891948 132892108 chr10 127585142 127585221
Do you mean that the tmp.bed file contains this, and you have also gzipped it? If so I think I understand the issue. The --bed option does not handle gzipped input. If you want to pass a gzipped file you could do:
$ faidx hg19/genome.fa.gz -b - < <(gzip -dc tmp.bed.gz)
The above uses process substitution to decompress your bed file and redirect it to stdin, which the --bed argument reads when given the "-" symbol. Alternatively, you could pass an uncompressed bed file.
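For anyone doing the same thing from Python rather than the shell, here is a minimal sketch of reading a bed file that may or may not be gzipped. It uses only the standard library; `open_maybe_gzip` and `read_bed_regions` are hypothetical names for this example and are not pyfaidx functions. It assumes whitespace-separated columns with chromosome, start, and end in the first three fields.

```python
import gzip


def open_maybe_gzip(path):
    """Open `path` as utf-8 text, decompressing it if it starts with the gzip magic bytes."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    if is_gzip:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def read_bed_regions(path):
    """Yield (chrom, start, end) tuples from the first three columns of a bed file."""
    with open_maybe_gzip(path) as handle:
        for line in handle:
            fields = line.split()
            # Skip blank lines and the usual bed header/comment lines
            if not fields or line.startswith(("#", "track", "browser")):
                continue
            yield fields[0], int(fields[1]), int(fields[2])
```

Because bed start coordinates are zero-based and the end is exclusive, the tuples line up with Python slice semantics, so they could plausibly be used to slice sequences from an indexed FASTA object.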
"Do you mean that the tmp.bed file contains this, and you have also gzipped it?"
Yes, you are correct. But I only gzipped it because when I ran it without gzip/bgzip, I got the following error:
faidx genome.fa.gz -b tmp.bed
Traceback (most recent call last):
File "/cs/usr/jjj/.local/bin/faidx", line 8, in
Ah, I see. That error message is telling you that the FASTA file cannot be compressed with plain gzip. You can, however, use block-gzip (bgzip) compression to compress the FASTA file. See https://www.htslib.org/doc/bgzip.html
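Since both kinds of file usually carry a .gz extension, it can help to check which one you have. The sketch below is an illustration based on my reading of the BGZF layout (a gzip member whose header has the FEXTRA flag set and an extra subfield with the identifier bytes "BC" and a length of 2); `is_bgzf` is a hypothetical helper name, not something faidx provides, and it only inspects the first block.

```python
import struct


def is_bgzf(path):
    """Return True if the first gzip member of `path` carries a BGZF 'BC' extra subfield."""
    with open(path, "rb") as handle:
        header = handle.read(12)
        # gzip magic (1f 8b) followed by the deflate method byte (08)
        if len(header) < 12 or header[:3] != b"\x1f\x8b\x08":
            return False
        # Bit 0x04 of the flags byte (FEXTRA) signals an extra field
        if not header[3] & 0x04:
            return False
        (xlen,) = struct.unpack("<H", header[10:12])
        extra = handle.read(xlen)
    # Walk the extra subfields: 2 id bytes, a 2-byte length, then the payload
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack("<BBH", extra[pos:pos + 4])
        if (si1, si2, slen) == (66, 67, 2):  # 'B', 'C', payload of 2 bytes
            return True
        pos += 4 + slen
    return False
```

A file written by plain gzip would be expected to return False here, in which case decompressing it and recompressing with bgzip (see the link above) is the likely fix.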
Got it! Thanks. Sorry about the confusion!
No worries - glad to help!
Hi!
I am running faidx version 0.7.2.1
I am running it with a bed file input like so:
faidx hg19/genome.fa.gz -b tmp.bed.gz
where tmp.bed.gz looks like: chr6 132891948 132892108 chr10 127585142 127585221
I get the following error:
Traceback (most recent call last):
File "/cs/usr/jrosensk/.local/bin/faidx", line 8, in <module>
sys.exit(main())
File "/cs/usr/jrosensk/.local/lib/python3.9/site-packages/pyfaidx/cli.py", line 202, in main
write_sequence(args)
File "/cs/usr/jrosensk/.local/lib/python3.9/site-packages/pyfaidx/cli.py", line 26, in write_sequence
for region in regions_to_fetch:
File "/usr/lib/python3.9/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
I assume the problem is that my bed file has the "chr" prefix? It is a problem because my genome file has the chr prefix as well. Is there a way around this, or do I need to change the reference .fa file?