vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

vg can't read its own FASTA index in the presence of HLA alts #2482

Open · glennhickey opened this issue 5 years ago (Sep 25, 2019)

glennhickey commented 5 years ago

Here's a quick way to reproduce:

# download GRCh38 and make sure there's no existing index
rm -f GRCh38_full_analysis_set_plus_decoy_hla.fa*
wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa

# construct the graph once: vg generates the .fai on the fly and runs OK
vg construct -r GRCh38_full_analysis_set_plus_decoy_hla.fa -R chr21 > chr21.vg
>Restricting to chr21 from 1 to end
>index file GRCh38_full_analysis_set_plus_decoy_hla.fa.fai not found, generating...
ls -l chr21.vg
>-rw-rw-r-- 1 ubuntu ubuntu 25421409 Sep 25 12:41 chr21.vg

# construct the graph again: vg can't read the .fai it just wrote and, despite calling it a warning, aborts right away, leaving an empty chr21.vg
vg construct -r GRCh38_full_analysis_set_plus_decoy_hla.fa -R chr21 > chr21.vg
>Warning: malformed fasta index file GRCh38_full_analysis_set_plus_decoy_hla.fa.faidoes not have enough fields @ line 2842
>HLA-A*01:01:01:01       HLA00001        3503    3261539889      72      73
ls -l chr21.vg
>-rw-rw-r-- 1 ubuntu ubuntu 0 Sep 25 12:42 chr21.vg
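
The rejected index line holds six whitespace-separated tokens (HLA-A*01:01:01:01, HLA00001, then four numbers), while a FASTA .fai entry has five columns: name, length, offset, bases per line, bytes per line. One plausible reading, not confirmed in this thread, is that the generated index puts more than the first word of the HLA header into the name column, and the reader then splits on spaces as well as tabs. The minimal sketch below assumes exactly that; the `check_fai` helper is hypothetical and not part of vg. It lists every index line that such a reader would not see as exactly five fields.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of vg: print every .fai line that does not
# split into exactly five fields when both spaces and tabs are treated as
# separators (assumed reader behaviour). A FASTA .fai line should have
# five columns: name, length, offset, bases per line, bytes per line.
check_fai() {
    awk '{
        n = split($0, f, /[ \t]+/)
        if (n != 5) print FILENAME ":" FNR ": " n " fields: " $0
    }' "$1"
}

# Example invocation on the index from the report, only if it exists.
fai=GRCh38_full_analysis_set_plus_decoy_hla.fa.fai
if [ -f "$fai" ]; then
    check_fai "$fai"
fi
```

If the assumption holds, every HLA entry whose name contains a space is reported, not just the one at line 2842, which would suggest how many sequences a workaround has to touch.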

ekg commented 5 years ago

That's weird.

For the moment, please try to work around this by renaming or reheadering the FASTA sequences.
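
The reheadering workaround can be sketched as follows, assuming it is enough to cut every FASTA header back to its first whitespace-delimited word (the same convention `samtools faidx` uses for index names). The helper names `strip_fasta_headers` and `duplicate_names` are hypothetical, and this sketch has not been checked against vg itself.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the suggested workaround: keep only the first
# whitespace-delimited word of each FASTA header line, so a freshly
# generated .fai has a single-word name column. Sequence lines are untouched.
strip_fasta_headers() {
    local src="$1" dst="$2"
    sed -E '/^>/ s/[[:space:]].*$//' "$src" > "$dst"
}

# Truncating names can create collisions; print any header that now
# appears more than once (empty output means the names are still unique).
duplicate_names() {
    grep '^>' "$1" | sort | uniq -d
}
```

For example, run `strip_fasta_headers GRCh38_full_analysis_set_plus_decoy_hla.fa GRCh38_renamed.fa`, confirm that `duplicate_names GRCh38_renamed.fa` prints nothing, then rerun `vg construct -r GRCh38_renamed.fa -R chr21 > chr21.vg` so a new index is generated for the renamed file (`GRCh38_renamed.fa` is a hypothetical output name). Checking for duplicates matters because two headers that differ only after the first word would otherwise collapse into the same sequence name.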
