sanger-pathogens / SnpEffWrapper

Takes a VCF and applies annotations from a GFF using SnpEff

Support GFF with sequences in separate FASTA file #13

Open peterjc opened 5 years ago

peterjc commented 5 years ago

The README says "The GFF must contain the reference sequence in Fasta format"

This seems to explain why our first attempt to use SnpEffWrapper failed (snpEff build could not find the FASTA files in the temporary directory). It would be nice to optionally allow passing a FASTA file for the assembly separately from the GFF file.

peterjc commented 5 years ago

Should I file a separate issue about the unclear failure when the tool is given a GFF file without embedded FASTA sequences? Looking at the code, it seems to try to give a warning (https://github.com/sanger-pathogens/SnpEffWrapper/blob/v0.2.5/snpEffWrapper/wrapper.py#L297), but wouldn't it be better to actually abort without calling snpEff build?
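A minimal sketch of the stricter behaviour suggested above, assuming the check only needs to find a `##FASTA` directive followed by at least one `>` header. The names `MissingFastaError`, `gff_has_embedded_fasta` and `require_embedded_fasta` are hypothetical and are not taken from wrapper.py:

```python
import io


class MissingFastaError(Exception):
    """Hypothetical error: the GFF has no embedded reference sequence."""


def gff_has_embedded_fasta(handle):
    """Return True if a ##FASTA directive is followed by a '>' header line."""
    seen_directive = False
    for line in handle:
        stripped = line.strip()
        if stripped == "##FASTA":
            seen_directive = True
        elif seen_directive and stripped.startswith(">"):
            return True
    return False


def require_embedded_fasta(handle):
    """Raise instead of warning, so the caller can stop before snpEff build."""
    if not gff_has_embedded_fasta(handle):
        raise MissingFastaError(
            "GFF has no ##FASTA section; add the reference sequence first"
        )


# Example: a tiny in-memory GFF with an embedded sequence passes silently.
example = io.StringIO("##gff-version 3\n##FASTA\n>chr1\nACGT\n")
require_embedded_fasta(example)
```

Raising an exception here, rather than logging a warning, would let the caller stop before the database build step and report a message that names the actual problem.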

GBeattie commented 5 years ago

hey peterjc,

Might I ask how you merged your fasta and GFF in the end? I used cat .gff .fasta > .fasta.gff and I am getting the following error (which may be unrelated to how I've combined the files, but just want to check!)

[2018-10-19 15:45:10,601] INFO: Checking that the VCF and GFF contigs are consistent
[2018-10-19 15:45:11,724] INFO: Building snpeff database
Traceback (most recent call last):
  File "/home/manager/miniconda3/envs/ddocent_env/lib/python3.6/site-packages/snpEffWrapper/wrapper.py", line 222, in _snpeff_build_database
    subprocess.check_call(command, stdout=stdout, stderr=stderr)
  File "/home/manager/miniconda3/envs/ddocent_env/lib/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/java', '-Xmx4g', '-jar', '/media/sf_SharedDrive/Download/snpEff/snpEff.jar', 'build', '-gff3', '-verbose', 'data', '-c', '/home/manager/WGS/WGS analysis/snpeff_data_dir_hrk9sc02/config']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/manager/miniconda3/envs/ddocent_env/bin/snpEffBuildAndRun", line 45, in <module>
    annotate_vcf(args)
  File "/home/manager/miniconda3/envs/ddocent_env/lib/python3.6/site-packages/snpEffWrapper/wrapper.py", line 343, in annotate_vcf
    args.vcf_file, config_filename, args.debug)
  File "/home/manager/miniconda3/envs/ddocent_env/lib/python3.6/site-packages/snpEffWrapper/wrapper.py", line 275, in run_snpeff
    build_stderr)
  File "/home/manager/miniconda3/envs/ddocent_env/lib/python3.6/site-packages/snpEffWrapper/wrapper.py", line 224, in _snpeff_build_database
    raise BuildDatabaseError("Problem building the database from your GFF")
snpEffWrapper.wrapper.BuildDatabaseError: Problem building the database from your GFF

peterjc commented 5 years ago

What you are likely missing is the special line ##FASTA\n before the FASTA file starts with ">"...

Luckily I had documented this locally. I did it once by hand and then came up with the following as a reproducible alternative:

bash -c "cat annotation_only.gff; echo '##FASTA' ; cat reference.fasta" > annotation_with_fasta.gff

You could make a dummy file containing just the magic line and then concatenate the three files (in order) to make your combined file, but I used the echo command here instead.
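For anyone who prefers to script this in Python, here is a rough equivalent of the bash one-liner. The function name `combine_gff_and_fasta` is made up for illustration, and the trailing-newline guard is an extra assumption that the shell command does not include:

```python
import shutil


def combine_gff_and_fasta(gff_path, fasta_path, out_path):
    """Write the GFF, then a '##FASTA' line, then the FASTA records."""
    with open(out_path, "w") as out:
        ends_with_newline = True
        with open(gff_path) as gff:
            for line in gff:
                out.write(line)
                ends_with_newline = line.endswith("\n")
        # Guard against a GFF whose last line lacks a newline, which would
        # otherwise glue the directive onto the final annotation line.
        if not ends_with_newline:
            out.write("\n")
        out.write("##FASTA\n")
        with open(fasta_path) as fasta:
            shutil.copyfileobj(fasta, out)
```

With the file names from the command above, the call would be `combine_gff_and_fasta("annotation_only.gff", "reference.fasta", "annotation_with_fasta.gff")`.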

GBeattie commented 5 years ago

Hey peter!

Thanks for that. Unfortunately the same error message appears after using your cat command to put in that extra line, so I feel I'm back at square one!

Thanks again,

Gordon

peterjc commented 5 years ago

I suspect there is something else "wrong" with your GFF file then - I would suggest opening a new issue, and offering to share the files directly with the tool authors (or if you can, posting them online, e.g. via https://gist.github.com).

GBeattie commented 5 years ago

Yes, I will open another issue, as it now seems the error is unrelated to how the fasta and GFF are merged.

Thanks,

G