iqbal-lab-org / gramtools

Genome inference from a population reference graph
MIT License

Handle no variants after clustering #153

Closed martinghunt closed 4 years ago

martinghunt commented 4 years ago

Current behaviour: if clustering results in a build.vcf with no variants, gramtools build fails with this output:

$ gramtools build --vcf $PWD/tmp.vcf_chunker.make_split_files/split.0.in.vcf --reference $PWD/tests/data/vcf_chunker/make_split_files.in.ref.fa --kmer-size 5 --max-read-length 150 --max-threads 1 --all-kmers --gram-dir OUT.build
2019-11-19 14:16:54,054 gramtools    INFO     Start process: build
2019-11-19 14:16:54,054 gramtools    INFO     Running vcf_record_clustering on ['/home/vagrant/minos/tmp.vcf_chunker.make_split_files/split.0.in.vcf'].
Traceback (most recent call last):
  File "/usr/local/bin/gramtools", line 11, in <module>
    load_entry_point('gramtools==1.5.0', 'console_scripts', 'gramtools')()
  File "/usr/local/lib/python3.6/dist-packages/gramtools/gramtools.py", line 71, in run
    command.run(args)
  File "/usr/local/lib/python3.6/dist-packages/gramtools/commands/build.py", line 134, in run
    command_hash_paths = common.hash_command_paths(command_paths)
  File "/usr/local/lib/python3.6/dist-packages/gramtools/common.py", line 67, in hash_command_paths
    if not os.path.isfile(path):
  File "/usr/lib/python3.6/genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

build.vcf contains this:

##fileformat=VCFv4.2
##source=cluster_vcf_records
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample_name

It would be good to check for variants in build.vcf before trying to continue.
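A minimal sketch of such a check, assuming build.vcf is a plain-text (uncompressed) VCF. The function name `vcf_has_records` is hypothetical and not part of gramtools; it only looks for at least one line that is neither blank nor a `#` header line:

```python
def vcf_has_records(vcf_path):
    """Return True if the VCF has at least one record line.

    Hypothetical helper, not a gramtools function.
    Assumes an uncompressed, text-mode VCF.
    """
    with open(vcf_path) as handle:
        for line in handle:
            # Header and meta-information lines start with '#';
            # skip those and any blank lines.
            if line.strip() and not line.startswith("#"):
                return True
    return False
```

Something like this could be called on build.vcf right after clustering, so that the build can stop with a clear message rather than a traceback.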

martinghunt commented 4 years ago

Actually, looks like it's the clustering that is failing, resulting in a VCF with no records.

bricoletc commented 4 years ago

@martinghunt I think this is fixed in the cluster_vcf_records module?

martinghunt commented 4 years ago

Yes, I fixed cluster_vcf_records, so that example works now. But I'm wondering if it's still possible to end up with no records in the VCF. What will gramtools do in that case?

bricoletc commented 4 years ago

I see.

Here is the behaviour if you run gramtools build using a VCF with no records:

$ gramtools build --reference toy_datasets/noRecords/H37Rv.fasta --vcf toy_datasets/noRecords/mutant.vcf --gram-dir test
2020-01-07 10:02:48,084 gramtools    INFO     Start process: build
2020-01-07 10:02:48,085 gramtools    INFO     Running vcf_record_clustering on ['toy_datasets/noRecords/mutant.vcf'].
2020-01-07 10:02:48,085 gramtools    INFO     Running vcf_to_PRG_string_conversion on /home/brice/Desktop/work_PhD/git_repos/gramtools/tmp_work/test/build.vcf
2020-01-07 10:02:48,097 gramtools    INFO     stdout:

maximum thread count: 1
Executing build command
Generating integer encoded PRG
Number of characters in integer encoded linear PRG: 2660
Generate coverage graph
Number of variant sites: 0
No variant sites found.
Exiting 1

2020-01-07 10:02:48,099 gramtools    ERROR    Error code != 0
2020-01-07 10:02:48,100 gramtools    ERROR    Unsuccessful build. Process reported to /home/brice/Desktop/work_PhD/git_repos/gramtools/tmp_work/test/build_report.json

I figure that's good behaviour: clustering and PRG string production happen, and then we exit, stating that no variants were found.

What do you think?

martinghunt commented 4 years ago

Looks good to me. Thanks for testing :)

iqbal-lab commented 4 years ago

?? Isn't the natural thing to do to output a PRG with no variants?

bricoletc commented 4 years ago

That is produced, but the error message arises because we do not proceed to build an FM-Index, PRG masks and kmer index.

iqbal-lab commented 4 years ago

ok, understood.