kage-genotyper / kage

Alignment-free genotyper for SNPs and short indels, implemented in Python.
GNU General Public License v3.0

Error when using kage genotype #9

Open unavailable-2374 opened 5 months ago

unavailable-2374 commented 5 months ago

Hi~

Over the last few days I have been trying to use kage to genotype my population, but it doesn't work. Could you please take a look?

```
kage genotype -i 2/PN2_SNP -r data_tmp/CRR493563 -t 16 --average-coverage 25 -k 30 -o result/PN2_SNP.vcf
/public/tools/python/lib/python3.10/site-packages/bionumpy/encodings/vcf_encoding.py:98: RuntimeWarning: invalid value encountered in cast
  _lookup[[ord(c) for c in ('0', '1', '.')]] = np.array([0, 1, np.nan])
INFO:root:Read coverage is set to 25.000
INFO:root:Reading all indexes from an index bundle
INFO:root:Will count kmers.
INFO:root:N bytes of reads: 31329041494
INFO:root:Approx number of chunks of 10000000 bytes: 3132
Traceback (most recent call last):
  File "/public/tools/python/bin/kage", line 8, in <module>
    sys.exit(main())
  File "/public/tools/python/lib/python3.10/site-packages/kage/command_line_interface.py", line 52, in main
    run_argument_parser(sys.argv[1:])
  File "/public/tools/python/lib/python3.10/site-packages/kage/command_line_interface.py", line 552, in run_argument_parser
    args.func(args)
  File "/public/tools/python/lib/python3.10/site-packages/kage/command_line_interface.py", line 98, in genotype
    node_counts = get_kmer_counts(kmer_index, args.kmer_size, args.reads, config.n_threads, args.gpu)
  File "/public/tools/python/lib/python3.10/site-packages/kage/command_line_interface.py", line 58, in get_kmer_counts
    return NodeCounts(map_bnp(Namespace(
  File "/public/tools/python/lib/python3.10/site-packages/kmer_mapper/command_line_interface.py", line 105, in map_bnp
    file = open_file(args.reads)
  File "/public/tools/python/lib/python3.10/site-packages/kmer_mapper/util.py", line 80, in open_file
    suffix = path.suffixes[-1]
IndexError: list index out of range
```

Thanks!

ivargr commented 5 months ago

Hi!

The -r parameter requires a file with an extension such as .fa, .fa.gz, .fq, .fq.gz, .fastq, etc. It is failing because the file data_tmp/CRR493563 has no extension.
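The traceback ends in `path.suffixes[-1]` inside kmer_mapper's `open_file`, which is why a missing extension shows up as an `IndexError` rather than a readable message. Assuming `path` there is a `pathlib.Path` (the traceback does not show how it is built), a file name without any dot simply has an empty `suffixes` list, as this small sketch shows:

```python
from pathlib import Path

# A file name with no dot has no suffixes at all.
no_ext = Path("data_tmp/CRR493563")
print(no_ext.suffixes)

# With an extension, suffixes lists every dotted part of the name.
print(Path("data_tmp/CRR493563.fq.gz").suffixes)

try:
    suffix = no_ext.suffixes[-1]  # same expression as in the traceback
except IndexError as err:
    print(f"IndexError: {err}")
```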

Is that your reads file? If so, could you try adding the correct file extension to it?
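If it is unclear which extension the reads file should get, a rough way to guess is to look at its first bytes and then create a correctly named symlink, which avoids copying a file of about 31 GB (renaming it with `mv` works just as well). This is a sketch under assumptions: `guess_read_extension` and `link_with_extension` are hypothetical helpers, not part of KAGE, and the detection rule (gzip magic bytes `1f 8b`, then `@` for FASTQ and `>` for FASTA) is a simple heuristic.

```python
import gzip
import os
from pathlib import Path


def guess_read_extension(path: Path) -> str:
    """Hypothetical helper: guess '.fq', '.fa', '.fq.gz' or '.fa.gz' from the first bytes."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"  # gzip magic number
    opener = gzip.open if is_gzip else open
    with opener(path, "rb") as fh:
        first = fh.read(1)  # first character of the (decompressed) content
    if first == b"@":
        ext = ".fq"
    elif first == b">":
        ext = ".fa"
    else:
        raise ValueError(f"{path}: does not look like FASTA or FASTQ")
    return ext + (".gz" if is_gzip else "")


def link_with_extension(path: Path) -> Path:
    """Hypothetical helper: create a symlink next to `path` that carries the guessed extension."""
    target = path.with_name(path.name + guess_read_extension(path))
    if not target.exists():
        os.symlink(path.resolve(), target)
    return target
```

For example, `link_with_extension(Path("data_tmp/CRR493563"))` returns the new path, which can then be passed to `-r`.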

The error message from KAGE should have been clearer, though; I'll fix that :)
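A clearer failure could come from validating the reads path before any k-mer counting starts. The sketch below is hypothetical: `check_reads_path` and the `ACCEPTED` tuple are not part of KAGE or kmer_mapper, and the extension list is an assumption based on the extensions mentioned in this thread (the real list may differ).

```python
from pathlib import Path

# Assumed whitelist, based on the extensions mentioned in this thread.
ACCEPTED = (".fa", ".fa.gz", ".fq", ".fq.gz", ".fastq", ".fastq.gz")


def check_reads_path(path: str) -> Path:
    """Fail early with a readable message instead of an IndexError deeper down."""
    p = Path(path)
    # str.endswith accepts a tuple, so one call checks every allowed extension.
    if not p.name.lower().endswith(ACCEPTED):
        raise ValueError(
            f"Reads file '{path}' must end with one of {', '.join(ACCEPTED)}; "
            "rename or symlink it with the extension that matches its format."
        )
    return p
```

Checking the whole file name with `endswith` also handles names containing extra dots (e.g. `sample.lane1.fq.gz`), which a check on only the last suffix would not.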

unavailable-2374 commented 5 months ago

Hi, thanks a lot, it works! KAGE is the best!!! 🫡

ivargr commented 5 months ago

Great, glad to hear :)

unavailable-2374 commented 5 months ago

Hi

Sorry to bother you again 🤣, but I have another question.

```
ERROR: Found positive value in FORMAT/GL for individual [SRR5627788] at site [PN2:1358201_CAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATT_CAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACTT]
[E::hts_open_format] Failed to open file "result/GLIMPSE-PN2.0.bcf" : No such file or directory
```

Maybe it comes from GLIMPSE? I checked here, but it didn't help. If this error really comes from GLIMPSE, maybe I shouldn't have posted it here 😬
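One way to tell whether the positive GL values are already present in the VCF that KAGE writes, rather than introduced later, is to scan that VCF directly. This is a minimal standard-library sketch; it assumes a plain or gzip/BGZF-compressed text VCF with a `GL` key in the FORMAT column (which the GLIMPSE error message implies), and `find_positive_gls` is a hypothetical name.

```python
import gzip
from pathlib import Path
from typing import Iterator, Tuple


def open_vcf(path: Path):
    """Open a plain or gzip/BGZF-compressed VCF as text."""
    return gzip.open(path, "rt") if path.suffix == ".gz" else open(path)


def find_positive_gls(path: Path) -> Iterator[Tuple[str, str, str, float]]:
    """Yield (chrom, pos, sample, max_gl) for every sample whose GL contains a value > 0."""
    samples = []
    with open_vcf(path) as fh:
        for line in fh:
            if line.startswith("##"):
                continue  # meta-information lines
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                samples = fields[9:]  # sample names from the header line
                continue
            fmt = fields[8].split(":")
            if "GL" not in fmt:
                continue
            gl_idx = fmt.index("GL")
            for sample, data in zip(samples, fields[9:]):
                parts = data.split(":")
                if gl_idx >= len(parts):
                    continue  # trailing FORMAT fields may be dropped
                values = [float(v) for v in parts[gl_idx].split(",") if v != "."]
                if values and max(values) > 0:
                    yield fields[0], fields[1], sample, max(values)
```

For the run in this thread, that would be something like `for hit in find_positive_gls(Path("result/PN2_SNP.vcf")): print(*hit)`.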

Best~

ivargr commented 5 months ago

This might be caused by something wrong in KAGE, so it's good that you reported it.

I think there may be some round-off error when writing genotype likelihoods, so that some of them end up above zero.

I've now added a check that should force genotype likelihoods to be non-positive. Could you update to version 2.0.4 and see if this helps?

I've also added logging that shows how many genotype likelihoods are above zero. It would be interesting to see whether this shows anything for your data (I'm not able to reproduce the problem in my tests).
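The check and the logging described above can be sketched with NumPy roughly as follows. This is an illustration, not the code in KAGE 2.0.4: it assumes the GLs are held as log10 values in a NumPy array, and `clamp_genotype_likelihoods` is a hypothetical name. The round-off example is just one way a tiny positive log-likelihood can arise; the thread does not establish where KAGE's positive values come from.

```python
import logging

import numpy as np

logging.basicConfig(level=logging.INFO)


def clamp_genotype_likelihoods(gl: np.ndarray) -> np.ndarray:
    """Hypothetical helper: force log10 genotype likelihoods to be <= 0.

    Logs how many entries were above zero before clamping, since GLIMPSE
    rejects any positive value in FORMAT/GL.
    """
    n_positive = int(np.count_nonzero(gl > 0))
    logging.info("%d genotype likelihoods are above zero; clamping to 0", n_positive)
    return np.minimum(gl, 0.0)


# Round-off illustration: a probability that should be exactly 1.0 but is one
# unit in the last place too large gives a tiny, but positive, log10 value.
p = np.nextafter(1.0, 2.0)
print(np.log10(p) > 0)

gls = np.array([[-0.5, np.log10(p), -3.0]])
print(clamp_genotype_likelihoods(gls))
```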

This log line, printed just before GLIMPSE is run, should tell you how many GLs are above zero:

[Screenshot of the log line]

unavailable-2374 commented 5 months ago

Hi

I updated to version 2.0.4, but the error is still there. Here is more of the output around this error:

```
Finalization:
[GLIMPSE] Ligate multiple output files into chromosome-wide files
Files:
Parameters:
Initialization:
Read filenames in [/tmp/tmpop1q2m9i]
Initilialize flags
Ligating chunks
ERROR: Failed to open file: result/GLIMPSE-PN2.0.bcf
tbx_index_build failed: result/GLIMPSE.ligated.PN2.vcf.gz
Checking the headers and starting positions of 1 files
[E::hts_open_format] Failed to open file "result/GLIMPSE.ligated.PN2.vcf.gz" : No such file or directory
Failed to open: result/GLIMPSE.ligated.PN2.vcf.gz
INFO:root:Running GLIMPSE took 46.3045 sec
INFO:root:Genotyping took 262 sec
```

Hope it helps.

Best~

ivargr commented 5 months ago

Would you be able to share the KAGE index and the reads you are using? That would make it a lot easier for me to debug this :)

unavailable-2374 commented 5 months ago

Of course, which method of file transfer are you more comfortable with?

ivargr commented 5 months ago

I'm happy with anything, but the files may be too big for email?

If email works, you can send them to ivargry@ifi.uio.no.

unavailable-2374 commented 4 months ago

Hi Ivargr

I ran into a problem again when using KAGE to genotype SVs. I've attached the error below; I hope it helps.

```
Parsing specified genomic regions
ERROR: Found positive value in FORMAT/GL for individual [DONOR] at site [PN10:25987410_ATGGTGCCCGTGAGCCAAGCCAAGGTGTCACGGGGCCTTGCTTGGTGCCCGCGAGCCAAGCCAAGGTGGCACGAGGCCTTGCATGGTGCCCGCCCGCGAGCCAAGCCAAGGTGGCACGAGGCCTTGCT_ATGGTGCCCGCGAGCCAAGCCAAGGTGCCACGGGGCCTTGCTTGGTGCCCGCGAGCCAAGCCAAGGTGGCACGAGGTCTTGCATGGTGCCCGCCCGCGAGCCAAGCCAAGGTGCCACGGGGCCTTGCA]
[E::hts_open_format] Failed to open file "result/CRR495123/GLIMPSE-PN10.15.bcf" : No such file or directory
index: failed to open "result/CRR495123/GLIMPSE-PN10.15.bcf"
```

and

```
INFO:root:Making tmp file with names for chromosome PN10: ['result/CRR495123/GLIMPSE-PN10.0.bcf', 'result/CRR495123/GLIMPSE-PN10.1.bcf', 'result/CRR495123/GLIMPSE-PN10.2.bcf', 'result/CRR495123/GLIMPSE-PN10.3.bcf', 'result/CRR495123/GLIMPSE-PN10.4.bcf', 'result/CRR495123/GLIMPSE-PN10.5.bcf', 'result/CRR495123/GLIMPSE-PN10.6.bcf', 'result/CRR495123/GLIMPSE-PN10.7.bcf', 'result/CRR495123/GLIMPSE-PN10.8.bcf', 'result/CRR495123/GLIMPSE-PN10.9.bcf', 'result/CRR495123/GLIMPSE-PN10.10.bcf', 'result/CRR495123/GLIMPSE-PN10.11.bcf', 'result/CRR495123/GLIMPSE-PN10.12.bcf', 'result/CRR495123/GLIMPSE-PN10.13.bcf', 'result/CRR495123/GLIMPSE-PN10.14.bcf', 'result/CRR495123/GLIMPSE-PN10.15.bcf', 'result/CRR495123/GLIMPSE-PN10.16.bcf']
[GLIMPSE] Ligate multiple output files into chromosome-wide files
Files:
Parameters:
Initialization:
Read filenames in [/tmp/tmp2orgkvh3]
Initilialize flags
Ligating chunks
ERROR: Failed to open file: result/CRR495123/GLIMPSE-PN10.15.bcf
tbx_index_build failed: result/CRR495123/GLIMPSE.ligated.PN10.vcf.gz
```
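In both ligation logs, GLIMPSE's ligate step is handed a list of per-chunk BCF files (the `Making tmp file with names for chromosome PN10` line) and stops at the first one that was never written: chunk 0 for PN2 and chunk 15 for PN10, presumably the chunks where the positive-GL error made GLIMPSE exit. A quick way to list which chunks are missing before ligating is sketched below; `missing_chunks` is a hypothetical helper, and the `GLIMPSE-<chrom>.<i>.bcf` naming pattern is copied from the logs.

```python
from pathlib import Path
from typing import List


def missing_chunks(out_dir: str, chrom: str, n_chunks: int) -> List[Path]:
    """Return the expected GLIMPSE chunk files (GLIMPSE-<chrom>.<i>.bcf) that do not exist."""
    expected = [Path(out_dir) / f"GLIMPSE-{chrom}.{i}.bcf" for i in range(n_chunks)]
    return [p for p in expected if not p.exists()]
```

For the PN10 run, the log lists chunks 0 to 16, so the call would be `missing_chunks("result/CRR495123", "PN10", 17)`.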

If you need any more information, I am more than willing to provide it.

Best wishes

ivargr commented 4 months ago

Thanks for sharing! Is this when genotyping SVs or SNPs? Is this using any of the data you shared with me earlier?

unavailable-2374 commented 4 months ago

It occurs when I use KAGE to genotype SVs. I may not have shared this raw data before; if you need it for testing, you can download it from here.