mehrdadbakhtiari / adVNTR

A tool for genotyping Variable Number Tandem Repeats (VNTR) from sequence data
http://advntr.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
41 stars 15 forks source link

only None copy numbers in the output file #64

Open maryamghr opened 1 year ago

maryamghr commented 1 year ago

I tried adVNTR on PacBio data and the output bed file has only None values in R1 and R2 columns. It basically does not genotype any tandem repeat region.

sara-javadzadeh commented 1 year ago

Hi Maryam,

Thanks for bringing this up. This is not the expected behavior for AdVNTR genotype command on PacBio data. So I expect that there was an error. Could you please share the following?

  1. What is the exact advntr command you used (including all the flags)?
  2. Do you see any error messages in the .log file? If so, could you please share them?
  3. Could you please share a bit about the input data? Is it mapped reads? What is the fold coverage? Did you confirm that the dataset includes spanning reads for the targeted VNTR loci?

Sara

maryamghr commented 1 year ago

Hi Sara,

Thanks for following up.

  1. This is the advntr command that I run:

    advntr genotype --alignment_file $my_pacbio_bam_file --working_directory log_dir 
        --pacbio -m vntr_data/hg38_selected_VNTRs_Illumina.db 
        --outfmt bed -t 30 
  2. There is no error message, but the main lines in the log file are like this for each vntr region:

    INFO:extract_unmapped_reads_to_fasta_file executed in 0.000037s
    INFO:get_filtered_read_ids executed in 0.747395s
    INFO:get_vntr_filtered_reads_map executed in 0.747683s
    DEBUG:finding repeat count from pacbio alignment file for 201
    INFO:length_distribution of unmapped spanning reads: []
    DEBUG:no reference positions for read. skipping self.check_if_pacbio_mapped_read_spans_vntr for this read
    ...
    DEBUG:no reference positions for read. skipping self.check_if_pacbio_mapped_read_spans_vntr for this read
    INFO:length_distribution of mapped spanning reads: []
    INFO:get_spanning_reads_of_aligned_pacbio_reads executed in 0.171241s
    INFO:There is no spanning read
    INFO:find_repeat_count_from_pacbio_alignment_file executed in 0.208687s
  3. I am using HG002 bam file (PacBio CLR reads aligned with minimap2) from giab. It is a whole-genome alignment file, and it has indeed coverage in VNTR regions. The coverage in the first VNTR region is ~40x for example.

Bests, Maryam

sara-javadzadeh commented 1 year ago

Hi Maryam,

Thanks for sharing the info. I'm working on it.

Sara

On Thu, Aug 31, 2023 at 2:31 AM maryamghr @.***> wrote:

Hi Sara,

Thanks for following up.

  1. This is the advntr command that I run:

advntr genotype --alignment_file $my_pacbio_bam_file --working_directory log_dir --pacbio -m vntr_data/hg38_selected_VNTRs_Illumina.db --outfmt bed -t 30

  1. There is no error message, but the main lines in the log file are like this for each vntr region:

INFO:extract_unmapped_reads_to_fasta_file executed in 0.000037s INFO:get_filtered_read_ids executed in 0.747395s INFO:get_vntr_filtered_reads_map executed in 0.747683s DEBUG:finding repeat count from pacbio alignment file for 201 INFO:length_distribution of unmapped spanning reads: [] DEBUG:no reference positions for read. skipping self.check_if_pacbio_mapped_read_spans_vntr for this read ... DEBUG:no reference positions for read. skipping self.check_if_pacbio_mapped_read_spans_vntr for this read INFO:length_distribution of mapped spanning reads: [] INFO:get_spanning_reads_of_aligned_pacbio_reads executed in 0.171241s INFO:There is no spanning read INFO:find_repeat_count_from_pacbio_alignment_file executed in 0.208687s

  1. I am using HG002 bam file (PacBio CLR reads aligned with minimap2) from giab. It is a whole-genome alignment file, and it has indeed coverage in VNTR regions. The coverage in the first VNTR region is ~40x for example.

Bests, Maryam

— Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/mehrdadbakhtiari/adVNTR/issues/64*issuecomment-1700691483__;Iw!!Mih3wA!GwYl_Njdo_6wGBk3fh63y-0WAQfrh4_LOHiK8OXssSPQSSJTRy1fplmquGBMky1C-U1JGCiXFgkrvVWYFUNnQL18bTXcAT9g$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AOGKYDXKQND3SUUJ7EMTWQ3XYBKVDANCNFSM6AAAAAA4C6AUQE__;!!Mih3wA!GwYl_Njdo_6wGBk3fh63y-0WAQfrh4_LOHiK8OXssSPQSSJTRy1fplmquGBMky1C-U1JGCiXFgkrvVWYFUNnQL18bUdHFvUF$ . You are receiving this because you commented.Message ID: @.***>

maryamghr commented 1 year ago

Salam Sara,

Thank you for working on it! I would like to ask another question in the meantime. Is there any way to use advntr to genotype vntrs given a custom set of regions that are given as a bed file (with an optional column specifying the motif)? So far, I only saw the option for adding a custom vntr region, which needs genomic coordinates of a single region. I want to run it on a bed file including my regions of interest (if possible).

Bests, Maryam

Xiaodiao1111 commented 1 year ago

Hi Sara,

Thanks for following up.

  1. This is the advntr command that I run:
advntr genotype --alignment_file $my_pacbio_bam_file --working_directory log_dir 
        --pacbio -m vntr_data/hg38_selected_VNTRs_Illumina.db 
        --outfmt bed -t 30 
  1. There is no error message, but the main lines in the log file are like this for each vntr region:
INFO:extract_unmapped_reads_to_fasta_file executed in 0.000037s
INFO:get_filtered_read_ids executed in 0.747395s
INFO:get_vntr_filtered_reads_map executed in 0.747683s
DEBUG:finding repeat count from pacbio alignment file for 201
INFO:length_distribution of unmapped spanning reads: []
DEBUG:no reference positions for read. skipping self.check_if_pacbio_mapped_read_spans_vntr for this read
...
DEBUG:no reference positions for read. skipping self.check_if_pacbio_mapped_read_spans_vntr for this read
INFO:length_distribution of mapped spanning reads: []
INFO:get_spanning_reads_of_aligned_pacbio_reads executed in 0.171241s
INFO:There is no spanning read
INFO:find_repeat_count_from_pacbio_alignment_file executed in 0.208687s
  1. I am using HG002 bam file (PacBio CLR reads aligned with minimap2) from giab. It is a whole-genome alignment file, and it has indeed coverage in VNTR regions. The coverage in the first VNTR region is ~40x for example.

Bests, Maryam

Hello, have you solved your problem? How to solve it. I had a similar problem.