Multiple virus genomes (hbc/li_hiv issue #2)
Open. DiyaVaka opened 8 years ago (comments dated July 1 to July 5, 2016)
Hi DiyaVaka,
Depending on which other genomes you want to look at, it might work for calling the sites but not for working out the correct orientation of the virus. We wrote this pretty specifically for HIV.
Thanks for the reply. We will be working on HIV too. K03455 is one of the HIV subtype B sequences, but we have many others. So I just want to check: if I choose KJ849799, will that work?
Hi DiyaVaka,
Great, it should work okay. The main HIV-specific bits were to try to figure out which end the reads mapping to the LTR sequence should be assigned to, since the LTR is identical at both ends, so as long as the subtypes are broadly similar it should be fine. Let me know if you run into any trouble.
Best,
Rory
Thank you Rory. Another quick question. The manual says
python orientation.py bam_file virus_contig > out.sites
So for this, will I be giving it the BAM file produced by the chimeric.py script? Also, when you say virus_contig file, is this something I need to download?
Hi DiyaVaka,
You'll have to align the reads to the host genome plus whatever HIV genome you want to look at with bwa first. If you are going to use KJ849799 as the HIV genome, the FASTA sequence should be named KJ849799 and you want to add that to the sequences for GRCh37 into one big FASTA file to align to. That will make a BAM file.
Then you want to mark the duplicates and run the chimeric.py to get only the chimeric reads.
Then you can do python orientation.py your-bam-file KJ849799 > out.sites
and it will spit out the candidate integration site for every chimeric read.
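The reference-building step described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not part of li_hiv: the function name `build_combined_reference` and all file paths are hypothetical, and it assumes the viral FASTA holds a single record whose header should be replaced by the bare accession so that it matches the contig name later passed to orientation.py.

```python
def build_combined_reference(host_fasta, virus_fasta, virus_name, out_fasta):
    """Write the host FASTA followed by one viral record renamed to virus_name.

    Hypothetical helper for illustration; it streams line by line so a
    genome-sized host FASTA is never held in memory.
    """
    with open(out_fasta, "w") as out:
        last_line = "\n"
        # Copy the host genome through unchanged.
        with open(host_fasta) as host:
            for line in host:
                out.write(line)
                last_line = line
        # Make sure the viral header starts on its own line.
        if not last_line.endswith("\n"):
            out.write("\n")
        headers_seen = 0
        with open(virus_fasta) as virus:
            for line in virus:
                if line.startswith(">"):
                    headers_seen += 1
                    if headers_seen > 1:
                        raise ValueError("expected a single viral record")
                    # Replace the original header with the bare contig name.
                    out.write(">" + virus_name + "\n")
                else:
                    out.write(line)
```

The combined file would then be indexed and aligned against with bwa (for example `bwa index` followed by `bwa mem`; the choice of `bwa mem` is an assumption, the thread only says bwa).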
Hi Rory,
I tried doing what you said and I get this error. Am I doing something wrong?
(data_management_env)[sequencing@ihg-node-42 Sample_1010]$ python /home/sequencing/src/HIV_integration/li_hiv/scripts/orientation.py 1010_mkdp.bam K03445 >out.sites
Traceback (most recent call last):
  File "/home/sequencing/src/HIV_integration/li_hiv/scripts/orientation.py", line 140, in
    chrom = in_handle.getrname(read.tid)
  File "csamtools.pyx", line 799, in csamtools.Samfile.getrname (pysam/csamtools.c:8130)
ValueError: tid -1 out of range 0<=tid<85
Thanks,
Diya
Hi Diya,
Did you run the chimeric script on the deduped BAM file? tid -1 means the read is unmapped.
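To illustrate the point about unmapped reads: in SAM/BAM an unmapped read carries flag bit 0x4 and has `*` as its reference name, which pysam reports as tid -1, so asking for the reference name of such a read fails. The sketch below works on plain SAM text lines (for instance from `samtools view -h`) and skips unmapped records; the function name `mapped_records` is hypothetical and this is not how orientation.py itself is written.

```python
UNMAPPED_FLAG = 0x4  # SAM flag bit: the read itself is unmapped


def mapped_records(sam_lines):
    """Yield the tab-split fields of every mapped record in SAM text lines."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not an alignment record
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        rname = fields[2]
        # Unmapped reads have no reference; this is the tid -1 case in pysam.
        if flag & UNMAPPED_FLAG or rname == "*":
            continue
        yield fields
```

With pysam, the equivalent guard would presumably be a check on `read.is_unmapped` before calling `getrname(read.tid)`.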
Yes I did and it produced a 1010_mkdup.chimeric.igv.bam file. Should I be using that instead of the 1010_mkdp.bam file?
Thanks,
Yup!
Ok. It produced an empty file with just the header row. Does that mean our sample of interest does not have sites for K03445? I concatenated the K03445 genome with GRCH37.fa and ran bwa and sambamba.
Thanks,
Diya
Hi Diya,
It couldn’t find any integration sites, but that doesn’t mean the sample doesn’t have K03445 sites. Is the chimeric BAM file empty?
Hi Rory,
My chimeric.igv.bam file does not have any reads either. When I run samtools flagstat on the 1010_mkdp.chimeric.igv.bam file I get 0 reads. My 1010_mkdup.bam file does have many reads.
Why does running chimeric give me no reads? What does this mean?
Thanks,
Diya
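One way to check independently whether the deduplicated BAM contains any host-virus read pairs is to count records whose mate maps to the other side. The sketch below is a rough diagnostic under assumptions: it uses one plausible notion of "chimeric" (a properly paired record where exactly one of the read and its mate sits on the viral contig), which may differ from the criteria chimeric.py actually applies, and it ignores split reads. The function name is hypothetical; the input is SAM text, for example from `samtools view`.

```python
def count_host_virus_records(sam_lines, virus_contig):
    """Count primary records whose read and mate straddle host and virus.

    Each spanning pair is counted once per mate present in the input,
    so a complete pair contributes 2.
    """
    count = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if not flag & 0x1:
            continue  # not a paired read
        if flag & 0x4 or flag & 0x8:
            continue  # read or mate unmapped
        if flag & 0x900:
            continue  # secondary or supplementary alignment
        rname = fields[2]
        # RNEXT of "=" means the mate is on the same contig as the read.
        mate = rname if fields[6] == "=" else fields[6]
        if (rname == virus_contig) != (mate == virus_contig):
            count += 1
    return count
```

A count of zero on the full deduplicated BAM would be consistent with the empty chimeric file; a non-zero count would suggest looking at how the filtering step was run.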
Hi Diya,
It could mean that there are no chimeric reads, but it could be a technical issue as well. I think ruling out the technical issue is important before drawing any conclusions. Do you think you could put up the 1010_mkdup.bam file somewhere for me to look at?
Best,
Rory
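One technical check that can be done before sharing the file is to confirm that the contig name passed to orientation.py actually appears in the BAM header. This thread mentions both K03455 and K03445, so it seems worth confirming which name the combined reference really uses; whether that is related to the empty output here is not known. The sketch below parses `@SQ` lines from header text such as the output of `samtools view -H`; the function name is hypothetical.

```python
def contig_names(header_lines):
    """Return the reference sequence names (SN fields) from SAM header lines."""
    names = []
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue  # only @SQ lines describe reference sequences
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names
```

If the name given on the command line is not in the returned list, no read can be assigned to that contig regardless of what the sample contains.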
Hi Rory,
I shared the file in google drive. Let me know if you cannot access it.
Thanks,
Diya
From: Rory Kirchner [notifications@github.com] Sent: Tuesday, July 05, 2016 3:58 PM To: hbc/li_hiv Cc: Vaka, Dedeepya; Author Subject: Re: [hbc/li_hiv] Multiple virus genomes (#2)
Hi Diya,
It could mean that there are no chimeric reads but it could be a technical issue as well. I think ruling out the technical issue is important before drawing and conclusions. Do you think you could put up the 1010_mkdup.bam file for me to look at somewhere?
Best,
Rory
On Jul 5, 2016, at 5:46 PM, DiyaVaka notifications@github.com wrote:
Hi Rory,
My chimeric.igv.bam file does not have any reads either. When I do samtools flagstat 1010_mkdp.chimeric.igv.bam file I get 0 reads. My 1010_mkdup.bam file does have many reads.
Why does running chimeric gives me no reads? What does this mean?
Thanks,
Diya
From: Rory Kirchner [notifications@github.com] Sent: Tuesday, July 05, 2016 2:27 PM To: hbc/li_hiv Cc: Vaka, Dedeepya; Author Subject: Re: [hbc/li_hiv] Multiple virus genomes (#2)
Hi Diya,
It couldn’t find any integration sites, but that doesn’t mean the sample doesn’t have K03445 sites. Is the chimeric BAM file empty?
On Jul 5, 2016, at 5:25 PM, DiyaVaka notifications@github.com wrote:
Ok. It produced an empty file with just the header row. That means the sample of our interest does not have sites for K03445? I used K03445 genome and cat that with the GRCH37.fa and run bwa and sambamba.
Thanks,
Diya
From: Rory Kirchner [notifications@github.com] Sent: Tuesday, July 05, 2016 2:23 PM To: hbc/li_hiv Cc: Vaka, Dedeepya; Author Subject: Re: [hbc/li_hiv] Multiple virus genomes (#2)
Yup!
On Jul 5, 2016, at 5:22 PM, DiyaVaka notifications@github.com wrote:
Yes I did and it produced a 1010_mkdup.chimeric.igv.bam file. Should I be using that instead of the 1010_mkdp.bam file?
Thanks,
From: Rory Kirchner [notifications@github.com] Sent: Tuesday, July 05, 2016 2:20 PM To: hbc/li_hiv Cc: Vaka, Dedeepya; Author Subject: Re: [hbc/li_hiv] Multiple virus genomes (#2)
Hi Diya,
Did you run the chimeric script on the deduped BAM file? tid -1 means the read is unmapped.
On Jul 5, 2016, at 5:19 PM, DiyaVaka notifications@github.com wrote:
Hi Rory,
I tied doing what you said and I get this error. Am I doing something wrong
(data_management_env)[sequencing@ihg-node-42 Sample_1010]$ python /home/sequencing/src/HIV_integration/li_hiv/scripts/orientation.py 1010_mkdp.bam K03445 >out.sites
Traceback (most recent call last):
  File "/home/sequencing/src/HIV_integration/li_hiv/scripts/orientation.py", line 140, in
    chrom = in_handle.getrname(read.tid)
  File "csamtools.pyx", line 799, in csamtools.Samfile.getrname (pysam/csamtools.c:8130)
ValueError: tid -1 out of range 0<=tid<85
Thanks,
Diya
From: Rory Kirchner [notifications@github.com] Sent: Friday, July 01, 2016 11:18 AM To: hbc/li_hiv Cc: Vaka, Dedeepya; Author Subject: Re: [hbc/li_hiv] Multiple virus genomes (#2)
Hi DiyaVaka,
You'll have to align the reads to the host genome plus whatever HIV genome you want to look at with bwa first. If you are going to use KJ849799 as the HIV genome, the FASTA sequence should be named KJ849799 and you want to add that to the sequences for GRCh37 into one big FASTA file to align to. That will make a BAM file.
Then you want to mark the duplicates and run the chimeric.py to get only the chimeric reads.
Then you can do python orientation.py your-bam-file KJ849799 > out.sites and it will spit out the candidate integration site for every chimeric read.
Hi Diya,
I think I need the link to look on Google Drive. Thanks!
Attached is the link
https://drive.google.com/file/d/0BywzvSTpUvWvOXUxNFJ4NHk1ZjQ/view?usp=sharing
Diya Vaka
Bioinformatics Programmer Analyst
Institute of Human Genetics
513 Parnassus Ave., HSE S966
San Francisco, CA 94143
W 415-502-3570
dedeepya.vaka@ucsf.edu
Hi Diya,
The name for the sequence you provided isn't KJ849799, it is gi|1906382|gb|K03455.1|HIVHXB2CG. When you pass KJ849799, the chimeric.py script looks for reads that align to KJ849799 and one of the human chromosomes. So you'll either have to rename gi|1906382|gb|K03455.1|HIVHXB2CG to KJ849799 in your FASTA file or try passing gi|1906382|gb|K03455.1|HIVHXB2CG to the chimeric.py script. The former is preferable though; I think the | characters might break some downstream things.
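A minimal sketch of the rename, assuming a plain-text FASTA handled line by line. The function name rename_fasta_record and the placeholder sequence are hypothetical and not part of li_hiv. The new name here is K03455, but any name without | characters would do, as long as the same name is passed to the scripts.

```python
def rename_fasta_record(lines, old_name, new_name):
    """Yield FASTA lines, replacing the header whose ID equals old_name."""
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            # The record ID is the first whitespace-delimited token after '>'.
            if parts and parts[0] == old_name:
                # Any description after the ID is dropped in this sketch.
                yield ">" + new_name + "\n"
                continue
        yield line


# Placeholder record (assumption): the sequence line is not real HIV sequence.
fasta = [
    ">gi|1906382|gb|K03455.1|HIVHXB2CG\n",
    "ACGTACGTACGT\n",
]
renamed = list(
    rename_fasta_record(fasta, "gi|1906382|gb|K03455.1|HIVHXB2CG", "K03455")
)
print("".join(renamed), end="")
```

Since the contig names in the BAM header come from the reference FASTA, the reads would need to be realigned against the renamed reference before the new name shows up in the BAM file.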
Hi Rory,
I did use K03445, as it is the first subtype we wanted to use. So I tried passing K03445 to the chimeric script, but it still did not pick up anything. I will try passing gi|1906382|gb|K03455.1|HIVHXB2CG to the chimeric script and see what happens.
Thanks,
Diya Vaka
Hi Rory,
Also, the manual says
python chimeric.py sample.bam K03455
but that's not how it works. You need to give K03455 first and the BAM file second. This is what I used:
python chimeric.py HIVHXB2CG 1010_mkdup.bam
Thanks,
Diya Vaka
Hi Diya,
Great, thanks for catching the error in the chimeric.py documentation. I updated it to be correct.
No problem. So does it mean that our sample does not have any sites for K03455?
Thanks,
Diya Vaka
Hi Diya,
It isn't clear yet. Right now it is not finding chimeric reads because of the name.
Here is the header for the BAM file:
@HD VN:1.3
@SQ SN:1 LN:249250621
@SQ SN:2 LN:243199373
@SQ SN:3 LN:198022430
@SQ SN:4 LN:191154276
@SQ SN:5 LN:180915260
@SQ SN:6 LN:171115067
@SQ SN:7 LN:159138663
@SQ SN:8 LN:146364022
@SQ SN:9 LN:141213431
@SQ SN:10 LN:135534747
@SQ SN:11 LN:135006516
@SQ SN:12 LN:133851895
@SQ SN:13 LN:115169878
@SQ SN:14 LN:107349540
@SQ SN:15 LN:102531392
@SQ SN:16 LN:90354753
@SQ SN:17 LN:81195210
@SQ SN:18 LN:78077248
@SQ SN:19 LN:59128983
@SQ SN:20 LN:63025520
@SQ SN:21 LN:48129895
@SQ SN:22 LN:51304566
@SQ SN:X LN:155270560
@SQ SN:Y LN:59373566
@SQ SN:MT LN:16569
@SQ SN:GL000207.1 LN:4262
@SQ SN:GL000226.1 LN:15008
@SQ SN:GL000229.1 LN:19913
@SQ SN:GL000231.1 LN:27386
@SQ SN:GL000210.1 LN:27682
@SQ SN:GL000239.1 LN:33824
@SQ SN:GL000235.1 LN:34474
@SQ SN:GL000201.1 LN:36148
@SQ SN:GL000247.1 LN:36422
@SQ SN:GL000245.1 LN:36651
@SQ SN:GL000197.1 LN:37175
@SQ SN:GL000203.1 LN:37498
@SQ SN:GL000246.1 LN:38154
@SQ SN:GL000249.1 LN:38502
@SQ SN:GL000196.1 LN:38914
@SQ SN:GL000248.1 LN:39786
@SQ SN:GL000244.1 LN:39929
@SQ SN:GL000238.1 LN:39939
@SQ SN:GL000202.1 LN:40103
@SQ SN:GL000234.1 LN:40531
@SQ SN:GL000232.1 LN:40652
@SQ SN:GL000206.1 LN:41001
@SQ SN:GL000240.1 LN:41933
@SQ SN:GL000236.1 LN:41934
@SQ SN:GL000241.1 LN:42152
@SQ SN:GL000243.1 LN:43341
@SQ SN:GL000242.1 LN:43523
@SQ SN:GL000230.1 LN:43691
@SQ SN:GL000237.1 LN:45867
@SQ SN:GL000233.1 LN:45941
@SQ SN:GL000204.1 LN:81310
@SQ SN:GL000198.1 LN:90085
@SQ SN:GL000208.1 LN:92689
@SQ SN:GL000191.1 LN:106433
@SQ SN:GL000227.1 LN:128374
@SQ SN:GL000228.1 LN:129120
@SQ SN:GL000214.1 LN:137718
@SQ SN:GL000221.1 LN:155397
@SQ SN:GL000209.1 LN:159169
@SQ SN:GL000218.1 LN:161147
@SQ SN:GL000220.1 LN:161802
@SQ SN:GL000213.1 LN:164239
@SQ SN:GL000211.1 LN:166566
@SQ SN:GL000199.1 LN:169874
@SQ SN:GL000217.1 LN:172149
@SQ SN:GL000216.1 LN:172294
@SQ SN:GL000215.1 LN:172545
@SQ SN:GL000205.1 LN:174588
@SQ SN:GL000219.1 LN:179198
@SQ SN:GL000224.1 LN:179693
@SQ SN:GL000223.1 LN:180455
@SQ SN:GL000195.1 LN:182896
@SQ SN:GL000212.1 LN:186858
@SQ SN:GL000222.1 LN:186861
@SQ SN:GL000200.1 LN:187035
@SQ SN:GL000193.1 LN:189789
@SQ SN:GL000194.1 LN:191469
@SQ SN:GL000225.1 LN:211173
@SQ SN:GL000192.1 LN:547496
@SQ SN:gi|1906382|gb|K03455.1|HIVHXB2CG LN:9719
chimeric.py is looking for K03455 when that is passed as an argument, but it doesn't exist in the BAM file header. The only non-human contig in that BAM file is named gi|1906382|gb|K03455.1|HIVHXB2CG, not K03455. So for that BAM file, passing gi|1906382|gb|K03455.1|HIVHXB2CG instead of K03455 will look for reads chimeric with the non-human contig. The problem is that gi|1906382|gb|K03455.1|HIVHXB2CG has some characters in it that might mess up some of the parsing, so I'd rename gi|1906382|gb|K03455.1|HIVHXB2CG to K03455, rerun the alignment, and then pass in K03455.
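One way to catch this kind of mismatch early is to compare the name you plan to pass against the @SQ lines of the header. Below is a minimal sketch that works on header text such as the output of samtools view -H. The function name contig_names is hypothetical, and the header is abbreviated from the one above.

```python
def contig_names(header_text):
    """Return the SN values of all @SQ lines in SAM header text."""
    names = []
    for line in header_text.splitlines():
        fields = line.split()  # handles tab- or space-separated fields
        if fields and fields[0] == "@SQ":
            for field in fields[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


# Abbreviated copy of the header shown above.
header = """@HD VN:1.3
@SQ SN:1 LN:249250621
@SQ SN:MT LN:16569
@SQ SN:gi|1906382|gb|K03455.1|HIVHXB2CG LN:9719
"""

names = contig_names(header)
virus = "K03455"
print(virus in names)                     # exact match is what matters here
print([n for n in names if virus in n])   # near misses worth renaming
```

If the exact-match check comes back false while the second list is not empty, the contig is present under a longer name and is a candidate for renaming.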
Hi Team,
We are interested in using the li_hiv package for our dataset. I see that "In this example we used K03455 as the virus genome", but we want to use many other genomes. Should I download the other genomes that I am interested in?
Thanks,