hbc / li_hiv

HIV integration project

Multiple virus genomes #2

Open DiyaVaka opened 8 years ago

DiyaVaka commented 8 years ago

Hi Team,

We are interested in using the li_hiv package for our dataset. I see that "In this example we used K03455 as the virus genome", but we want to use many other genomes. Should I download the other genomes that I am interested in?

Thanks,

roryk commented 8 years ago

Hi DiyaVaka,

Depending on which other genomes you want to look at, it might work for calling the integration sites but not for determining the correct orientation of the virus. We wrote this pretty specifically for HIV.

DiyaVaka commented 8 years ago

Thanks for the reply. We will be working on HIV too. K03455 is one of the HIV subtype B sequences, but we have many others. So I just want to check: if I choose KJ849799, will that work?

roryk commented 8 years ago

Hi DiyaVaka,

Great, it should work okay. The main HIV-specific bits were to try to figure out which end the reads mapping to the LTR sequence should be assigned to, since the LTR is identical at both ends. So as long as the subtypes are broadly similar it should be fine. Let me know if you run into any trouble.

Best,

Rory


DiyaVaka commented 8 years ago

Thank you Rory. Another quick question. The manual says

python orientation.py bam_file virus_contig > out.sites

So for this, should I give it the BAM file produced by the chimeric.py script? Also, when you say virus_contig file, is this something I need to download?

roryk commented 8 years ago

Hi DiyaVaka,

You'll have to align the reads to the host genome plus whatever HIV genome you want to look at with bwa first. If you are going to use KJ849799 as the HIV genome, the FASTA sequence should be named KJ849799 and you want to add that to the sequences for GRCh37 into one big FASTA file to align to. That will make a BAM file.

Then you want to mark the duplicates and run the chimeric.py to get only the chimeric reads.

Then you can do python orientation.py your-bam-file KJ849799 > out.sites and it will spit out the candidate integration site for every chimeric read.
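The first step above (one big FASTA in which the viral record is named after its accession) can be sketched in plain Python. This is a minimal illustration, not part of li_hiv: the function name `write_combined_reference` and its signature are hypothetical, and the sequences in the demonstration are made up. It assumes the viral FASTA holds exactly one record. Alignment with bwa, duplicate marking, chimeric.py and orientation.py are not covered here.

```python
import io


def write_combined_reference(host_fasta, virus_fasta, virus_name, out_handle):
    """Write the host FASTA followed by one viral record renamed to virus_name.

    All handles are text-mode file-like objects. The name and signature are
    illustrative only; they are not part of li_hiv.
    """
    # Copy the host genome through unchanged, making sure the last line ends
    # with a newline so the viral header starts on its own line.
    for line in host_fasta:
        out_handle.write(line if line.endswith("\n") else line + "\n")

    seen_header = False
    for line in virus_fasta:
        if line.startswith(">"):
            if seen_header:
                raise ValueError("expected exactly one record in the viral FASTA")
            seen_header = True
            # Replace the full description line with the bare contig name.
            out_handle.write(">" + virus_name + "\n")
        else:
            out_handle.write(line if line.endswith("\n") else line + "\n")
    if not seen_header:
        raise ValueError("no FASTA header found in the viral input")


# Tiny in-memory demonstration with made-up sequences.
host = io.StringIO(">1\nACGT\n>2\nGGCC")
virus = io.StringIO(">KJ849799.1 example description\nTTAA\n")
combined = io.StringIO()
write_combined_reference(host, virus, "KJ849799", combined)
print(combined.getvalue())
```

With real files, the same function could be called on handles from `open(...)`. The point of the rename is that the contig name in the resulting BAM header is then exactly the string later passed to orientation.py.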

DiyaVaka commented 8 years ago

Hi Rory,

I tried doing what you said and I get this error. Am I doing something wrong?

(data_management_env)[sequencing@ihg-node-42 Sample_1010]$ python /home/sequencing/src/HIV_integration/li_hiv/scripts/orientation.py 1010_mkdp.bam K03445 >out.sites
Traceback (most recent call last):
  File "/home/sequencing/src/HIV_integration/li_hiv/scripts/orientation.py", line 140, in <module>
    chrom = in_handle.getrname(read.tid)
  File "csamtools.pyx", line 799, in csamtools.Samfile.getrname (pysam/csamtools.c:8130)
ValueError: tid -1 out of range 0<=tid<85

Thanks,

Diya



roryk commented 8 years ago

Hi Diya,

Did you run the chimeric script on the deduped BAM file? tid -1 means the read is unmapped.
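To see how many records in a BAM file would hit this error, one option is to dump it to SAM text (for example with `samtools view`) and count the records that have no reference assigned. The sketch below uses only the standard library. It assumes that a tid of -1 in pysam corresponds to `*` in the RNAME column of the SAM text, and it also reports records with the unmapped flag (0x4) set, which can still carry their mate's reference name. The function name `count_unplaced` and the example records are hypothetical.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit: this read is unmapped


def count_unplaced(sam_lines):
    """Return (total, flagged_unmapped, no_reference) for SAM text lines."""
    total = flagged_unmapped = no_reference = 0
    for line in sam_lines:
        # Skip header lines and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        rname = fields[2]
        total += 1
        if flag & UNMAPPED_FLAG:
            flagged_unmapped += 1
        if rname == "*":
            # No reference name at all: the case assumed to give tid -1.
            no_reference += 1
    return total, flagged_unmapped, no_reference


# Made-up records: one mapped pair member, one fully unmapped read, and one
# pair in which the second mate is unmapped but placed at its mate's position.
example = [
    "@SQ\tSN:1\tLN:100\n",
    "r1\t99\t1\t10\t60\t4M\t=\t50\t44\tACGT\tIIII\n",
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    "r3\t73\t1\t10\t60\t4M\t=\t10\t0\tACGT\tIIII\n",
    "r3\t133\t1\t10\t0\t*\t=\t10\t0\tACGT\tIIII\n",
]
print(count_unplaced(example))
```

A non-zero third number on the file given to orientation.py would be consistent with the ValueError above, since that file is expected to contain only the chimeric reads.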


DiyaVaka commented 8 years ago

Yes, I did, and it produced a 1010_mkdup.chimeric.igv.bam file. Should I be using that instead of the 1010_mkdp.bam file?

Thanks,



roryk commented 8 years ago

Yup!


DiyaVaka commented 8 years ago

OK. It produced an empty file with just the header row. Does that mean our sample of interest does not have sites for K03445? I used the K03445 genome, concatenated it with GRCH37.fa, and ran bwa and sambamba.

Thanks,

Diya



roryk commented 8 years ago

Hi Diya,

It couldn’t find any integration sites, but that doesn’t mean the sample doesn’t have K03445 sites. Is the chimeric BAM file empty?
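One technical cause worth ruling out, offered here as a suggestion rather than a diagnosis, is a mismatch between the contig name passed on the command line and the sequence names in the BAM header. The thread mentions both K03455 and K03445, so it may be worth confirming which one the reference actually uses. The sketch below works on SAM text that includes the header (for example from `samtools view -h`). The function names and the toy header are hypothetical.

```python
def contig_names(sam_lines):
    """Collect the SN: values from the @SQ header lines of SAM text."""
    names = []
    for line in sam_lines:
        if line.startswith("@SQ\t"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def reads_on_contig(sam_lines, contig):
    """Count alignment records whose RNAME column equals contig."""
    count = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        if line.split("\t")[2] == contig:
            count += 1
    return count


# Toy SAM text with made-up lengths and reads. Pass a list (not a one-shot
# iterator) if both functions are called on the same input.
sam = [
    "@HD\tVN:1.0\tSO:coordinate\n",
    "@SQ\tSN:1\tLN:1000\n",
    "@SQ\tSN:K03455\tLN:100\n",
    "r1\t0\t1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t0\tK03455\t5\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
]

names = contig_names(sam)
for candidate in ("K03455", "K03445"):
    print(candidate, candidate in names, reads_on_contig(sam, candidate))
```

If the name is present in the header but no reads map to it, the empty result is more likely to reflect the data or the alignment step than the argument given to orientation.py.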


DiyaVaka commented 8 years ago

Hi Rory,

My chimeric.igv.bam file does not have any reads either. When I run samtools flagstat on 1010_mkdp.chimeric.igv.bam I get 0 reads. My 1010_mkdup.bam file does have many reads.

Why does running chimeric give me no reads? What does this mean?

Thanks,

Diya



roryk commented 8 years ago

Hi Diya,

It could mean that there are no chimeric reads, but it could be a technical issue as well. I think ruling out the technical issue is important before drawing any conclusions. Do you think you could put up the 1010_mkdup.bam file somewhere for me to look at?

Best,

Rory
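As a rough independent check on the deduplicated BAM, one could count alignment records where a read and its mate sit on opposite sides of the virus/host boundary. This is not the logic of chimeric.py, which the thread does not describe. It is only a sanity check on SAM text, assuming paired-end reads, and it ignores split reads reported through supplementary alignments. The function name `count_virus_host_records` and the example records are hypothetical.

```python
def count_virus_host_records(sam_lines, virus_contig):
    """Count primary records with exactly one of read and mate on virus_contig."""
    count = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, rnext = int(fields[1]), fields[2], fields[6]
        # Skip records where the read (0x4) or its mate (0x8) is unmapped.
        if flag & 0x4 or flag & 0x8:
            continue
        # Skip secondary (0x100) and supplementary (0x800) alignments.
        if flag & 0x900:
            continue
        # "=" in RNEXT means the mate is on the same reference as the read.
        if rnext == "=":
            rnext = rname
        if rname == "*" or rnext == "*":
            continue
        if (rname == virus_contig) != (rnext == virus_contig):
            count += 1
    return count


# Made-up records: one virus/host pair (both mates listed) and one pair that
# maps entirely to the host.
records = [
    "r1\t65\tK03455\t100\t60\t4M\t1\t5000\t0\tACGT\tIIII\n",
    "r1\t129\t1\t5000\t60\t4M\tK03455\t100\t0\tACGT\tIIII\n",
    "r2\t99\t1\t10\t60\t4M\t=\t50\t44\tACGT\tIIII\n",
    "r2\t147\t1\t50\t60\t4M\t=\t10\t-44\tACGT\tIIII\n",
]
print(count_virus_host_records(records, "K03455"))
```

Each mate is counted separately, so one virus/host pair contributes two records. A count of zero on the full BAM would suggest there is little for a chimeric-read filter to find, while a clearly non-zero count alongside an empty chimeric BAM would point toward a technical problem.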


DiyaVaka commented 8 years ago

Hi Rory,

I shared the file on Google Drive. Let me know if you cannot access it.

Thanks,

Diya



roryk commented 8 years ago

Hi Diya,

I think I need the link to view the file on Google Drive. Thanks!

DiyaVaka commented 8 years ago

Here is the link:

https://drive.google.com/file/d/0BywzvSTpUvWvOXUxNFJ4NHk1ZjQ/view?usp=sharing

Diya Vaka

Bioinformatics Programmer Analyst

Institute of Human Genetics

513 Parnassus Ave., HSE S966

San Francisco, CA 94143

W 415-502-3570

dedeepya.vaka@ucsf.edu



roryk commented 8 years ago

Hi Diya,

The name of the sequence you provided isn't KJ849799; it is gi|1906382|gb|K03455.1|HIVHXB2CG. When you pass KJ849799 to the chimeric.py script, it looks for reads that align to both KJ849799 and one of the human chromosomes. So you'll either have to rename gi|1906382|gb|K03455.1|HIVHXB2CG to KJ849799 in your FASTA file or try passing gi|1906382|gb|K03455.1|HIVHXB2CG to the chimeric.py script. The former is preferable, though, because I think the | characters might break some downstream things.
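For the renaming option, here is a minimal Python sketch. It assumes the reference is a plain-text FASTA file; the function name `rename_fasta_header` is hypothetical and not part of li_hiv. It takes the sequence ID to be the first whitespace-delimited token after `>` and drops any description that follows it on the renamed record.

```python
def rename_fasta_header(lines, old_name, new_name):
    """Yield FASTA lines, renaming the record whose ID equals old_name.

    The ID is the first whitespace-delimited token after '>'. Any
    description text after the ID is dropped on the renamed record.
    All other lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields and fields[0] == old_name:
                yield ">" + new_name + "\n"
                continue
        yield line


# Small in-memory demonstration; real use would iterate over an open file
# and write the result to a new FASTA file.
example = [
    ">gi|1906382|gb|K03455.1|HIVHXB2CG example description\n",
    "ACGT\n",
]
renamed = list(rename_fasta_header(
    example, "gi|1906382|gb|K03455.1|HIVHXB2CG", "K03455"))
```

The contig names in a BAM header come from the reference the reads were aligned to, so after renaming the FASTA record the reference has to be re-indexed and the alignment rerun before the new name shows up in the BAM file.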

DiyaVaka commented 8 years ago

Hi Rory,

I did use K03445, as it is the first subtype we wanted to use. I tried passing K03445 to the chimeric.py script, but it still did not pick anything up. I will try passing gi|1906382|gb|K03455.1|HIVHXB2CG to the chimeric.py script and see what happens.

Thanks,

Diya Vaka


DiyaVaka commented 8 years ago

Hi Rory,

Also, the manual says

python chimeric.py sample.bam K03455

but that's not how it works. You need to give K03455 first and the BAM file second. This is what I used:

python chimeric.py HIVHXB2CG 1010_mkdup.bam

Thanks,

Diya Vaka


roryk commented 8 years ago

Hi Diya,

Great, thanks for catching the error in the chimeric.py documentation. I have corrected it.

DiyaVaka commented 8 years ago

No problem. So does this mean that our sample does not have any integration sites for K03455?

Thanks,

Diya Vaka


roryk commented 8 years ago

Hi Diya,

It isn't clear yet. Right now it is not finding chimeric reads because the contig name passed to the script does not match the name in the BAM file.

Here is the header for the BAM file:

@HD     VN:1.3
@SQ     SN:1    LN:249250621
@SQ     SN:2    LN:243199373
@SQ     SN:3    LN:198022430
@SQ     SN:4    LN:191154276
@SQ     SN:5    LN:180915260
@SQ     SN:6    LN:171115067
@SQ     SN:7    LN:159138663
@SQ     SN:8    LN:146364022
@SQ     SN:9    LN:141213431
@SQ     SN:10   LN:135534747
@SQ     SN:11   LN:135006516
@SQ     SN:12   LN:133851895
@SQ     SN:13   LN:115169878
@SQ     SN:14   LN:107349540
@SQ     SN:15   LN:102531392
@SQ     SN:16   LN:90354753
@SQ     SN:17   LN:81195210
@SQ     SN:18   LN:78077248
@SQ     SN:19   LN:59128983
@SQ     SN:20   LN:63025520
@SQ     SN:21   LN:48129895
@SQ     SN:22   LN:51304566
@SQ     SN:X    LN:155270560
@SQ     SN:Y    LN:59373566
@SQ     SN:MT   LN:16569
@SQ     SN:GL000207.1   LN:4262
@SQ     SN:GL000226.1   LN:15008
@SQ     SN:GL000229.1   LN:19913
@SQ     SN:GL000231.1   LN:27386
@SQ     SN:GL000210.1   LN:27682
@SQ     SN:GL000239.1   LN:33824
@SQ     SN:GL000235.1   LN:34474
@SQ     SN:GL000201.1   LN:36148
@SQ     SN:GL000247.1   LN:36422
@SQ     SN:GL000245.1   LN:36651
@SQ     SN:GL000197.1   LN:37175
@SQ     SN:GL000203.1   LN:37498
@SQ     SN:GL000246.1   LN:38154
@SQ     SN:GL000249.1   LN:38502
@SQ     SN:GL000196.1   LN:38914
@SQ     SN:GL000248.1   LN:39786
@SQ     SN:GL000244.1   LN:39929
@SQ     SN:GL000238.1   LN:39939
@SQ     SN:GL000202.1   LN:40103
@SQ     SN:GL000234.1   LN:40531
@SQ     SN:GL000232.1   LN:40652
@SQ     SN:GL000206.1   LN:41001
@SQ     SN:GL000240.1   LN:41933
@SQ     SN:GL000236.1   LN:41934
@SQ     SN:GL000241.1   LN:42152
@SQ     SN:GL000243.1   LN:43341
@SQ     SN:GL000242.1   LN:43523
@SQ     SN:GL000230.1   LN:43691
@SQ     SN:GL000237.1   LN:45867
@SQ     SN:GL000233.1   LN:45941
@SQ     SN:GL000204.1   LN:81310
@SQ     SN:GL000198.1   LN:90085
@SQ     SN:GL000208.1   LN:92689
@SQ     SN:GL000191.1   LN:106433
@SQ     SN:GL000227.1   LN:128374
@SQ     SN:GL000228.1   LN:129120
@SQ     SN:GL000214.1   LN:137718
@SQ     SN:GL000221.1   LN:155397
@SQ     SN:GL000209.1   LN:159169
@SQ     SN:GL000218.1   LN:161147
@SQ     SN:GL000220.1   LN:161802
@SQ     SN:GL000213.1   LN:164239
@SQ     SN:GL000211.1   LN:166566
@SQ     SN:GL000199.1   LN:169874
@SQ     SN:GL000217.1   LN:172149
@SQ     SN:GL000216.1   LN:172294
@SQ     SN:GL000215.1   LN:172545
@SQ     SN:GL000205.1   LN:174588
@SQ     SN:GL000219.1   LN:179198
@SQ     SN:GL000224.1   LN:179693
@SQ     SN:GL000223.1   LN:180455
@SQ     SN:GL000195.1   LN:182896
@SQ     SN:GL000212.1   LN:186858
@SQ     SN:GL000222.1   LN:186861
@SQ     SN:GL000200.1   LN:187035
@SQ     SN:GL000193.1   LN:189789
@SQ     SN:GL000194.1   LN:191469
@SQ     SN:GL000225.1   LN:211173
@SQ     SN:GL000192.1   LN:547496
@SQ     SN:gi|1906382|gb|K03455.1|HIVHXB2CG     LN:9719

chimeric.py looks for K03455 when that is passed as an argument, but no contig with that name exists in the BAM file header. The only non-human contig in that BAM file is named gi|1906382|gb|K03455.1|HIVHXB2CG, not K03455. So for that BAM file, passing gi|1906382|gb|K03455.1|HIVHXB2CG instead of K03455 will look for reads chimeric with the non-human contig. The problem is that gi|1906382|gb|K03455.1|HIVHXB2CG contains characters that might break some of the parsing, so I'd rename it to K03455 in the FASTA file, rerun the alignment, and then pass in K03455.
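A quick way to catch this kind of mismatch before running the scripts is to compare the name you plan to pass against the contig names in the header. The sketch below is only an illustration: `contig_names` and `check_virus_contig` are hypothetical helpers, not part of li_hiv, and they assume the header text was obtained with something like `samtools view -H your.bam`, which prints tab-separated header lines.

```python
def contig_names(header_text):
    """Return the SN values from the @SQ lines of a SAM/BAM header."""
    names = []
    for line in header_text.splitlines():
        fields = line.split("\t")  # SAM header fields are tab-separated
        if fields[0] != "@SQ":
            continue
        for field in fields[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def check_virus_contig(header_text, virus_contig):
    """Report whether virus_contig exactly matches a contig in the header."""
    names = contig_names(header_text)
    if virus_contig in names:
        return "ok"
    # A partial match usually means the FASTA record has a longer ID.
    similar = [name for name in names if virus_contig in name]
    if similar:
        return "no exact match; similar: " + ", ".join(similar)
    return "not found"


# Abbreviated version of the header above, with tabs as separators.
header = (
    "@HD\tVN:1.3\n"
    "@SQ\tSN:1\tLN:249250621\n"
    "@SQ\tSN:gi|1906382|gb|K03455.1|HIVHXB2CG\tLN:9719\n"
)
result = check_virus_contig(header, "K03455")
```

With the abbreviated header shown, the check reports that K03455 has no exact match and lists the longer gi|...|HIVHXB2CG name as the similar contig, which is the situation described above.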