jibsch / Socrates

Socrates: Identification of genomic rearrangements in tumour genomes by re-aligning soft clipped reads

Error: Add anchor information into re-alignment BAM file #15

Open samarth51 opened 9 years ago

samarth51 commented 9 years ago

Hi, I have BWA BAM files. I tried to run the ./Socrates realignment step but got an error message: Add anchor information into re-alignment BAM file

The command I ran was: ./Socrates inputbam(generated in first step of socrates) outputbam

Any help would be appreciated. Thanks

jibsch commented 9 years ago

Try running the program from start to finish by calling Socrates all and filling in the necessary parameters. Please let me know if that gets you there.

samarth51 commented 9 years ago

Hi, I also tried running Socrates all, using the command ./Socrates all InputBAM, which prints this message on screen:

"Bowtie2 DB is required to perform soft-clip realignment. Please specify this parameter with --bowtie2_db "

In the help section this parameter is defined as: bowtie2_db BOWTIE2_DB -- Prefix of Bowtie2 indexed database for sample (default: None). What does "None" mean here? It is confusing.

Also, I have BWA-generated BAM files, so what should the value of the "bowtie2_db" parameter be in that case? Do I need Bowtie2-generated BAM files to use Socrates?

I would appreciate suggestions on this.

jibsch commented 9 years ago

The "None" default is supposed to indicate that this parameter is required, as it depends on the data. It is fine to use BWA alignments, but the re-alignment is to be done with bowtie2 (at least for the time being). Create a bowtie2 index with the bowtie2-build command. Provide the prefix of the resulting set of files (not including the trailing ".") as the bowtie2_db parameter.
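To illustrate what "prefix" means here: running `bowtie2-build reference.fa hg19` writes a set of files that all start with `hg19`, and `hg19` is then the value to pass. The sketch below lists the file names Bowtie2 normally writes for a prefix and reports which ones are missing. The helper names and the `hg19` prefix are hypothetical and are not part of Socrates.

```python
from pathlib import Path

# Suffixes bowtie2-build writes for a small index.
# Large indexes use the same names with ".bt2l" instead of ".bt2".
SMALL_SUFFIXES = [".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2"]


def index_files(prefix, large=False):
    """Return the file names expected for a Bowtie2 index with this prefix."""
    suffixes = [s + "l" for s in SMALL_SUFFIXES] if large else SMALL_SUFFIXES
    return [prefix + s for s in suffixes]


def missing_index_files(prefix, large=False):
    """Return the expected index files that do not exist on disk."""
    return [f for f in index_files(prefix, large) if not Path(f).is_file()]


if __name__ == "__main__":
    # "hg19" is a placeholder prefix; replace it with your own.
    for name in missing_index_files("hg19"):
        print("missing:", name)
```

If nothing is reported as missing, the same prefix string should be usable as the bowtie2_db value.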

samarth51 commented 9 years ago

Hi, thanks for the valuable suggestions. Everything is going well so far. I have another question: I have paired samples (tumor vs normal), so which parameter do I need to adjust for paired samples?

Thanks

jibsch commented 9 years ago

Glad that worked. For tumour vs normal: Run both samples independently first. Then use the Socrates annotate module and specify the normal results file (paired) after "--normal".

samarth51 commented 9 years ago

Hi, I ran the tumor and normal samples separately and got raw output for both, so all has gone well so far. Which parameters should I use now to find somatic SVs?

./Socrates annotate --features raw_tumor -- normal raw_blood

Is this the right way?

Thanks

jibsch commented 9 years ago

Almost: ./Socrates annotate --normal raw_blood raw_tumor

samarth51 commented 9 years ago

Hi. I ran the annotation and got SV breakpoints, but I have a few doubts about the output.

1) According to the SV typing criterion (assuming C1 realign pos < C1 anchor pos), a DELETION is called if: C1_realign_dir + & C1_anchor_dir -
I applied this criterion and got a few deletion breakpoints whose two ends are on different chromosomes.

chr1:9849740 + CCTTTAAGCTCTATTGGACTTGATATGGTTAGTTTTAAAAAGA chr2:130489048 - TAAAGAACAATAAAGGCCAGGCACTGTGGCTCATACCTGTAATCCCAGCACTTTGGG 1 43 0 0 0 39.0 chr2:130489052 - GAACAATAAAGGCCAGGCACTGTGGCTCA chr1:9849744 + TTTACCTTTAAGCTCTATTGGACTTGATATGGTTAGTTTTAAAAAGAGTTGTTAGCTTTTAGAGATGTATG 1 29 0 0 0 38.0 Micro-homology: 4bp homology found! (TAAA)

This read has C1_realign chr1:9849740 and C1_anchor chr2:130489048. Why does this happen for a DELETION event?

2) I also have SV breakpoints in the output where C1 realign pos > C1 anchor pos. How should I deal with this kind of breakpoint?

Please give some suggestions.

jibsch commented 9 years ago

Hi, Socrates calls fusions between two coordinates. The orientation of the breakpoints (+/-) determines what kind of event it is. In case 1) it is indeed a deletion signature: coordinate 1 < coordinate 2, and orientations +, -. Case 2) occurs if realignment was successful on one side of the fusion only. You can treat both cases the same way: if the smaller coordinate has a + and the larger a -, it is the deletion signature; -+ indicates tandem duplication; ++ and -- indicate inversion types. More complex events contain more than one fusion and are not as easily identifiable. Cheers, Jan
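The rule above can be sketched as a small lookup. This is only an illustration of the orientation signatures described in the comment, not code from Socrates; the function names are hypothetical. One assumption to note: when both coordinates are on the same chromosome the sketch orders them by position, and when they are on different chromosomes it keeps the order in which they were given.

```python
def parse_coord(coord):
    """Split a coordinate such as 'chr1:9849740' into ('chr1', 9849740)."""
    chrom, pos = coord.rsplit(":", 1)
    return chrom, int(pos)


# Orientation of (smaller coordinate, larger coordinate) -> signature.
SIGNATURES = {
    ("+", "-"): "deletion",
    ("-", "+"): "tandem duplication",
    ("+", "+"): "inversion",
    ("-", "-"): "inversion",
}


def fusion_signature(coord_a, dir_a, coord_b, dir_b):
    """Return the signature implied by the orientations of one fusion."""
    chrom_a, pos_a = parse_coord(coord_a)
    chrom_b, pos_b = parse_coord(coord_b)
    # Assumption: only reorder when both ends are on the same chromosome.
    if chrom_a == chrom_b and pos_b < pos_a:
        dir_a, dir_b = dir_b, dir_a
    return SIGNATURES[(dir_a, dir_b)]


if __name__ == "__main__":
    # Coordinates taken from the record quoted in the question.
    print(fusion_signature("chr1:9849740", "+", "chr2:130489048", "-"))
```

Because the lookup is applied after ordering, a record with realign pos > anchor pos is handled the same way as one with realign pos < anchor pos.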

samarth51 commented 9 years ago

Hi, I used --repeatmask (UCSC RepeatMasker track) for annotation. The output has columns named "repeat1" and "repeat2", but they contain no values. What values should I expect in those columns when I use a RepeatMasker track?

UCSC repeat masker track format:
chr1 16777160 16777470 AluSp 2147 +
chr1 25165800 25166089 AluY 2626 -

jibsch commented 9 years ago

That's interesting. Can you try again by using --features instead of --repeatmask? Otherwise, are the chromosome names in the Socrates output the same as in the annotation?
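One quick way to check the chromosome-name question is to compare the names used in the breakpoint coordinates with the first column of the annotation track. The sketch below assumes a whitespace-separated track with the chromosome in the first column, as in the lines quoted above, and coordinates of the form `chr1:821605`; the function names are hypothetical.

```python
def track_chromosomes(track_lines):
    """Collect chromosome names from the first column of a track."""
    names = set()
    for line in track_lines:
        # Skip blank lines and comment lines.
        if line.strip() and not line.lstrip().startswith("#"):
            names.add(line.split()[0])
    return names


def breakpoint_chromosomes(coords):
    """Collect chromosome names from coordinates such as 'chr1:821605'."""
    return {c.rsplit(":", 1)[0] for c in coords}


def unmatched_chromosomes(coords, track_lines):
    """Return breakpoint chromosomes that never appear in the track."""
    return sorted(breakpoint_chromosomes(coords) - track_chromosomes(track_lines))


if __name__ == "__main__":
    track = [
        "chr1 16777160 16777470 AluSp 2147 +",
        "chr1 25165800 25166089 AluY 2626 -",
    ]
    print(unmatched_chromosomes(["chr1:821605", "chr2:130489048"], track))
```

An empty result means every breakpoint chromosome is present in the track, so a naming mismatch such as `1` versus `chr1` can be ruled out.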

samarth51 commented 9 years ago

I tried --features (gene coordinates); this gives me "feature1" and "feature2" with gene names, but the --repeatmask parameter does not output anything. Yes, the chromosome names in the output and the annotation are the same. Any suggestions for --repeatmask?

jibsch commented 9 years ago

Sorry, I meant using --features and the repeat track. Does that give you repeat annotation, or does the problem persist?

samarth51 commented 9 years ago

Hi, I tried all the possible combinations of parameters. I also ran with --features (UCSS_repeatmask) and got repeat annotation (output pasted below).

C1_realign C1_realign_dir C1_realign_consensus C1_anchor C1_anchor_dir C1_anchor_consensus C1_long_support C1_long_support_bases C1_short_support C1_short_support_bases C1_short_support_max_len C1_avg_realign_mapq C2_realign C2_realign_dir C2_realign_consensus C2_anchor C2_anchor_dir C2_anchor_consensus C2_long_support C2_long_support_bases C2_short_support C2_short_support_bases C2_short_support_max_len C2_avg_realign_mapq BP_condition normal feature1 feature2

chr1:821605 - GCCCTTTGGCAGAGCAGGTGTGCTGTGCTGTGCTGATCCCCGGGAGTC chr1:821634 + CAGCACAGCACACCTGCTCTGCCAAAGGGCAGCCAGACTGCTTCTTTAAGCAGTTCCTGATCTTGTTT 11 443 15 215 23 37.363636 chr1:821634 + CAGCACAGCACACCTGCTCTGCCAAAGGGCAGCCAGACTGCTTCTTTA chr1:821605 - GCCCTTTGGCAGAGCAGGTGTGCTGTGCTGTGCTGATCCCCGGGAGTCTCCAGAGCC AGCAGGCTGGA 13 519 11 100 15 29.692308 Blunt-end joining normal L1PA7 L1PA7.

When I provide the RepeatMasker file to --repeatmask it does not work, but when I provide the same file to --features it works (output pasted above).