Closed GoogleCodeExporter closed 8 years ago
Sorry, I think I didn't save the attachment before sending. Use this new
attachment.
Original comment by benb...@gmail.com
on 20 Nov 2012 at 6:53
When is the meeting? Merging large bis datasets and calculating the metrics can
take days of CPU time.
Original comment by zack...@gmail.com
on 20 Nov 2012 at 7:14
I will have to compile the data over this weekend (I leave Monday for the meeting).
Once the merged BAMs are done, I have my own scripts that I can run to output
coverage levels. I assume the merged BAMs can be output pretty quickly, and
the QC metrics and Bis-SNP will lag.
Original comment by benb...@gmail.com
on 20 Nov 2012 at 7:35
ok, that should be doable.
Original comment by zack...@gmail.com
on 20 Nov 2012 at 8:05
currently running on the cluster
Original comment by zack...@gmail.com
on 20 Nov 2012 at 10:33
Some of the smaller BAMs are done.
I've noticed that our BAM-merging bis workflow is now overwriting the read groups with a single new read group. This is bad since we lose track of the original lanes. I'm writing a fix for it now that will preserve the old read groups.
Once that is complete (estimated at a few hours), I will cancel all runs and restart.
Original comment by zack...@gmail.com
on 21 Nov 2012 at 5:52
It looks like 4 of the 13 crashed during Bis-SNP, or at least didn't complete?
I can't see the error logs, because I don't have permissions. Here's an example
[bberman@hpc-uec:~/production-gs1/ga/analysis/Bisulfite_merge_2012-11-20] $ ls /export/uec-gs1/laird/shared/production/ga/analysis/Bisulfite_merge_2012-11-20/*A18*bissnp*
-rw------- 1 ramjan hsc-ar 14K Nov 22 04:46 /export/uec-gs1/laird/shared/production/ga/analysis/Bisulfite_merge_2012-11-20/uec_MERGING_MERGING_1_NIC1254A18_uscec_bissnp445043964234484988.sh.e2969495
-rw------- 1 ramjan hsc-ar 28K Nov 22 04:46 /export/uec-gs1/laird/shared/production/ga/analysis/Bisulfite_merge_2012-11-20/uec_MERGING_MERGING_1_NIC1254A18_uscec_bissnp445043964234484988.sh.o2969495
Original comment by benb...@gmail.com
on 24 Nov 2012 at 6:18
All files are now group-readable.
Original comment by zack...@gmail.com
on 24 Nov 2012 at 7:53
Here is the error:
##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/export/uec-gs1/laird/shared/production/ga/analysis/Bisulfite_merge_2012-11-20/results/2969703.hpc-pbs.usc.edu/ResultCount_MERGING_1_NIC1254A16.hg19_rCRSchrm.fa.bam} is malformed: Read HWI-ST550_0142:6:1301:7555:110477#0 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK. Please use http://www.broadinstitute.org/gsa/wiki/index.php/ReplaceReadGroups to fix this problem
I looked at the BAM, and there are indeed reads with read groups and reads
without. I assume this is because some of the older input files didn't have
read groups? The read named in the error, for example, has no RG tag:
[bberman@hpc-uec:~/production-gs1/ga/analysis/Bisulfite_merge_2012-11-20] $ samtools view results/MERGING/MERGING_1_NIC1254A16/ResultCount_MERGING_1_NIC1254A16.hg19_rCRSchrm.fa.bam | grep 'HWI-ST550_0142:6:1301:7555:110477#0'
HWI-ST550_0142:6:1301:7555:110477#0 163 chr1 98333 255 50M = 98580 297 CTCACTCACTTTTCTCCTTCTACTATTACTGCTCATTCATTCCAATTTTT CCCFFFFFHHHHHJJJJJJJJJJJJJJJIJJIJJIJIJIIJJJJJJJJJJ NM:i:0 ZS:Z:--
HWI-ST550_0142:6:1301:7555:110477#0 83 chr1 98580 255 50M = 98333 -297 ATATTCACTTCAACTCTACTAACATTTAATAAATATTATTAACTAACTAA IIJJGHHIHHIHHIHGIHJGIHJJJJJJJHJJJJJJJHHHHHFFFDFCCC NM:i:0 ZS:Z:-+
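To see how widespread the problem is rather than checking one read at a time, something like the following could tally reads per read group. This is only a rough sketch, not part of the pipeline: it assumes tab-separated SAM text with the header included (for example from `samtools view -h`), and the function names `read_group_summary` and `find_rg_problems` are made up for illustration.

```python
from collections import Counter


def read_group_summary(sam_lines):
    """Tally reads per RG tag from SAM text lines (header lines included).

    Returns (header_ids, counts): header_ids is the set of @RG IDs declared
    in the header, and counts maps each RG ID seen on a read to its read
    count, with the key None used for reads that carry no RG tag at all.
    """
    header_ids = set()
    counts = Counter()
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header line: only @RG records matter here.
            if line.startswith("@RG"):
                for field in line.split("\t")[1:]:
                    if field.startswith("ID:"):
                        header_ids.add(field[3:])
            continue
        # Alignment line: optional tags start at the 12th column.
        rg = None
        for tag in line.split("\t")[11:]:
            if tag.startswith("RG:Z:"):
                rg = tag[5:]
                break
        counts[rg] += 1
    return header_ids, counts


def find_rg_problems(header_ids, counts):
    """Return (reads without any RG, {undeclared RG ID: read count})."""
    missing = counts.get(None, 0)
    undefined = {
        rg: n
        for rg, n in counts.items()
        if rg is not None and rg not in header_ids
    }
    return missing, undefined
```

Feeding it `sys.stdin` from a `samtools view -h` pipe would give the number of reads with no RG tag and any RG IDs that are used on reads but not declared in the header, which are the two conditions the GATK error complains about.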
Original comment by benb...@gmail.com
on 24 Nov 2012 at 8:02
Since read groups are missing for certain old lanes, and we can't add them to
the merged BAM because we don't know which read belongs to which lane,
the only options are:
- rerun those old lanes through the latest pipeline and remerge
- add read groups manually and rerun merging
- update my merging pipeline to detect mixed cases like this (it currently
handles only the all-or-none cases) and remerge
- squish the merged BAMs into one read group and run Bis-SNP
- does Bis-SNP have a "-ignore-readgroups" flag? If so, just rerun that step
All these fixes except the last require a decent chunk of time to implement and test.
I don't know what's necessary for the meeting, but I'm headed out and won't be at
a terminal until tomorrow at the earliest.
Original comment by zack...@gmail.com
on 24 Nov 2012 at 8:16
Issue 363, which I've fixed and which is being tested on this dataset, will
resolve the problems mentioned above.
Original comment by zack...@gmail.com
on 28 Nov 2012 at 10:05
Bis-SNP no longer has an "-ignore-readgroups" flag; only the old version based on
the GATK 1.0 framework could do this.
Original comment by lyping1...@gmail.com
on 28 Nov 2012 at 10:28
@12
I guess it doesn't matter now, since I've redone the merging code to insert an RG
when a non-RG input is merged with a with-RG input.
If they are all non-RG, then we stick a single RG on the merged result, such as
when splitting a FASTQ into pieces in the pipeline.
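The rule described above could be sketched roughly as follows. This is an illustration only, not the actual merging code: the function name `plan_read_groups`, the default ID `"merged"`, and the choice to derive one new RG ID per non-RG input from its file name are all assumptions.

```python
import os


def plan_read_groups(inputs, merged_id="merged"):
    """Decide which RG ID to stamp on the reads of each input BAM.

    inputs maps each input BAM path to the list of RG IDs found in its
    header (an empty list if it has none). The result maps each path to
    the RG ID to add to its reads, or None to keep its existing RGs.
    """
    without_rg = [path for path, ids in inputs.items() if not ids]

    # All inputs lack read groups: one shared RG on the merged result.
    if len(without_rg) == len(inputs):
        return {path: merged_id for path in inputs}

    # Mixed (or all with-RG): keep existing RGs, and give every non-RG
    # input its own new RG so its reads stay traceable to that input.
    used = {rg for ids in inputs.values() for rg in ids}
    plan = {}
    for path, ids in inputs.items():
        if ids:
            plan[path] = None
            continue
        base = os.path.splitext(os.path.basename(path))[0]
        candidate, n = base, 1
        while candidate in used:  # avoid clashing with an existing RG ID
            n += 1
            candidate = f"{base}.{n}"
        used.add(candidate)
        plan[path] = candidate
    return plan
```

For example, `plan_read_groups({"old_lane.bam": [], "new_lane.bam": ["laneA"]})` would leave `new_lane.bam` untouched and assign `old_lane` to the reads from `old_lane.bam`. The new IDs would still have to be written into the merged header as @RG lines.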
Original comment by zack...@gmail.com
on 28 Nov 2012 at 10:39
Fixing issue 363 will complete this task.
This dataset is the test case for #363.
Original comment by zack...@gmail.com
on 29 Nov 2012 at 11:26
We need to re-run, because some directories still don't have Bis-SNP output:
/Volumes/storage/hpcc/uec-gs1/laird/shared/production/ga/analysis/Bisulfite_merge_2012-11-27/results/MERGING/MERGING_1_NIC1254A15
Original comment by benb...@gmail.com
on 17 Jan 2013 at 11:12
Original comment by zack...@gmail.com
on 21 Feb 2013 at 8:00
Original issue reported on code.google.com by
benb...@gmail.com
on 20 Nov 2012 at 6:40