sr320 / LabDocs

Roberts Lab Documents
http://sr320.github.io/LabDocs/

Switch to Bismark - can't extract data from alignment files #357

Closed hputnam closed 7 years ago

hputnam commented 7 years ago

I have been testing the switch to Bismark on the 18 Oly MBD samples. The good news is that the alignment SAM files contain data for scaffold numbers >10,000. However, I am failing to get the data out of Bismark and into methylKit, despite trying from both directions.

Issue 1 - Bismark: the bismark_methylation_extractor function fails to output bedGraph or coverage information.
Issue 2 - methylKit: the processBismarkAln function will not load sorted SAM or BAM files directly into methylKit, and it crashes R.

Any thoughts, @sr320?

See the bottom of the HP notebook post for more detail.
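On Issue 2: as I read the methylKit documentation, processBismarkAln expects SAM input sorted by chromosome and then by position (it suggests sort -k3,3 -k4,4n). Piping a whole SAM file through sort would also shuffle the @ header lines in among the reads, so here is a minimal sketch that keeps the header on top and sorts only the alignment lines. sort_sam_for_methylkit is a hypothetical helper, and whether sorting alone stops the R crash is untested.

```shell
# sort_sam_for_methylkit: hypothetical helper, not part of Bismark or methylKit.
# Usage: sort_sam_for_methylkit in.sam out.sam
sort_sam_for_methylkit() {
  src=$1
  dst=$2
  tab=$(printf '\t')
  {
    # SAM read names may not start with "@", so "^@" matches header lines only.
    grep '^@' "$src"
    # Alignment lines: chromosome (column 3) as text, then position (column 4)
    # as a number; LC_ALL=C gives byte-wise, locale-independent ordering.
    grep -v '^@' "$src" | LC_ALL=C sort -t "$tab" -k3,3 -k4,4n
  } > "$dst"
}
```

Reading the input twice keeps the sketch simple at the cost of two passes over the file.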

sr320 commented 7 years ago

My first suggestion is to try different hardware...

The latter (the R crash) in particular sounds like that might be the case.

Related, and for comparison, here is some Bismark output from one of those files:

https://genomevolution.org/coge/ExperimentView.pl?eid=9227

Can you provide a url to the SAM file you created with bismark?

hputnam commented 7 years ago

http://owl.fish.washington.edu/symbiodinium/Oly_MBD/Bismark/

I have tried on roadrunner and on my computer, loading all of the files or just one, and R still crashes. I have tried the direct Bismark BAM output as well as sorted BAM and SAM files. All of them end up crashing R.
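Since both sorted SAM and sorted BAM inputs crashed R, one cheap thing to rule out is whether a file is really in chromosome-then-position order. A minimal sketch, assuming uncompressed SAM text on a file or stdin; sam_is_coord_sorted is a hypothetical helper. It only checks that each chromosome's reads form one contiguous block and that positions never decrease within a block, so it accepts both alphabetical chromosome order (from sort) and header order (from samtools sort).

```shell
# sam_is_coord_sorted: hypothetical helper, not part of Bismark, samtools, or methylKit.
# Exits 0 if every chromosome (column 3) forms one contiguous block and
# positions (column 4) never decrease inside a block; exits 1 otherwise.
sam_is_coord_sorted() {
  awk -F'\t' '
    /^@/ { next }                          # skip header lines
    {
      if ($3 != chr) {                     # entering a new chromosome block
        if ($3 in seen) { bad = 1; exit }  # chromosome seen earlier: not grouped
        seen[$3] = 1
        chr = $3
        pos = 0
      }
      if ($4 + 0 < pos) { bad = 1; exit }  # position went backwards
      pos = $4 + 0
    }
    END { exit bad }
  ' "$1"
}
```

For a BAM file, awk reads "-" as stdin, so something like samtools view zr1394_1_bismark_bt2.bam | sam_is_coord_sorted - should work, assuming samtools is on the PATH.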

sr320 commented 7 years ago

I can get a bedgraph from the alignment, if that helps

https://sr320.github.io/SAM-lacked-header/

Related: I think a lot of the issues with the alignment files could be due to information missing from the header. In the example above, it does not work unless I provide the reference FASTA.
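Following up on the missing-header idea: a SAM header normally lists each reference sequence (scaffold) in an @SQ line, and some tools need those lines to resolve scaffold names and lengths. A minimal sketch for counting @SQ lines in uncompressed SAM text; sam_sq_count is a hypothetical helper. It stops reading at the first alignment line, so it is quick even on large files.

```shell
# sam_sq_count: hypothetical helper. Prints the number of @SQ (reference sequence)
# header lines in a SAM file and exits 1 if there are none.
sam_sq_count() {
  awk '
    /^@SQ\t/ { sq++; next }   # reference sequence dictionary line
    /^@/     { next }         # other header lines (@HD, @PG, ...)
    { exit }                  # first alignment line: the header is over
    END { print sq + 0; exit (sq > 0 ? 0 : 1) }
  ' "$1"
}
```

For a BAM file, samtools view -H prints just the header, so samtools view -H zr1394_1_bismark_bt2.bam | sam_sq_count - gives the same count.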

hputnam commented 7 years ago

@seanb80 here are the files to try in Bismark. http://owl.fish.washington.edu/nightingales/O_lurida/20160203_mbdseq/

Try some of the concatenated ones: zr1394_1.fastq.gz, zr1394_2.fastq.gz, zr1394_3.fastq.gz, zr1394_4.fastq.gz, zr1394_5.fastq.gz, zr1394_6.fastq.gz, zr1394_7.fastq.gz, etc.

The file names correspond to the sample names listed at https://github.com/hputnam/Oly_Oyster_DNA_Methylation; HC is "treatment" 0 and SS is "treatment" 1.

I used bowtie2 2.2.9, Bismark 0.16.3, samtools 0.1.19, and R 3.2.5.

seanb80 commented 7 years ago

Found the 10k file in the repeat masker issue. Running the genome prep now!

hputnam commented 7 years ago

http://owl.fish.washington.edu/halfshell/working-directory/16-10-24/Ostrea_lurida-Scaff-10k.fa

seanb80 commented 7 years ago

./bismark_genome_preparation ~/Documents/BismarkData

./bismark --genome ~/Documents/BismarkData ~/Documents/BismarkData/zr1394_1.fastq.gz --output_dir ~/Documents/BismarkData/BismarkOutput

./bismark_methylation_extractor -s --scaffolds --merge_non_CpG --bedGraph --zero_based ~/Documents/BismarkData/BismarkOutput/zr1394_1_bismark_bt2.bam --output ~/Documents/BismarkData/BismarkOutput/

These worked to completion on Emu! I'll upload the results of the first run to Owl and update with a link shortly.

seanb80 commented 7 years ago

http://owl.fish.washington.edu/scaphapoda/Sean/BismarkOutput.tar.gz

Would you like me to start the other combined files?

seanb80 commented 7 years ago

covfile
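For reference, the coverage file Bismark writes alongside the bedGraph when --bedGraph is set (named *.bismark.cov.gz) is tab-separated with six columns: chromosome, start, end, methylation percentage, count methylated, count unmethylated. A minimal sketch of a coverage filter in the spirit of methylKit's mincov argument; filter_mincov is a hypothetical helper and the threshold is your choice.

```shell
# filter_mincov: hypothetical helper. Reads a Bismark coverage file on stdin
# (chr, start, end, % methylation, count methylated, count unmethylated) and
# keeps only sites whose total coverage (column 5 + column 6) is >= $1.
filter_mincov() {
  awk -F'\t' -v min="$1" '($5 + $6) >= min + 0'
}
```

Usage, assuming the coverage file from the Emu run is named zr1394_1_bismark_bt2.bismark.cov.gz (check the actual name in BismarkOutput): gzip -dc zr1394_1_bismark_bt2.bismark.cov.gz | filter_mincov 10 > zr1394_1.min10.cov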

hputnam commented 7 years ago

What were you thinking for the sort workaround? Do you think it is best to just run it on Emu? If so, I will have to load all the cleaned files.

seanb80 commented 7 years ago

I think it would be easier to run it on Emu. Otherwise, someone posted a workaround where you install GNU sort on OS X and then symlink it to look like the Unix sort, which seems like a pain and not a guaranteed fix, as I don't know the particular differences between GNU sort and Unix sort.

seanb80 commented 7 years ago

I wrote a script that should iterate through all of the files and run each of the Bismark programs on them. Is the cleaning particularly intensive? If not, I could just add it to the start of the script and re-clean the files from Owl.
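The script itself is not posted in the thread. As a sketch of the same idea only, the helper below prints the alignment and extraction commands from the zr1394_1 run on Emu for each FASTQ file, so they can be reviewed before handing them to a shell. It assumes bismark and bismark_methylation_extractor are on the PATH (rather than run as ./bismark), that bismark_genome_preparation has already been run once on the genome folder, and that Bismark names each alignment <name>_bismark_bt2.bam, as it did for zr1394_1. print_bismark_cmds is a hypothetical name.

```shell
# print_bismark_cmds: hypothetical helper; it only prints commands, it runs nothing.
# Usage: print_bismark_cmds GENOME_DIR OUTPUT_DIR file1.fastq.gz [file2.fastq.gz ...]
print_bismark_cmds() {
  genome=$1
  outdir=$2
  shift 2
  for fq in "$@"; do
    # zr1394_1.fastq.gz -> zr1394_1, matching the zr1394_1_bismark_bt2.bam name Bismark produced
    base=$(basename "$fq" .fastq.gz)
    echo "bismark --genome $genome $fq --output_dir $outdir"
    echo "bismark_methylation_extractor -s --scaffolds --merge_non_CpG --bedGraph --zero_based $outdir/${base}_bismark_bt2.bam --output $outdir/"
  done
}
```

Usage: print_bismark_cmds ~/Documents/BismarkData ~/Documents/BismarkData/BismarkOutput ~/Documents/BismarkData/zr1394_*.fastq.gz > run_all.sh, then sh -e run_all.sh, which stops at the first command that fails. Printing first makes it easy to spot a wrong path before hours of alignment start; the printed paths are unquoted, so this assumes no spaces in them.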

hputnam commented 7 years ago

For now, can you concatenate s1-s6 for each of the 18 samples and run those concatenated files all the way through the methylation extractor step? Thanks! http://owl.fish.washington.edu/nightingales/O_lurida/20160203_mbdseq/
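Gzipped FASTQ files can be joined with plain cat: the result is a multi-member gzip stream that gzip -dc decompresses as one file, so there is no need to decompress first. A minimal sketch with a sanity check that the combined file is valid gzip, has as many lines as its inputs, and has a line count divisible by four (one FASTQ record per four lines). The lane file names are not given in the thread, so the usage line uses made-up names; concat_fastq_gz is a hypothetical helper.

```shell
# concat_fastq_gz: hypothetical helper.
# Usage: concat_fastq_gz combined.fastq.gz lane1.fastq.gz lane2.fastq.gz ...
# Concatenates gzipped FASTQ files without decompressing them, then checks that
# the result is valid gzip, has as many lines as the inputs combined, and that
# the line count is a multiple of 4 (one FASTQ record = 4 lines).
concat_fastq_gz() {
  combined=$1
  shift
  cat "$@" > "$combined" || return 1
  gzip -t "$combined" || return 1
  expected=0
  for f in "$@"; do
    expected=$(( expected + $(gzip -dc "$f" | wc -l) ))
  done
  got=$(( $(gzip -dc "$combined" | wc -l) ))
  [ "$got" -eq "$expected" ] && [ $(( got % 4 )) -eq 0 ]
}
```

Usage with hypothetical lane names: concat_fastq_gz SAMPLE.fastq.gz SAMPLE_s1.fastq.gz SAMPLE_s2.fastq.gz SAMPLE_s3.fastq.gz SAMPLE_s4.fastq.gz SAMPLE_s5.fastq.gz SAMPLE_s6.fastq.gz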

hputnam commented 7 years ago

@seanb80 the Bismark methylation extractor step failed. Can you re-run it?

hputnam commented 7 years ago

I have strange histograms with no peak at 0% methylation: https://github.com/hputnam/Oly_Oyster_DNA_Methylation/blob/master/Notebooks/3_Clustering_Differential_Methylation_R.ipynb This contrasts with the expected peaks around 0% and 100% shown in the manual: https://bioconductor.org/packages/devel/bioc/vignettes/methylKit/inst/doc/methylKit.html
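To look at the same distribution before R is involved, here is a minimal sketch that bins the methylation-percentage column (column 4) of a Bismark coverage file into ten 10%-wide bins, with sites at exactly 100% placed in the last bin. meth_histogram is a hypothetical helper; its counts are per site with no coverage filter, so they will not match methylKit's plots exactly.

```shell
# meth_histogram: hypothetical helper. Reads a Bismark coverage file on stdin and
# prints "bin<TAB>count" for methylation-percentage bins 0-10, 10-20, ..., 90-100.
meth_histogram() {
  awk -F'\t' '
    {
      b = int($4 / 10)          # 0-9.99 -> bin 0, ..., 90-99.99 -> bin 9
      if (b > 9) b = 9          # 100% joins the last bin
      n[b]++
    }
    END {
      for (i = 0; i <= 9; i++) printf "%d-%d\t%d\n", i * 10, (i + 1) * 10, n[i] + 0
    }
  '
}
```

Usage, assuming the coverage file name: gzip -dc zr1394_1_bismark_bt2.bismark.cov.gz | meth_histogram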

sr320 commented 7 years ago

My guess is that there is a setting in the post-alignment step to keep or ignore sites with 0 methylation (i.e., the -z option in methratio.py).

(Sent by email in reply to Hollie Putnam's comment of Fri, Dec 2, 2016, 5:40 PM.)
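One way to test that guess on the Bismark side is to count how many covered sites in a coverage file have no methylated calls at all (column 5 equal to 0). If that count is large, 0% sites are reported by Bismark and are being lost later; if it is near zero, they are either missing before this point or genuinely rare in the data. count_zero_meth is a hypothetical helper.

```shell
# count_zero_meth: hypothetical helper. Reads a Bismark coverage file on stdin
# (chr, start, end, % methylation, count methylated, count unmethylated) and
# reports the total number of sites and how many have zero methylated calls.
count_zero_meth() {
  awk -F'\t' '
    { total++ }
    $5 == 0 { zero++ }
    END { printf "sites=%d zero_methylated=%d\n", total, zero }
  '
}
```

Usage, assuming the coverage file name: gzip -dc zr1394_1_bismark_bt2.bismark.cov.gz | count_zero_meth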