Seeking Clarification on Bowtie Usage and Unmapped BAM

mathavanpu commented 1 year ago

Hi,

I am currently engaged in a research project focused on the methylation genome of Crocus sativus. Unfortunately, a reference genome is not yet available for this species. Consequently, we have decided to utilize the refFreeDMA pipeline as suggested in your GitHub tutorial.

In the refFreeDMA paper, you mentioned the use of the Bowtie aligner along with a reference genome. I would appreciate clarification on the specific reference genome you referred to in this context.

Furthermore, I noticed that refFreeDMA requires unmapped BAM files as input. Could you kindly provide guidance on how to generate these unmapped BAM files?

Your insights and comments on these queries would be greatly appreciated.

Thank you for your assistance. Reference paper : https://linkinghub.elsevier.com/retrieve/pii/S2211-1247(15)01324-8

jklughammer commented 1 year ago

Hi, We used hg19, bosTau6, and GCA_000951615.1. as stated in the methods section to compare our reference free approach to reference based analysis.

You can use FastqToSam for that (https://broadinstitute.github.io/picard/command-line-overview.html#FastqToSam) but in the first step bam files are anyways converted to fastq files, so you can as well start with the fastq files and adjust the code accordingly. Some sequencing facilities (and in particular the one we used) provide reads as unmapped bam (instead of fastq), which is why we set up RefFreeDMA like this.

Best, Johanna

mathavanpu commented 1 year ago

Thank you for the guidance

mathavanpu commented 1 year ago

Hi

I converted the fastq file into BAM using the following script,

################################################################################ log_file="picard_conversion_log.txt"

for i in ls *_R1.fastq.gz | cut -f1 -d "_" do echo "Running Picard for sample $i" picard FastqToSam F1=$i"_R1.fastq.gz" F2=$i"_R2.fastq.gz" O=$i".bam" SM=$i >> "$log_file" 2>&1 done

################################################################################ BAM reads counts File: EB1.bam, Read Count: 33694158 File: EB2.bam, Read Count: 33104400 File: EB3.bam, Read Count: 34956606 File: ESF1.bam, Read Count: 33989762 File: ESF2.bam, Read Count: 34640184 File: ESF3.bam, Read Count: 31372584

################################################################################

but I got the output like this, prepareReads_EB1.log

=== Summary ===

Total reads processed: 1 Reads with adapters: 1 (100.0%) Reads written (passing filters): 1 (100.0%)

Total basepairs processed: 151 bp Quality-trimmed: 0 bp (0.0%) Total written (filtered): 148 bp (98.0%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 1 times.

No. of allowed errors: 0-9 bp: 0; 10-13 bp: 1

Bases preceding removed adapters: A: 0.0% C: 0.0% G: 100.0% T: 0.0% none/other: 0.0%

Overview of removed sequences length count expect max.err error counts 3 1 0.0 0 1

============================================= 1 sequences processed in total Sequences removed because they became shorter than the length cutoff of 16 bp: 0 (0.0%) plz guide me

jklughammer commented 1 year ago

Something seems wrong with your .bam files where the trimming tool doesn't recognize your reads. Have you looked at the bam files and compared their structure to the examples? Also it seems you are trying to process paired-end data (since you have read1 and read2) - RefFreeDMA is built to handle single-end data, so maybe that also contributes to the problem.

jklughammer / RefFreeDMA

Seeking Clarification on Bowtie Usage and Unmapped BAM #7