Closed bensprung closed 3 years ago
Hi @bensprung,
you are right, for fastqc we are currently only using the R1 fastq file as input. This is mostly because so far (at least in our tests), the results for R2 are kind of the same, so they do not add much information, and also because I was a bit lazy when coding this - it's a bit tricky to change the current behavior and add R2 as well. This is because the settings for fastqc also allow to use the trimmed reads instead of the raw samples as input, and in turn, the trimmed reads can also be merged into a single fastq file. Hence, there is a combination of settings where the data has R1 and R2, but we'd still only want to use one file as input - and I just avoided coding that so far ;-)
If you think it's worth having fastqc run on R2 in the case that raw sample data is used or trimming without merging is used as input for fastqc, I can add that though. Let me know, it's tricky, but doable :-)
Cheers Lucas
Okay okay, I couldn't resist and coded this now. Please download the latest grenepipe, which now supports FastQC on both paired end reads, as well as on the trimmed (and potentially merged) files as well, depending on the FastQC settings in the config file.
Please let me know if that is what you needed and works for you.
So long Lucas
Ha, thank you! It looks great. I was going to say, I didn't want to create extra work. But I do have a good (post-hoc now) case for doing this. I had somehow added trailing tabs to some of my samples.tsv files, causing only one set of reads to be processed, which I didn't notice for a while. I might have noticed it faster if the typical multiQC view had stats for both files, as it does now with this change. Very nice!
I was going to say, I didn't want to create extra work.
No worries - I want to create a tool that is actually useful, and my use case might differ from yours, so thanks for your input on what kind of tools and things are actually needed in practice :-)
I had somehow added trailing tabs to some of my samples.tsv files, causing only one set of reads to be processed, which I didn't notice for a while.
Oh, that is unexpected - could you explain what happened there in some more detail please? Do you have single or paired end end reads (one or two columns in the table)? How does a tab cause the trouble there? That should get fixed, as this seems like an easy to make mistake.
Yep I had paired end reads. I (mistakenly) formatted my samples.tsv like this:
sample\tunit\tplatform\tfq1\tfq2
sample_name\t1\tILLUMINA\t/path/to/fq1\tpath/to/fq2\t
So there's an extra tab at the end of the first non-header row. It seemed to cause a one-off error in what field was interpreted as what, but the pipeline still ran with fq2 playing the role of fq1, and no fq2. I think!
p.s. as long as I have you on the line, a couple things I have noticed about snpEff:
(1) the download-dir
parameter in config.yaml
must have a trailing slash to work, e.g. "/home/ben/my_download_dir/
" works but "/home/ben/my_download_dir
" does not. Not sure if that is the intended behavior (maybe just a warning about this in the comments would suffice?).
(2) if feasible, it might be good to support custom databases for snpEff. I am new to using it but so far as I can tell the yeast databases it supplies are kind of out of date, it has R64-1 while the latest is R64-3
Hm weird. I'm using pandas to read that table - no idea what it does with those extra tabs... Apparently not the right thing though. Might look into this at some point.
As for your PS: Good points! Would you mind opening issues for those as well?
Closing this one here for now - feel free to re-open as needed, and see you in the other issues then (no worries, I'll be on the line here for a while).
I'm not sure if this is a bug or if I'm misinterpreting something. I have paired-end reads. In the trimmomatic logs I can see we are indeed using both
fq1
andfq2
:However the fastqc log only the first file appears:
I noticed this because in MultiQC there seems to be one line for the sample as a whole, then another line for fq1, but no line for fq2: