Closed boehmv closed 1 year ago
Hi Volker,
Thank you very much for your interest in bakR!
I will address your minor comments first. In regards to the error you are running into, it is not one I have personally every seen but I'll do my best to try and help you diagnose the problem.
flattened
is an experimental parameter that should be set to FALSE for now, and that I will likely remove from the main branch shortly so that I can continue work on it separately. The parameter is to allow alignment to a DEXSeq flattened annotation, which requires a change in HTSeq's behavior downstream of read alignment. It's a bit of a gray area as I consider it a part of the larger "fastq2bakR" pipeline, though it is admittedly confusing that it affects a bam2bakR step.--rerun-triggers mtime
to your call to snakemake
(e.g., snakemake --cores all --use-conda --rerun-triggers mtime
). The details of why you have to do this are discussed here. bam2bakR requires Snakemake to temporarily copy some shell scripts, which newer versions of Snakemake view as a change to the input to steps using such shell scripts. --rerun-triggers mtime
makes it so that jobs are only rerun if the output of that job is missing or has been modified since the last run.In regards to your HTSeq error:
samtools view <one of the .s.sam files from sort_filter output> | head -n 10
and head <same .s.sam file> -n 10
? I want to make sure that nothing went wrong in the upstream step of bam2bakR.samtools view -b -s 0.01 <input bam file>.bam > <subsampled bam name>.bam
) or less. Thank you very much for the super fast response and for willing to help me troubleshoot, much appreciated!
Concerning all the minor points: makes sense, thanks for the explanation and --rerun-triggers mtime
works fine (although it still generates the [E::sam_parse1]
problems mentioned above), but it skips now the sort_filter step (yeah!)
What is the output of samtools view <one of the .s.sam files from sort_filter output> | head -n 10 and head <same .s.sam file> -n 10? I want to make sure that nothing went wrong in the upstream step of bam2bakR.
I have run both commands on one of the bam files, please find the output here: samtools_head.txt head_plain.txt
How did you align your fastq files?
I have pasted the command in this text file, I hope the most relevant parameters are understandable: Align.txt
Would you be willing to share a downsampled version of one of your bam files to help me more easily reproduce the error? Perhaps with only 1% of the reads (which can be generated by running samtools view -b -s 0.01 .bam >
.bam) or less.
I had to subsample 0.5% of the reads to pass the size limit, please find the bam file here: Subsampled.bam.gz
Once again: thanks for your support
No problem, I'm sorry that you have to deal with errors in someone else's tool, I know how annoying that can be.
Thank you for providing all of the requested files, nothing looks out of the ordinary so I will try running your downsampled bam file. Can you also send me the content of your config.yaml
file to make sure I specify everything correctly?
Sure, here you go (just renamed to txt since GitHub would not allow otherwise):
Thank you, I was able to reproduce, diagnose, and (I think) fix the error. The fix is now implemented on the main branch, though before running it you will have to add one parameter to your config file: remove_tags: True
. Below I will briefly discuss the problem and the current solution. Let me know if this admittedly hacky solution will not be appropriate for your particular use case. Also let me know if the fix doesn't work for you, or if you run into any other problems.
The problem is that HTseq is misprocessing some of the non-standard tags in your bam files, namely the jI and jM tags. I guess this is a common problem, as the STAR manual states: "jM jI attributes require samtools 0.1.18 or later, and were reported to be incompatible with some downstream tools such as Cufflinks."
It might be possible for me to edit the HTSeq python script to get around this issue, but for now I just wrote a python script to remove those tags from bam files, and added the tag removal step as an optional step that can be included by setting the new config parameter remove_tags
to True
.
Hi Isaac,
quick feedback: using the remove_tags: True
parameter and re-running the pipeline worked without any further problems! Thank you very much for fixing this issue that quickly.
I am currently looking at the output from bakR and although I probably need to read you mansucript more carefully, it looks really promising.
Along those lines: I very much appreciate your efforts to a) create a tool like that and b) for the extensive documentation. I know how much pain that can be, but except for this "tag" problem, everything else was perfect and very nicely documented.
Hi Isaac, first of all: thanks for providing your cool bakR package, I was just looking for such an approach when I saw your publication in RNA and promptly wanted to check it out.
Disclaimer: I am by far not an expert in bioinformatics or snakemake, so please forgive me if some questions/comments are "stupid".
I have a few minor comments and one major problem right now.:
Consequently, samtools sort stops and no final "results/htseq/{sample}_tl.bam" file is produced. Do you have any idea what the problem could be?
The log file for this job is attached below.
Thanks in advance and best wishes, Volker
Nter_24h_IAA_4SU_2.log