Closed ZoeyYang912 closed 9 months ago
Thank you for your interest in bakR!
The error is a bit weird, as it looks like the sequencing track creation mostly worked but a single track is missing. You could try doing what the error message suggests, which is adding --latency-wait 120 to your call to Snakemake, as it might just be a quirk of the server you are using. I'll note that as long as you run snakemake with snakemake ... --rerun-triggers mtime (where ... represents whatever other arguments you pass to snakemake, like --use-conda and the like), you will not have to rerun any steps of the pipeline that already worked. Snakemake will pick up right where it left off.
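For reference, the combined rerun could be sketched as below. Only --latency-wait 120 and --rerun-triggers mtime come from the suggestion above; the contents of ORIGINAL_ARGS (--cores 8 --use-conda) are placeholders I am assuming for illustration, so substitute whatever you passed the first time. The sketch prints the command instead of running it, so it can be checked first.

```shell
# Sketch only: ORIGINAL_ARGS is a placeholder (assumption), not taken from the log.
ORIGINAL_ARGS="--cores 8 --use-conda"

build_rerun_cmd() {
  # $ORIGINAL_ARGS is deliberately unquoted so it splits into separate arguments.
  echo snakemake $ORIGINAL_ARGS --rerun-triggers mtime --latency-wait 120
}

# Print the command that would be run from the bam2bakR directory.
build_rerun_cmd
```

Appending --dry-run to the printed command before running it for real should list the jobs Snakemake still considers outstanding, which is a quick way to confirm that finished steps will not be repeated.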
To help diagnose the problem in case that suggestion doesn't work, can you send me the log file for the failed step? It should be located in the logs/maketdf directory created by bam2bakR. In addition, just to be safe, could you send me a log file from each of the rules, so I can make sure no upstream step went wrong?
The runtime problem is due to two things:
1) You just have a lot of samples. This problem can theoretically be dealt with if you have a scheduler of some sort (e.g., slurm) set up on your server. If so, then we can discuss further how to make it so that the pipeline will batch separate jobs that will run in parallel for rules that can be run in parallel, or multiple runs of one rule on different samples.
2) bam2bakR currently uses HTSeq to assign sequencing reads to annotated features. Depending on what annotation you are using and how many reads you have, this can take 1 to 3 hours to run. I have been working on a rewrite of bam2bakR called fastq2EZbakR (which, despite its name, can still take bam files as input) that replaces HTSeq with featureCounts, which usually takes 1 to 3 minutes to run. Depending on how many cores you are providing the pipeline, and thus how many instances of the HTSeq rule can run in parallel, this change could save you many hours. You can therefore try running fastq2EZbakR (which has the same setup and run instructions as bam2bakR), but I will also try to get featureCounts implemented in bakR ASAP.
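To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes one feature-assignment job per bam file (30 in this case), that PARALLEL jobs can run at once (4 is a made-up value), and that every job takes the same time; all other pipeline steps are ignored, so treat the result as an order-of-magnitude estimate only.

```shell
# waves SAMPLES PARALLEL -> number of sequential batches (ceiling division)
waves() { echo $(( ($1 + $2 - 1) / $2 )); }

SAMPLES=30    # from the thread: 30 bam files
PARALLEL=4    # assumption: number of jobs that can run at the same time

w=$(waves "$SAMPLES" "$PARALLEL")

# Per-job ranges quoted above: 1 to 3 hours (HTSeq), 1 to 3 minutes (featureCounts)
echo "HTSeq:         $(( w * 1 )) to $(( w * 3 )) hours"
echo "featureCounts: $(( w * 1 )) to $(( w * 3 )) minutes"
```

With these assumed values the 30 jobs run in 8 batches, so the feature-assignment step alone would take roughly 8 to 24 hours with HTSeq versus 8 to 24 minutes with featureCounts.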
Best, Isaac
Hi Isaac,
Thank you for the quick response. It only generated one log file in the logs/maketdf folder, which is attached here: addox_0h_1 (1).log
I am running bam2bakR with additional arguments now and I get this warning "File path //mnt/Disk2/zoey/slam2/trimmed/VM33/GRCm39.primary_assembly.genome.fa.fai contains double '/'. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake."
I usually add a / to represent the root, but I am not sure whether that is incompatible with Snakemake?
Thanks!
Thank you for sending the log file. There is an error in the log file noting that at first igvtools was not able to create/locate the IGV home directory. Eventually it found it though, so even though the error doesn't appear to be a file latency thing, rerunning it like you are currently trying might be the simple solution.
The warning shouldn't be a problem, though I think the double / is unnecessary, as most UNIX or UNIX-like systems should interpret "/mnt/..." and "//mnt/..." equivalently as an absolute path starting from root.
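If you would rather silence the warning, one option is to collapse the leading slashes before the path goes into the config. The helper below is a hypothetical sketch (collapse_leading_slashes is not part of Snakemake or bam2bakR), and the last line is an optional check of whether this machine treats / and // as the same directory; the -ef test is a common shell extension rather than strict POSIX.

```shell
# collapse_leading_slashes PATH -> PATH with any run of leading slashes reduced to one
# (hypothetical helper, for illustration only)
collapse_leading_slashes() {
  printf '%s\n' "$1" | sed 's#^//*#/#'
}

# The path from the warning, rewritten with a single leading slash
collapse_leading_slashes "//mnt/Disk2/zoey/slam2/trimmed/VM33/GRCm39.primary_assembly.genome.fa.fai"

# Optional: report whether "//" and "/" resolve to the same directory here
if [ // -ef / ]; then echo "// and / are the same directory on this system"; fi
```

Only the leading run of slashes is touched, so relative paths and the rest of the path are left exactly as written.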
Thank you for looking into the issue.
I reran the command with --latency-wait 120 and it was able to generate a cB.csv.gz file, which I gunzipped and fed into bakR. Since I am doing a chase experiment to compare RNA degradation, I labelled the cells for 24 hours, then removed s4U and chased for 0, 1, 5, and 12 hours. How should I set tl in the metadf? In the bakR guide, tl should be the length of the labeling feed, but in my case all cells were fed for 24 hours. Thank you!
tl: The length of the metabolic labeling feed. This can be in any units (the simulated data puts it in terms of minutes), but if no s4U was fed to a sample, tl must be 0 for that sample. While not technically necessary to run bakR, we always highly suggest including -s4U controls.
The label time in that case is the chase time, though you will want to exclude the 0 hr chase time from bakR, as it will try to use it to estimate the background mutation rate if included. In the bakRFit() function, you should also set the Chase parameter to TRUE, which will account for the fact that the fraction s4U labeled is the fraction of pre-existing RNA still remaining, rather than the fraction of RNA that is new, as it would be in a pulse-label design.
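As a quick reference, the tl assignment described above can be written out as a small lookup. The labels (nos4U for the -s4U controls, otherwise the chase time in hours) are hypothetical tags chosen for this sketch, not names bakR requires, and the sketch covers only the tl values; the other metadf columns and the bakRFit() call with Chase set to TRUE are done in R and are not shown.

```shell
# tl_for_sample LABEL -> value to enter as tl in metadf, or "exclude"
# LABEL is a hypothetical tag: "nos4U" for -s4U controls, otherwise chase hours.
tl_for_sample() {
  case "$1" in
    nos4U) echo 0 ;;         # no s4U fed: tl must be 0
    0)     echo exclude ;;   # 0 hr chase: leave these samples out of the bakR input
    *)     echo "$1" ;;      # otherwise the chase time serves as the label time
  esac
}

# Print the mapping for the design in this thread (tab-separated)
for label in nos4U 0 1 5 12; do
  printf '%s\t%s\n' "$label" "$(tl_for_sample "$label")"
done
```

The 0 hr chase samples are dropped rather than given tl = 0 because a tl of 0 marks a sample as a -s4U control.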
Technically, bakR's support for pulse-chase designs is a bit lacking. If you run it the way I have suggested, bakR will assume that all transcripts are completely labeled after the 24 hour feed. In theory, a better approach would be to account for the extent of labeling after the pulse for each gene/transcript. bakR's handling of pulse-chase designs is not better because I am a huge proponent of never doing a pulse-chase experiment, for reasons briefly discussed at the end of this vignette. That being said, if you find the 100% labeled assumption inappropriate for your data, you can also check out these scripts I wrote for another bakR user to try to do a better job handling pulse-chase analyses.
Thank you, I will try it out!
Hi, I am trying to use the bakR package to analyze SLAM-seq data. To create the cB object, I am running bam2bakR on the bam files that SLAMDUNK generated. I have 2 conditions, 4 chase time points, and 1 no-s4U control, all in triplicate, so 30 bam files in total. However, I get this error after running bam2bakR overnight. Any suggestions on how I can troubleshoot? Thanks!
Here is my config.yaml in txt format: config.txt
I am not sure if this is the best way to generate the cB object, or whether there is a faster way, since bam2bakR will probably need to run for 2 days on our server. Thank you!