Bioinformatics pipeline that makes use of expression and open chromatin data to identify differentially active transcription factors across conditions.
Description of the bug
In the ROSE workflow the bed file is sorted using `SORT_BED` (which uses `gnu_sort`). This results in the chromosomes of the bed file being sorted in this order:
The `chrom_sizes` file (we use the genome index for that), however, looks like that:
This leads to an error thrown in the process `INVERT_TSS`, since sorted files are expected here. We didn't catch this before since we are only testing on `chr1`.
One option would be to add the `-V` flag to the `SORT_BED` process, for sorting the bed file. This would bring the normal chromosomes in the right order, but we would still get an error with chromosome X, Y and M, since they are not in the natural sorting order in the `chrom_sizes` file. Therefore, I would propose sorting the `chrom_sizes` file with `GNU_SORT` as well to make sure we always get the right order.
Command used and terminal output
No response
Relevant files
No response
System information
No response
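To illustrate the ordering mismatch described under "Description of the bug", here is a minimal Python sketch. It is not the pipeline's code: the chromosome lists are hypothetical examples (the actual listings are not reproduced here), and `natural_key` is a hypothetical helper that only approximates what `sort -V` does for typical chromosome names.

```python
import re


def natural_key(name):
    """Split a name into text and integer parts so that 'chr2' sorts before 'chr10'.

    Hypothetical helper: a rough stand-in for version sort, not GNU sort itself.
    """
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


# Hypothetical chromosome names as they might appear in a bed file.
bed_chroms = ["chr2", "chrX", "chr10", "chr1", "chrY", "chrM"]

# Hypothetical chrom_sizes order (assumed here: numeric, then X, Y, M).
chrom_sizes_order = ["chr1", "chr2", "chr10", "chrX", "chrY", "chrM"]

# Plain string sort puts chr10 before chr2.
lexicographic = sorted(bed_chroms)

# Natural sort fixes the numbered chromosomes, but X, Y and M
# still end up in a different order than in chrom_sizes_order.
natural = sorted(bed_chroms, key=natural_key)

# Sorting the chrom_sizes names with the same key makes both orders agree.
chrom_sizes_sorted = sorted(chrom_sizes_order, key=natural_key)

print("lexicographic:     ", lexicographic)
print("natural:           ", natural)
print("chrom_sizes:       ", chrom_sizes_order)
print("chrom_sizes sorted:", chrom_sizes_sorted)
print("orders agree after sorting both:", natural == chrom_sizes_sorted)
```

Under these assumptions, sorting only the bed file is not enough; the two files agree only once both are sorted with the same rule, which is the reasoning behind the proposal above.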