nf-core / tfactivity

Bioinformatics pipeline that makes use of expression and open chromatin data to identify differentially active transcription factors across conditions.
https://nf-co.re/tfactivity
MIT License

Wrong sorting of ROSE chrom_sizes and bed #19

Open: LeonHafner opened this issue 3 days ago

LeonHafner commented 3 days ago

Description of the bug

In the ROSE workflow, the BED file is sorted with SORT_BED (which uses GNU_SORT). As a result, the chromosomes in the BED file end up in lexicographic order:

chr1
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr2
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chrM
chrX
chrY
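This is plain lexicographic (character-by-character) ordering, in which "1" sorts before "2", so chr10 through chr19 land before chr2. The following minimal Python sketch reproduces the list above. It assumes GNU sort behaves like a byte-wise comparison here (as it does with LC_ALL=C); other locales may collate differently.

```python
# Chromosome names in the order used by the chrom_sizes file (genome index)
chrom_sizes_order = [f"chr{i}" for i in range(1, 20)] + ["chrX", "chrY", "chrM"]

# Python's default string sort compares code points, similar to `LC_ALL=C sort`
lexicographic = sorted(chrom_sizes_order)
print(lexicographic)
```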

The chrom_sizes file (we use the genome index for that), however, keeps the chromosomes in the reference genome's order:

chr1    195471971       8
chr2    182113224       198729854
chr3    160039680       383878307
chr4    156508116       546585323
chr5    151834684       705701916
chr6    149736546       860067187
chr7    145441459       1012299351
chr8    129401213       1160164843
chr9    124595110       1291722751
chr10   130694993       1418394457
chr11   122082543       1551267710
chr12   120129022       1675384973
chr13   120421639       1797516156
chr14   124902244       1919944833
chr15   104043685       2046928792
chr16   98207768        2152706549
chr17   94987271        2252551124
chr18   90702639        2349121527
chr19   61431566        2441335887
chrX    171031299       2503791321
chrY    91744698        2677673150
chrM    16299   2770946936

This makes the INVERT_TSS process fail, because it expects its inputs to be sorted in the same order. We didn't catch this earlier because our tests only cover chr1.
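To illustrate why the mismatch breaks a step that expects sorted input, here is a minimal sketch of the kind of consistency check such a step relies on. It assumes that "sorted" means the BED chromosomes appear in the same relative order as the rows of the chrom_sizes file. The function `first_order_violation` is hypothetical and is not part of the pipeline or of any tool INVERT_TSS calls.

```python
from typing import Optional


def first_order_violation(bed_chroms: list[str], genome_order: list[str]) -> Optional[str]:
    """Return the first BED chromosome that is out of chrom_sizes order, or None.

    Hypothetical helper: it assumes the BED chromosomes must follow the same
    relative order as the rows of the chrom_sizes (genome index) file.
    """
    rank = {name: i for i, name in enumerate(genome_order)}
    last_rank = -1
    for chrom in bed_chroms:
        current = rank[chrom]  # raises KeyError if the chromosome is not in chrom_sizes
        if current < last_rank:
            return chrom  # this chromosome comes earlier in chrom_sizes than its predecessor
        last_rank = current
    return None


genome_order = [f"chr{i}" for i in range(1, 20)] + ["chrX", "chrY", "chrM"]
bed_order = sorted(genome_order)  # lexicographic, as produced by SORT_BED

print(first_order_violation(bed_order, genome_order))
print(first_order_violation(["chr1", "chr1", "chr1"], genome_order))
```

The second call shows why a chr1-only test passes: with a single chromosome there is no ordering to get wrong.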

One option would be to add the -V flag to the SORT_BED process so that the BED file is version-sorted. This would put the numbered chromosomes into the right order, but chrX, chrY and chrM would still cause an error: version sorting places chrM before chrX and chrY, while the chrom_sizes file lists chrM last. Therefore, I propose sorting the chrom_sizes file with GNU_SORT as well, so that both files are always sorted the same way.
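Both options can be sketched in Python. The sketch assumes a simplified natural-sort key as a stand-in for GNU `sort -V`. The real version-sort algorithm has more rules, and this key only splits names into text and digit runs, though it gives the same order for these chromosome names. `natural_key` is a hypothetical helper.

```python
import re


def natural_key(name: str) -> tuple:
    """Split a name into text and integer runs, e.g. "chr10" -> ("chr", 10, "").

    Simplified stand-in for GNU `sort -V`. re.split with a capturing group puts
    text at even positions and digit runs at odd positions, so keys always
    compare str with str and int with int.
    """
    return tuple(int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name))


chrom_sizes_order = [f"chr{i}" for i in range(1, 20)] + ["chrX", "chrY", "chrM"]
bed_chroms = sorted(chrom_sizes_order)  # current lexicographic SORT_BED order

# Option 1: version-sort only the BED file (the effect of adding -V to SORT_BED)
bed_v = sorted(bed_chroms, key=natural_key)
print(bed_v[-3:], chrom_sizes_order[-3:])  # the tails still disagree

# Option 2 (proposed): sort chrom_sizes with the same ordering as well
sizes_v = sorted(chrom_sizes_order, key=natural_key)
print(bed_v == sizes_v)
```

The specific ordering matters less than using the same one for both files. Running both through the same GNU sort invocation (same flags and locale) gives this guarantee.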

Command used and terminal output

No response

Relevant files

No response

System information

No response

nictru commented 3 days ago

Sounds reasonable