Cached GTF or temp files?

gauravvaidya16 commented 7 months ago

Hi Job,

I'm currently using sci-rocket for our non-model planarian species, utilizing a GTF file with 81k genes categorized into various types, including protein-coding, lncRNA, and rRNA. Among these, we have approximately 24k protein-coding genes.

Initially, I ran sci-rocket on the unedited GTF file, resulting in a CDS object with 81k features. To address this, I filtered the GTF file to include only the protein-coding genes. However, despite this adjustment, the resulting CDS object consistently contains 81k features instead of the expected 24k.
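For reference, a minimal Python sketch of this kind of filter, together with a quick count of distinct gene IDs to confirm that the filtered file really is smaller. It assumes the biotype is stored in a `gene_biotype` (Ensembl-style) or `gene_type` (GENCODE-style) attribute and that the value is `protein_coding`; a custom annotation for a non-model species may use different names. The function names are illustrative and not part of sci-rocket.

```python
import re

# Assumption: the biotype lives in a `gene_biotype` or `gene_type` attribute.
# Adjust the pattern if your annotation uses a different key.
BIOTYPE_RE = re.compile(r'(?:gene_biotype|gene_type) "([^"]+)"')
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def filter_gtf_lines(lines, biotype="protein_coding"):
    """Yield comment lines and the records whose biotype attribute matches."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed records without an attribute column
        match = BIOTYPE_RE.search(fields[8])
        if match and match.group(1) == biotype:
            yield line


def count_gene_ids(lines):
    """Count distinct gene_id values among non-comment records."""
    gene_ids = set()
    for line in lines:
        if line.startswith("#"):
            continue
        match = GENE_ID_RE.search(line)
        if match:
            gene_ids.add(match.group(1))
    return len(gene_ids)
```

Both functions accept any iterable of lines, so an open file handle can be passed directly. Comparing `count_gene_ids` on the original and the filtered file shows whether the filter itself worked, independently of what the pipeline does with it afterwards.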

I'm wondering whether the pipeline might be caching the GTF file or utilizing temporary files, potentially causing it to revert to the original unfiltered version. What do you think?

Thanks, Gaurav

J0bbie commented 7 months ago

Hi Gaurav,

What does your config.yaml look like? The GTF is used during STAR-index generation, and those annotations are then used during alignment/counting. The exception is when a previous star_index path is given: in that case it will always symlink and use the pre-generated index.

It should re-generate the STAR-index based on the supplied genome / GTF if you replace it. Make sure the GTF has a different name, so that the STAR-index command differs from the previous run (that's how Snakemake determines whether to re-run).
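One way to guarantee that the name changes whenever the contents change is to embed a short content hash in the filename. The helper below is a rough sketch of that idea; `copy_with_content_hash` is a hypothetical name, not something sci-rocket provides, and the resulting path would still need to be put into config.yaml by hand.

```python
import hashlib
import shutil
from pathlib import Path


def copy_with_content_hash(gtf_path, out_dir):
    """Copy a GTF into out_dir under a name that embeds a short content hash."""
    gtf_path = Path(gtf_path)
    # Reads the whole file into memory; fine for typical GTF sizes.
    digest = hashlib.sha256(gtf_path.read_bytes()).hexdigest()[:8]
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # e.g. genes.gtf -> genes.<hash>.gtf
    target = out_dir / f"{gtf_path.stem}.{digest}{gtf_path.suffix}"
    shutil.copyfile(gtf_path, target)
    return target
```

An edited GTF then always ends up under a new filename, while an unchanged one keeps its name and does not trigger a needless rebuild.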

Alternatively, if it keeps using the old generated index even when changing the GTF, you can just delete resources/index_star/{species}/ and it will re-generate it based on the genomes/GTF supplied in the config.yaml.
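A small Python sketch of that clean-up step, assuming it is run from the workflow directory so that `resources/index_star/` resolves correctly. The function name and the `resources_dir` parameter are illustrative only, and the species name must match the one used in your config.yaml.

```python
import shutil
from pathlib import Path


def remove_star_index(species, resources_dir="resources"):
    """Delete resources/index_star/<species>/ if present; return True if removed."""
    if not species:
        # Guard against wiping every index by passing an empty name.
        raise ValueError("species must be a non-empty name")
    index_dir = Path(resources_dir) / "index_star" / species
    if index_dir.is_symlink():
        # Only remove the link itself, never the index it points to.
        index_dir.unlink()
        return True
    if not index_dir.is_dir():
        return False
    shutil.rmtree(index_dir)
    return True
```

After the directory is gone, the next run should rebuild the index from the genome and GTF currently listed in config.yaml.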

Let me know if this helps!

Best,

Job

gauravvaidya16 commented 7 months ago

Thanks a lot, I will try it out!

odomlab2 / sci-rocket

Cached GTF or temp files? #31