Closed lennijusten closed 9 months ago
@jeffkaufman I updated the RiboDetector feature from a `ribocounts()` to a `riboreads()` paradigm. The new function saves read titles for rRNA reads to AWS in a directory called `riboreads/`. I'm starting to re-run the bioprojects now.
All AWS directories titled `ribocounts/` or `ribopass-reads/` can be deleted.
I also updated the `prepare_dashboard` files to the new paradigm, which means the `dashboard/ribocounts/` dir can be deleted. Running `prepare_dashboard.sh` will pull files into a new dir called `dashboard/riboreads/`.
RiboDetector takes a long time to run, and we're currently only saving the number of rRNA reads. If we wanted to know the rRNA status of a given read, we'd have to run it through RiboDetector again. I expect this information to be useful in the future, and saving the read IDs costs very little.
Here, I add a feature that saves a text file of non-rRNA read IDs in a sample, with the IDs parsed by `FastqGeneralIterator`. The text file is saved to an AWS directory within each bioproject called `ribopass-reads/`.
Question: Is it worth compressing the text files before copying them to AWS?
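For reference on the compression question, here is a minimal sketch of what writing the ID file could look like with gzip as an option. It is not the code in this PR. `iter_fastq` is a hypothetical stdlib stand-in that yields the same `(title, sequence, quality)` tuples as Biopython's `FastqGeneralIterator`, `write_read_ids` and its `compress` flag are made-up names, and taking the ID to be the title up to the first whitespace is an assumption. The upload to AWS is left out.

```python
import gzip
import io


def iter_fastq(handle):
    """Yield (title, sequence, quality) for each record in a FASTQ handle.

    Stand-in for Bio.SeqIO.QualityIO.FastqGeneralIterator, which yields
    the same tuple shape. Assumes four lines per record (no line wrapping).
    """
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        if not header.strip():  # tolerate blank lines between records
            continue
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual  # drop the leading '@'


def write_read_ids(fastq_handle, out_path, compress=True):
    """Write one read ID per line to out_path; return the number written."""
    opener = gzip.open if compress else open
    count = 0
    with opener(out_path, "wt") as out:
        for title, _seq, _qual in iter_fastq(fastq_handle):
            # Assumption: the ID is the title up to the first whitespace.
            out.write(title.split(None, 1)[0] + "\n")
            count += 1
    return count
```

Calling `write_read_ids(handle, "ids.txt.gz")` on an open FASTQ handle writes a gzip-compressed file, and passing `compress=False` writes plain text, so the two sizes can be compared on a real sample before deciding.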